Modification of the random forest algorithm to avoid statistical dependence problems when classifying remote sensing imagery

Cánovas García, Fulgencio; Alonso Sarría, Francisco; Gomariz Castillo, Francisco; Oñate Valdivieso, Fernando

doi:https://doi.org/10.1016/j.cageo.2017.02.012

Files

Accepted manuscript, not edited by the journal (7.979Mb)

Identifiers

URI: http://hdl.handle.net/10835/15405
ISSN: 0098-3004
DOI: https://doi.org/10.1016/j.cageo.2017.02.012

Date

2017-02-20

Abstract

Random forest is a classification technique widely used in remote sensing. One of its advantages is that it produces an estimate of classification accuracy based on the so-called out-of-bag cross-validation method. It is usually assumed that this estimate is unbiased and may be used instead of validation based on an external data set or a cross-validation external to the algorithm. In this paper we show that this is not necessarily the case when classifying remote sensing imagery using training areas with several pixels or objects. According to our results, out-of-bag cross-validation clearly overestimates accuracy, both overall and per class. The reason is that, in a training patch, pixels or objects are not statistically independent of each other; however, bootstrapping splits them into in-bag and out-of-bag sets as if they were independent. We believe that putting whole patches, rather than individual pixels/objects, in one or the other set would produce a les...
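The patch-level split described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `patch_bootstrap`, the patch labels, and the toy data are all hypothetical, and the sketch assumes each pixel or object carries the identifier of the training patch it came from. It shows only the resampling step, in which whole patches rather than single samples are drawn with replacement.

```python
import random
from collections import defaultdict


def patch_bootstrap(patch_ids, rng):
    """Draw one bootstrap sample at the patch level (illustrative only).

    patch_ids: one hypothetical patch label per pixel/object.
    Returns (in_bag, out_of_bag) as lists of sample indices.
    """
    # Group sample indices by the training patch they belong to.
    members = defaultdict(list)
    for index, patch in enumerate(patch_ids):
        members[patch].append(index)

    patches = sorted(members)
    # Sample patches with replacement, as bagging does with single samples.
    drawn = [rng.choice(patches) for _ in patches]
    drawn_set = set(drawn)

    # A patch drawn k times contributes all of its samples k times.
    in_bag = [i for patch in drawn for i in members[patch]]
    # Patches that were never drawn go to the out-of-bag set as a whole.
    out_of_bag = [
        i for patch in patches if patch not in drawn_set for i in members[patch]
    ]
    return in_bag, out_of_bag


# Toy data: ten pixels belonging to four assumed training patches.
patch_ids = ["a", "a", "a", "b", "b", "c", "c", "c", "d", "d"]
in_bag, out_of_bag = patch_bootstrap(patch_ids, random.Random(0))
in_patches = {patch_ids[i] for i in in_bag}
oob_patches = {patch_ids[i] for i in out_of_bag}
```

Under this scheme no patch contributes pixels to both sets, so out-of-bag samples are never evaluated by a tree trained on their own neighbours from the same patch.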

Keywords

Classification

random forest

object-based image analysis

bagging

statistical independence

Collections

Journal articles, Dept. of Geography, History and Humanities [114]

Except where otherwise noted, this item's license is described as Attribution-NonCommercial-NoDerivatives 4.0 International