Hi Matthew,

The error rate reported by randomForest is the prediction error based on the out-of-bag (OOB) data: each tree is grown on a bootstrap sample (which contains roughly 63% of the distinct original cases), and each case is predicted only by the trees that did not use it. That is different from re-predicting the original data with predict(), where every tree votes on every case, including the cases it was fit to, so the OOB error rate is usually higher than the resubstitution error, which is what you observed.
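A minimal, reproducible sketch using the built-in iris data (assuming the
randomForest package is installed) shows the difference:

library(randomForest)

set.seed(42)
rf <- randomForest(Species ~ ., data = iris)

## OOB predictions: each case is predicted only by the trees that did
## NOT see it in their bootstrap sample.  This is what rf$predicted
## holds and what err.rate is based on.
oob_pred <- rf$predicted
mean(oob_pred != iris$Species)     # matches rf$err.rate[rf$ntree, "OOB"]

## Re-predicting the training data uses ALL trees, including the ones
## that were fit on the very cases being predicted, so the error is
## optimistically low.
train_pred <- predict(rf, newdata = iris)
mean(train_pred != iris$Species)   # (near) zero -- not an honest estimate

Note that predict(rf) with no newdata argument also returns the OOB
predictions, so that is the comparison to use if you want an honest error
estimate from the training data.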
Weidong

On Sat, Nov 26, 2011 at 3:02 PM, Matthew Francis <mattjamesfran...@gmail.com> wrote:
> I've been using the R package randomForest but there is an aspect I
> cannot work out the meaning of. After calling the randomForest
> function, the returned object contains an element called prediction,
> which is the prediction obtained using all the trees (at least that's
> my understanding). I've checked that this prediction set has the same
> error rate as reported by err.rate.
>
> However, if I send the training data back into the
> predict.randomForest function I find I get a different result from the
> stored set of predictions. This is true for both classification and
> regression. I find the predictions obtained this way also have a much
> lower error rate and perform very well (suspiciously well...) on
> measures such as AUC.
>
> My understanding is that the two predictions above should be the same.
> Since they are not, I must not be understanding something properly.
> Any ideas what's going on?