See inline.

On Sat, Jan 12, 2019 at 9:56 AM Witold E Wolski <wewol...@gmail.com> wrote:

> ypred_oob <- predict(diachp.rf)

AFAIK these are, indeed, the out-of-bag predictions.

> dataX <- data %>% select(-quality) # remove response.
> ypred <- predict( diachp.rf, dataX )

These are not out-of-bag predictions. dataX is interpreted as new data
(argument newdata), and it is assumed to contain entirely new
observations. Each observation in dataX is fed through all of the
trees and the predictions are then pooled. There is no out-of-bag here
- all of the new data observations are assumed to be independent of
the training set.
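
A minimal sketch of the two calls, using the built-in iris data in place
of your data (that substitution, the seed, and ntree = 200 are my
assumptions, not taken from your code):

```r
library(randomForest)

set.seed(1)  # arbitrary seed, only for reproducibility
rf <- randomForest(Species ~ ., data = iris, ntree = 200)

# No newdata: each row is predicted only by the trees whose bootstrap
# sample did not contain that row (out-of-bag).
pred_oob <- predict(rf)

# With newdata: every row is passed through all trees, even when the
# rows happen to be the training rows themselves.
pred_all <- predict(rf, newdata = iris[, names(iris) != "Species"])

# The out-of-bag predictions are the ones stored in the fitted object.
same_as_stored <- identical(as.character(pred_oob),
                            as.character(rf$predicted))
```

If same_as_stored is TRUE, predict() without newdata is simply returning
the predictions stored in rf$predicted.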

>
> What I find even more disturbing is that 100% accuracy for ypred.
> Would you agree that this is rather unexpected?

It is expected (and not disturbing) if your training set had enough
variables (or signal) to create trees that fit the training data
perfectly.
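
To see the gap for yourself, you can compare the two accuracies side by
side. Again a sketch on iris rather than your data (an assumption on my
part), so the exact numbers will differ from yours:

```r
library(randomForest)

set.seed(1)  # arbitrary seed, only for reproducibility
rf <- randomForest(Species ~ ., data = iris, ntree = 200)

predictors <- iris[, names(iris) != "Species"]

# Accuracy when the training rows are fed back in as "new" data:
# every tree votes, including the trees that were grown on that row.
acc_train <- mean(predict(rf, newdata = predictors) == iris$Species)

# Out-of-bag accuracy: each row is judged only by trees that never saw it.
acc_oob <- mean(predict(rf) == iris$Species)

print(c(train = acc_train, oob = acc_oob))
```

The out-of-bag figure is the one to read as an estimate of performance
on unseen data; the training figure mostly tells you how closely the
trees fit the rows they were grown on.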

HTH,

Peter

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.