See inline.

On Sat, Jan 12, 2019 at 9:56 AM Witold E Wolski <wewol...@gmail.com> wrote:
> ypred_oob <- predict(diachp.rf)

AFAIK these are, indeed, the out-of-bag predictions.

> dataX <- data %>% select(-quality) # remove response.
> ypred <- predict( diachp.rf, dataX )

These are not out-of-bag predictions. dataX is interpreted as new data (argument newdata) and is assumed to contain entirely new observations. Each observation in dataX is fed through all of the trees and the predictions are then pooled. There is no out-of-bag step here: all observations in the new data are assumed to be independent of the training set.

> What I find even more disturbing is that 100% accuracy for ypred.
> Would you agree that this is rather unexpected?

It is expected (and not disturbing) if your training set had enough variables (or signal) to create trees that fit the training data perfectly. Here dataX is the training data itself, so most of the trees voting on each row were grown with that row in their bootstrap sample.

HTH,

Peter
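To make the distinction concrete, here is a minimal sketch. It assumes the randomForest package is installed and uses the built-in iris data as a hypothetical stand-in for the original data set, with Species playing the role of quality. The object names rf, acc_oob and acc_resub are illustrative only.

```r
library(randomForest)  # assumption: the randomForest package is installed

set.seed(1)

# Hypothetical stand-in for the original data: iris, with Species as the response
rf <- randomForest(Species ~ ., data = iris, ntree = 500)

# No newdata argument: predict() returns the stored out-of-bag predictions,
# i.e. each row is predicted only by trees that did not see it during training
ypred_oob <- predict(rf)

# With newdata: each row is run through ALL trees, including the trees
# that were grown on a bootstrap sample containing that very row
dataX <- iris[, names(iris) != "Species"]  # remove response
ypred <- predict(rf, dataX)

# Out-of-bag accuracy (an honest estimate) vs. accuracy on the training data
acc_oob   <- mean(ypred_oob == iris$Species)
acc_resub <- mean(ypred == iris$Species)
print(c(oob = acc_oob, resubstitution = acc_resub))
```

The resubstitution accuracy should come out at or near 100%, while the out-of-bag accuracy is the figure to report as an estimate of performance on unseen data.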