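I think the main thing you are doing wrong is not setting the random seed with set.seed(), so your results are not reproducible. Depending on the random sample used to select the training and test sets, you get slightly different accuracy for each; sometimes one is better and sometimes the other. One smaller point: rf$confusion also contains a class.error column, so sum(rf$confusion) is slightly larger than the number of observations and your OOB accuracy comes out a little too low. Summing only the first two columns avoids that.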
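Here is a minimal sketch of how you could check this yourself, assuming the same Carseats setup as in your code. The helper name compare_acc and the seeds 1 to 10 are just made up for illustration. It fixes the seed before each split, and it leaves out the class.error column when computing the OOB accuracy.

```r
library(randomForest)
library(ISLR)

# Same response variable as in the original post
Carseats$High <- as.factor(ifelse(Carseats$Sales <= 8, "No", "Yes"))

# compare_acc() is a hypothetical helper: for a given seed, draw the
# train/test split, fit the forest, and return OOB and test-set accuracy.
compare_acc <- function(seed) {
  set.seed(seed)  # makes both the split and the forest reproducible
  train <- sample(seq_len(nrow(Carseats)), 200)

  rf <- randomForest(High ~ . - Sales,
                     data = Carseats,
                     subset = train,
                     mtry = 6)

  # rf$confusion has a third column, class.error; keep only the counts
  conf_oob <- rf$confusion[, 1:2]
  acc_oob <- sum(diag(conf_oob)) / sum(conf_oob)

  # Accuracy on the held-out rows
  yhat <- predict(rf, newdata = Carseats[-train, ])
  acc_test <- mean(yhat == Carseats$High[-train])

  c(seed = seed, oob = acc_oob, test = acc_test)
}

# One row per seed, so the two accuracies can be compared side by side
res <- t(sapply(1:10, compare_acc))
print(round(res, 3))
```

Calling compare_acc() twice with the same seed should give identical numbers, and across seeds I would expect the oob and test columns to scatter around each other rather than one being consistently higher.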
HTH,
Peter

On Sat, Apr 10, 2021 at 8:49 PM <thebudge...@gmail.com> wrote:
>
> Hi ML,
>
> For random forest, I thought that the out-of-bag performance should be
> the same (or at least very similar) to the performance calculated on a
> separated test set.
>
> But this does not seem to be the case.
>
> In the following code, the accuracy computed on out-of-bag sample is
> 77.81%, while the one computed on a separated test set is 81%.
>
> Can you please check what I am doing wrong?
>
> Thanks in advance and best regards.
>
> library(randomForest)
> library(ISLR)
>
> Carseats$High <- ifelse(Carseats$Sales<=8,"No","Yes")
> Carseats$High <- as.factor(Carseats$High)
>
> train = sample(1:nrow(Carseats), 200)
>
> rf = randomForest(High~.-Sales,
>                   data=Carseats,
>                   subset=train,
>                   mtry=6,
>                   importance=T)
>
> acc <- (rf$confusion[1,1] + rf$confusion[2,2]) / sum(rf$confusion)
> print(paste0("Accuracy OOB: ", round(acc*100,2), "%"))
>
> yhat <- predict(rf, newdata=Carseats[-train,])
> y <- Carseats[-train,]$High
> conftest <- table(y, yhat)
> acctest <- (conftest[1,1] + conftest[2,2]) / sum(conftest)
> print(paste0("Accuracy test set: ", round(acctest*100,2), "%"))
>
> ______________________________________________
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.