Hi, I am using the randomForest package for a prediction task on GWAS data. I first split the data into training and test sets (70% vs 30%), then used the training set to grow the trees (ntree = 100000). The OOB error on the training set looks good (<10%), but performance on the test set is poor, with an AUC of only about 50%. Although some people have said that no cross-validation is necessary for RF, I still felt unsafe relying on that and thought a held-out test set was important. I am really frustrated with the results.
Does anyone have suggestions? Thanks.

PS: example code I used

  RF <- randomForest(PHENOTYPE ~ ., data = Train, importance = TRUE, ntree = 20000, do.trace = 5000)
  rownames(Test) <- Test$IID
  Pred <- predict(RF, Test, type = "prob")

--
View this message in context: http://r.789695.n4.nabble.com/Random-Forest-Cross-Validation-tp3314777p3314777.html
Sent from the R help mailing list archive at Nabble.com.
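For reference, here is a minimal sketch of the split-train-evaluate workflow described above, assuming a data frame `dat` (a placeholder name) with a binary factor column PHENOTYPE, and using the pROC package for the AUC (an assumption; any AUC function would do):

```r
library(randomForest)
library(pROC)

set.seed(42)  # make the 70/30 split reproducible

## 70/30 split of the rows of `dat` (hypothetical data frame)
idx   <- sample(nrow(dat), 0.7 * nrow(dat))
Train <- dat[idx, ]
Test  <- dat[-idx, ]

## grow the forest on the training set only
RF <- randomForest(PHENOTYPE ~ ., data = Train, ntree = 500)

## predicted probability of the second class for each test sample
Prob <- predict(RF, Test, type = "prob")[, 2]

## test-set AUC via pROC
auc(roc(Test$PHENOTYPE, Prob))
```

A large gap between the OOB error and the test-set AUC computed this way is what the post describes; comparing the two on the same metric makes the discrepancy easier to diagnose.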