Hi,
I am using the randomForest package for prediction on GWAS data. I
first split the data into training and test sets (70% vs 30%), then
used the training set to grow the trees (ntree=100000). The OOB error
on the training set looks good (<10%), but performance on the test set
is poor, with an AUC of only about 50%.
Although some people say cross-validation is unnecessary for RF because
the OOB error already gives an unbiased estimate, I still felt unsafe
and thought a held-out test set was important. I am really frustrated
with these results.


Does anyone have any suggestions?

Thanks.

PS: example code I used

# Grow the forest on the training set
RF <- randomForest(PHENOTYPE ~ ., data = Train, importance = TRUE,
                   ntree = 20000, do.trace = 5000)

# Predict class probabilities on the held-out test set
rownames(Test) <- Test$IID
Pred <- predict(RF, Test, type = "prob")
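In case it helps frame the question, here is a minimal sketch of k-fold cross-validation around randomForest. It uses simulated pure-noise data as a stand-in for the GWAS set (the column names, fold count, and ntree value are assumptions, not the original setup), and computes AUC by the Mann-Whitney rank formula to avoid extra packages:

```r
## Sketch: 5-fold CV for randomForest on simulated stand-in data.
## PHENOTYPE and the predictor columns here are made-up; with pure
## noise predictors, CV AUC should hover near 0.5.
library(randomForest)

set.seed(42)
n <- 200; p <- 20
dat <- data.frame(matrix(rnorm(n * p), n, p))
dat$PHENOTYPE <- factor(rbinom(n, 1, 0.5))

## AUC via the Mann-Whitney rank statistic (base R only)
auc <- function(prob, label) {
  r  <- rank(prob)
  n1 <- sum(label == 1); n0 <- sum(label == 0)
  (sum(r[label == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

k <- 5
folds <- sample(rep(1:k, length.out = nrow(dat)))
cv_auc <- sapply(1:k, function(i) {
  train <- dat[folds != i, ]
  test  <- dat[folds == i, ]
  rf   <- randomForest(PHENOTYPE ~ ., data = train, ntree = 500)
  prob <- predict(rf, test, type = "prob")[, "1"]
  auc(prob, as.numeric(as.character(test$PHENOTYPE)))
})
mean(cv_auc)
```

If the per-fold AUCs all sit near 0.5 on the real data too, that would point to the predictors carrying little signal rather than to a problem with the split.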


-- 
View this message in context: 
http://r.789695.n4.nabble.com/Random-Forest-Cross-Validation-tp3314777p3314777.html
Sent from the R help mailing list archive at Nabble.com.

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
