Guys, I used random forests on a couple of data sets to predict a binary response. In every case, the AUC on the training set comes out as exactly 1. Is this always the case with random forests? Could someone please clarify?
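For reference, here is a minimal sketch of one way to check this. It assumes the randomForest package is installed, and auc_rank() is a hypothetical helper written for illustration. It computes AUC as a rank statistic (the Mann-Whitney U scaled to [0, 1]), so an AUC of 1 simply means every positive case is scored above every negative case. It then compares predictions on the training rows passed as newdata with the out-of-bag (OOB) predictions that predict() returns when newdata is omitted.

```r
# Assumes the randomForest package is installed.
# auc_rank() is a hypothetical helper, not part of any package.
library(randomForest)

# AUC as a rank statistic: the probability that a randomly chosen positive
# case scores higher than a randomly chosen negative one (ties count 1/2,
# via the average ranks that rank() uses by default).
auc_rank <- function(score, label, positive) {
  pos <- label == positive
  n_pos <- sum(pos)
  n_neg <- sum(!pos)
  r <- rank(score)
  (sum(r[pos]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

data(iris)
# Binary problem: versicolor vs virginica, unused "setosa" level dropped
iris2 <- droplevels(iris[iris$Species != "setosa", ])

set.seed(1)
rf <- randomForest(Species ~ ., data = iris2, ntree = 50)

# Scoring the training rows via newdata: each row was in the bootstrap
# sample of most trees, so this is an in-sample (resubstitution) estimate
p_train <- predict(rf, newdata = iris2, type = "prob")[, "virginica"]

# Omitting newdata returns the out-of-bag votes: each row is scored only
# by the trees that did not see it during training
p_oob <- predict(rf, type = "prob")[, "virginica"]

auc_train <- auc_rank(p_train, iris2$Species, "virginica")
auc_oob <- auc_rank(p_oob, iris2$Species, "virginica")
print(c(train = auc_train, oob = auc_oob))
```

The in-sample value is expected to be at or very near 1 because fully grown classification trees tend to fit their own bootstrap samples almost perfectly; the OOB value is the one to compare against a held-out estimate.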
Below is a simple example, first using logistic regression and then using random forests, to illustrate the problem. The AUC of the random forest comes out as 1.

data(iris)
# Keep only versicolor and virginica so the response is binary
iris <- iris[(iris$Species != "setosa"), ]
iris$Species <- factor(iris$Species)

# Logistic regression, evaluated on the training data
fit <- glm(Species ~ ., iris, family = binomial)
train.predict <- predict(fit, newdata = iris, type = "response")
library(ROCR)
plot(performance(prediction(train.predict, iris$Species), "tpr", "fpr"), col = "red")
auc1 <- performance(prediction(train.predict, iris$Species), "auc")@y.values[[1]]
legend("bottomright", legend = c(paste("Logistic Regression (AUC=", formatC(auc1, digits = 4, format = "f"), ")", sep = "")), col = c("red"), lty = 1)

# Random forest, also evaluated on the training data
library(randomForest)
fit <- randomForest(Species ~ ., data = iris, ntree = 50)
train.predict <- predict(fit, iris, type = "prob")[, 2]
plot(performance(prediction(train.predict, iris$Species), "tpr", "fpr"), col = "red")
auc1 <- performance(prediction(train.predict, iris$Species), "auc")@y.values[[1]]
legend("bottomright", legend = c(paste("Random Forests (AUC=", formatC(auc1, digits = 4, format = "f"), ")", sep = "")), col = c("red"), lty = 1)

Thank you.

Regards,
Ravishankar R

--
View this message in context: http://r.789695.n4.nabble.com/Random-Forest-AUC-tp3006649p3006649.html
Sent from the R help mailing list archive at Nabble.com.