Guys,

I used random forests on a couple of data sets to predict a binary
response. In every case, the AUC on the training set comes out as 1.
Is this always the case with random forests? Can someone please clarify
this?

I have given a simple example below, first using logistic regression and
then using random forests, to illustrate the problem. The AUC of the
random forest comes out as 1.

# Two-class subset of iris: versicolor vs. virginica
data(iris)
iris <- iris[iris$Species != "setosa", ]
iris$Species <- factor(iris$Species)  # drop the unused "setosa" level

# Logistic regression, scored on the same data it was fitted to
fit <- glm(Species ~ ., data = iris, family = binomial)
train.predict <- predict(fit, newdata = iris, type = "response")

library(ROCR)
plot(performance(prediction(train.predict, iris$Species), "tpr", "fpr"),
     col = "red")
auc1 <- performance(prediction(train.predict, iris$Species),
                    "auc")@y.values[[1]]
legend("bottomright",
       legend = paste("Logistic Regression (AUC=",
                      formatC(auc1, digits = 4, format = "f"), ")", sep = ""),
       col = "red", lty = 1)


# Random forest, again scored on the same data it was fitted to
library(randomForest)
fit <- randomForest(Species ~ ., data = iris, ntree = 50)
train.predict <- predict(fit, iris, type = "prob")[, 2]
plot(performance(prediction(train.predict, iris$Species), "tpr", "fpr"),
     col = "red")
auc1 <- performance(prediction(train.predict, iris$Species),
                    "auc")@y.values[[1]]
legend("bottomright",
       legend = paste("Random Forests (AUC=",
                      formatC(auc1, digits = 4, format = "f"), ")", sep = ""),
       col = "red", lty = 1)
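
For comparison, here is a minimal sketch that scores the same rows with
out-of-bag (OOB) votes instead of calling predict() on the training data.
As far as I understand the randomForest documentation, predict() with no
newdata argument returns the OOB predictions, so each row is scored only by
trees that did not see it. The names iris2, fit.rf, oob.predict and auc.oob
and the seed are my own arbitrary choices, not part of the example above.

```r
# Sketch only: OOB-based AUC for the random forest on the two-class iris subset.
library(randomForest)
library(ROCR)

data(iris)
iris2 <- iris[iris$Species != "setosa", ]
iris2$Species <- factor(iris2$Species)  # drop the unused "setosa" level

set.seed(1)  # arbitrary seed, only so the run is repeatable
fit.rf <- randomForest(Species ~ ., data = iris2, ntree = 50)

# No newdata: predict() falls back to the out-of-bag class vote fractions
oob.predict <- predict(fit.rf, type = "prob")[, 2]

auc.oob <- performance(prediction(oob.predict, iris2$Species),
                       "auc")@y.values[[1]]
print(auc.oob)
```

If this number differs from the training-set AUC above, the gap would show
how much of the AUC of 1 comes from scoring rows the trees were grown on.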

Thank you.

Regards,
Ravishankar R

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.