On Nov 20, 2013, at 12:44 PM, Jack Luo wrote:

> Hi,
>
> I am using the AUCRF package for my data, and at first I was impressed by the
> high OOB-AUC. After a while, though, I began to suspect that it might be due to
> some sort of bias, which motivated me to test it on random data (generated
> with rnorm).
>
> The design is very simple: 100 observations, 50 in class 0 and 50 in
> class 1. I varied the number of variables (the idea is that if there is
> bias, the performance should increase as more variables are added).
>
> Presumably there is no signal in the data, so the true, unbiased AUC should
> not be far from 0.5.
>
> The results worry me: the OOB-AUC is well above 0.5, and it gets even
> higher as more variables are added.
>
> Am I misunderstanding anything here?
>
> Below is the R code I used for the test:
>
> library(AUCRF)
>
> Nvar = 200 # number of variables
> Label = as.factor(c(rep(0,50),rep(1,50))) # class label
> AUC_r = NULL # one column of OOB-AUC values per AUCRF fit
>
> for (k in 1:10) { # control the randomness of generating random data
>   set.seed(k)
>   Arandom = matrix(rnorm(Nvar*length(Label)), ncol = Nvar)
>   DF = data.frame(Arandom, Label = Label)
>   for (j in 1:20) { # control the randomness of OOB
>     if (j %% 10 == 0) {cat(k, j, "\n")}
>     set.seed(j)
>     fit <- AUCRF(Label~., data=DF)
>     AUC_r = cbind(AUC_r, fit$AUCcurve$AUC)
>   }
> }
>
> # mean OOB-AUC over all 200 fits, for each size of the selected variable set
> plot(fit$AUCcurve$k, apply(AUC_r, 1, mean), type = "b", pch = 3,
>      xlab = "# of Vars", lwd = 2, col = 2, ylab = "OOB-AUC", ylim = c(0.4, 1))
>
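One plausible source of the kind of bias Jack suspects, choosing variables on the same data that are then used to score them, can be illustrated in base R without AUCRF at all. The sketch below is a hypothetical illustration, not AUCRF's algorithm, and whether AUCRF's OOB-AUC is inflated by exactly this mechanism is an assumption it does not test. It draws pure-noise predictors for 50 + 50 observations, scores each predictor by its in-sample AUC (via the Wilcoxon rank-sum statistic in `auc_mw`, a helper name introduced here), and keeps the best one.

```r
# Hypothetical illustration (not AUCRF code): selection bias on pure noise.
# auc_mw: AUC = P(random class-1 score > random class-0 score), ties count 1/2,
# computed from the Wilcoxon/Mann-Whitney rank-sum statistic.
auc_mw <- function(x, y) {
  r  <- rank(x)                    # average ranks for ties
  n1 <- sum(y == 1)
  n0 <- sum(y == 0)
  (sum(r[y == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

# For each replicate: simulate Nvar noise predictors, score every one of them
# on the same data, and report the AUC of the best-looking predictor.
best_noise_auc <- function(Nvar, n_per_class = 50, reps = 20) {
  y <- c(rep(0, n_per_class), rep(1, n_per_class))
  sapply(seq_len(reps), function(i) {
    X <- matrix(rnorm(Nvar * length(y)), ncol = Nvar)
    aucs <- apply(X, 2, auc_mw, y = y)
    max(aucs)                      # selecting the winner is what biases it
  })
}

set.seed(1)
sizes <- c(1, 10, 50, 200)
res <- sapply(sizes, function(p) mean(best_noise_auc(p)))
names(res) <- sizes
print(round(res, 3))
```

With a single variable the mean AUC should sit near 0.5; as the number of candidate variables grows, the AUC of the selected variable climbs above 0.5 even though no variable carries signal. An honest estimate would require repeating the selection step inside each resampling fold rather than doing it once on the full data.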
Shouldn't this question go to the package maintainer before being sent to R-help?

>
> Thanks,
>
> -Jack

And:

> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html

--
David Winsemius
Alameda, CA, USA

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.