Taby,
First, it is better to reply to the whole list (which I have included on
this reply); there is a better chance of someone helping you. Just
because I could help with one aspect does not mean I necessarily can (or
have the time to) help with more.
Further comments are inline below.
On 3/16/2011 10:45 PM, taby gathoni wrote:
Hi Brian,
Thanks for this comment; I will act on it. Thanks also for the
comment; Andrija advised the same thing and it worked like magic.
My next course of action was to get the confidence intervals for the
AUC values.
For the confidence intervals, I did them manually. For the 99% CI, I cut
out the first 5 and last 5 after ranking the AUCs, while for the 95% CI I
cut out the first 25 and last 25.
In general, this is right. The middle 95% excludes 2.5% on each end, so
for 1000 samples that means excluding the 25 most extreme values from
each tail.
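
A minimal sketch of that percentile calculation, for reference. The vector auc_boot is a hypothetical stand-in for the 1000 bootstrap AUC values (it is simulated here, not the real data), and trim() is an illustrative helper, not something from the original script.

```r
# Hypothetical stand-in for the 1000 bootstrap AUC values (simulated, not real data)
set.seed(1)
auc_boot <- rbeta(1000, 20, 10)

auc_sorted <- sort(auc_boot)
n <- length(auc_sorted)

# Number of values to drop from each tail for a given confidence level
trim <- function(level, n) round(n * (1 - level) / 2)

k95 <- trim(0.95, n)  # 25 values per tail when n = 1000
k99 <- trim(0.99, n)  # 5 values per tail when n = 1000

# Bounds are the first and last values that survive the trimming
ci95 <- c(lower = auc_sorted[k95 + 1], upper = auc_sorted[n - k95])
ci99 <- c(lower = auc_sorted[k99 + 1], upper = auc_sorted[n - k99])
ci95
ci99

# quantile() gives very similar bounds; by default it interpolates
# between neighbouring order statistics rather than picking one of them
quantile(auc_boot, c(0.025, 0.975))
quantile(auc_boot, c(0.005, 0.995))
```

The 99% interval always contains the 95% interval, since it trims fewer values from each tail.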
and this is my output
             Upper bound   Lower bound
at 99% CI    0.8175        0.50125
at 95% CI    0.7775        0.50375
From my understanding, because these are small samples of 20 GOOD and
20 BAD, the variation in the upper and lower bounds should be minimal
across the 1000 samples.
I don't know why you would necessarily expect the variance to be
minimal. It is what it is. Also, I don't know why you took 20 of each
rather than just a random sub-sample.
If you get time, would you be in a position to help me find out
why I have such huge variations? Thank you for taking the time to respond.
Maybe pull out 10 of your bootstrap samples and look at the ROC curves
themselves and their associated AUC. That might give you a sense as to
the variability that is possible (which is reflected in the confidence
interval).
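
One way to do that check, as a rough sketch in base R. The scores are simulated (a hypothetical stand-in for main_samp$SCORE), and auc_wilcoxon() and draw() are illustrative helpers, not functions from any package. The AUC is computed from the Wilcoxon rank-sum identity rather than with colAUC, so the numbers will not match the original run.

```r
# AUC via the Wilcoxon rank-sum identity: the probability that a randomly
# chosen GOOD score exceeds a randomly chosen BAD score (ties count as 1/2)
auc_wilcoxon <- function(score, label, positive = "GOOD") {
  r <- rank(score)                      # average ranks handle ties
  n_pos <- sum(label == positive)
  n_neg <- sum(label != positive)
  (sum(r[label == positive]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

set.seed(42)
# Hypothetical scores; the real main_samp$SCORE is not available here
dat <- data.frame(
  score = c(rnorm(42, 700, 80), rnorm(165, 760, 80)),
  x     = rep(c("BAD", "GOOD"), c(42, 165))
)

# Draw n_each rows with replacement from each class
draw <- function(d, n_each = 20) {
  idx <- unlist(lapply(split(seq_len(nrow(d)), d$x), function(rows) {
    rows[sample.int(length(rows), n_each, replace = TRUE)]
  }))
  d[idx, ]
}

# AUC for ten separate stratified bootstrap samples
ten_aucs <- replicate(10, {
  s <- draw(dat)
  auc_wilcoxon(s$score, s$x)
})
round(ten_aucs, 3)
range(ten_aucs)
```

With only 20 observations per class, each AUC is built from just 400 GOOD/BAD pairs, so a noticeable spread between samples is not surprising.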
As a final note, you are reinventing the wheel. There are several
packages that deal with ROC curves. Two I like in particular are ROCR
and pROC. The latter even has built in routines for computing
confidence intervals for the AUC using bootstrap replication.
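
For illustration, a sketch of what that could look like with pROC. The data are simulated (hypothetical, not main_samp), and the block assumes pROC is installed; it is skipped otherwise. The argument choices (levels, direction, boot.n) are assumptions made for this example.

```r
# Hypothetical data standing in for the real scores and labels
set.seed(123)
label <- factor(rep(c("BAD", "GOOD"), c(42, 165)))
score <- c(rnorm(42, 700, 80), rnorm(165, 760, 80))

if (requireNamespace("pROC", quietly = TRUE)) {
  # levels = c(control, case); direction "<" means controls score lower than cases
  roc_obj <- pROC::roc(response = label, predictor = score,
                       levels = c("BAD", "GOOD"), direction = "<")
  print(pROC::auc(roc_obj))

  # Bootstrap confidence interval for the AUC
  print(pROC::ci.auc(roc_obj, method = "bootstrap", boot.n = 1000,
                     conf.level = 0.95))
}
```

This resamples the full data set rather than 20 of each class, so the interval would be expected to be narrower than one built from 40-observation samples.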
Kind regards,
Taby
--- On Wed, 3/16/11, Brian Diggs<[email protected]> wrote:
From: Brian Diggs<[email protected]>
Subject: Re: calculating AUCs for each of the 1000 boot strap samples
To: [email protected]
Cc: "R help"<[email protected]>
Date: Wednesday, March 16, 2011, 10:42 PM
On 3/16/2011 8:04 AM, taby gathoni wrote:
data<-data.frame(id=1:(165+42),main_samp$SCORE,
x=rep(c("BAD","GOOD"),c(42,165)))
f<-function(x) {
+ str.sample<-list()
+ for (i in 1:length(levels(x$x)))
+ {
+ str.sample[[i]]<-x[x$x==levels(x$x)[i]
,][sample(tapply(x$x,x$x,length)[i],20,rep=T),]
+ }
+ strat.sample<-do.call("rbind",str.sample)
+ return(strat.sample$main_samp.SCORE)
+ }
f(data)
 [1] 706 633 443 843 756 743 730 843 706 730 606 743 768 768 743 763 608 730 743 743 530 813 813 831 793 900 793 693 900 738 706 831
[33] 818 758 718 831 768 638 770 738
repl<-list()
auc<-list()
for(i in 1:1000)
+ {
+ repl[[i]]<-f(data)
+ auc[[i]]<-colAUC(repl[[i]],rep(c("BAD","GOOD")),plotROC=FALSE,alg="ROC")
+ }
Error in colAUC(repl[[i]], rep(c("BAD", "GOOD")), plotROC = FALSE, alg = "ROC") :
  colAUC: length(y) and nrow(X) must be the same
Thanks a lot,
Taby
I think (though I can't check because the example is not reproducible without
main_samp$SCORE), that the problem is that the second argument to colAUC should
be
rep(c("BAD", "GOOD"), c(20,20))
The error is that repl[[i]] is length 40 while rep(c("BAD", "GOOD")) is length
2.
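
The length mismatch can be seen directly. In this sketch, scores is a simulated stand-in for one result of f(data) (an assumption, since main_samp is not available), and the colAUC call only runs if caTools is installed.

```r
y_wrong <- rep(c("BAD", "GOOD"))             # no times argument: length 2
y_right <- rep(c("BAD", "GOOD"), c(20, 20))  # 20 BAD followed by 20 GOOD: length 40

length(y_wrong)
length(y_right)
table(y_right)

if (requireNamespace("caTools", quietly = TRUE)) {
  set.seed(7)
  # Hypothetical stand-in for f(data): 20 BAD scores, then 20 GOOD scores
  scores <- c(rnorm(20, 700, 80), rnorm(20, 760, 80))
  print(caTools::colAUC(scores, y_right, plotROC = FALSE, alg = "ROC"))
}
```

The label order matters: f() binds the BAD rows before the GOOD rows (factor levels sort alphabetically), which is why the labels are 20 BAD followed by 20 GOOD.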
P.S. When giving an example, it is better to not include the prompts and
continuation prompts. Copy it from the script rather than the output. Relevant
output can then be included as script comments (prefixed with #). That makes
cutting-and-pasting to test easier.
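
As an example of that layout, here is a sketch of the sampling step written as a script, with output shown as a comment. The scores are randomly generated (hypothetical) so that it runs without main_samp, and the function is a simplified rewrite rather than the original code.

```r
# Hypothetical scores so the example is reproducible without main_samp
set.seed(2011)
dat <- data.frame(id    = 1:(42 + 165),
                  score = round(runif(42 + 165, 400, 900)),
                  x     = factor(rep(c("BAD", "GOOD"), c(42, 165))))

# Draw n_each scores with replacement from each level of x (BAD first, then GOOD)
f <- function(d, n_each = 20) {
  parts <- lapply(levels(d$x), function(lev) {
    rows <- d[d$x == lev, ]
    rows[sample(nrow(rows), n_each, replace = TRUE), ]
  })
  do.call("rbind", parts)$score
}

s <- f(dat)
length(s)
# [1] 40
```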
--
Brian S. Diggs, PhD
Senior Research Associate, Department of Surgery
Oregon Health & Science University