Hello Eik, Thanks very much for your response and for directing me to a useful explanation.
To make sure I am grasping your answer correctly, the two problems I was experiencing are related to the fact that the floating point numbers I was calculating and using in subsequent indices were not entirely equivalent, even if they were calculated using the same function (quantile). I confirmed this as follows: j <- 0.055 # bins value for 5.5% threshold bin.ct <- as.numeric(as.character(quantile(x, j, na.rm = TRUE))) bin.ct #430.29 bin.ct2 <- quantile(x, j, na.rm = TRUE) bin.ct2 # 5.5% #430.29 bin.ct - bin.ct2 # 5.5% #5.684342e-14 length(x) - length(x[x <= bin.ct]) length(x) - length(x[x <= bin.ct]) # 2875 length(x) - length(x[x <= bin.ct2]) # 3029 Testing the unname() option does not fix the result, I should however note: bin.ct <- as.numeric(as.character(quantile(x, j, na.rm = TRUE))) bin.ct2 <- unname(quantile(x, j, na.rm = TRUE)) bin.ct - bin.ct2 # 5.684342e-14 But rounding, as an alternative approach, works: bin.ct2 <- round(quantile(x, j, na.rm = TRUE), 2) bin.ct - bin.ct2 #5.5% # 0 length(x) - length(x[x <= bin.ct]) # 2875 length(x) - length(x[x <= bin.ct2]) # 2875 As to my code, it is part of a custom ROC function I created a while back and started using again recently (the data are rainfall values). I can't remember why I did this rather than using one of the existing ROC functions, but I thought (probably incorrectly) that I had some compelling reason. In any case, it is quite unwieldy, so I will explore those other packages, or try revise this to be more efficient (e.g. maybe this is a better approach, although the answers are fairly different?). x2 <- x[order(x)] y2 <- y[order(y)] bins <- round(seq(min(x2), max(x2), by = diff(range(x2)) / 200), 2) threshold <- seq(0, 100, by = 0.5) tp <- rep(0, length(bins)) fp <- rep(0, length(bins)) for(i in 1:length(threshold)) { tp[i] <- length(x2) - length(x2[x2 <= bins[i]]) fp[i] <- length(y2) - length(y2[y2 <= bins[i]]) } ctch <- cbind(threshold, bins, tp, fp) ctch[1:20, ] Thanks again for your help. Cheers, Lyndon On Tue, Jul 12, 2011 at 5:09 AM, Eik Vettorazzi <e.vettora...@uke.uni-hamburg.de> wrote: > Hi, > > Am 11.07.2011 22:57, schrieb Lyndon Estes: >> ctch[ctch$threshold == 3.5, ] >> # [1] threshold val tp fp tn fn tpr >> fpr tnr fnr >> #<0 rows> (or 0-length row.names) > > this is the very effective FAQ 7.31 trap. > http://cran.r-project.org/doc/FAQ/R-FAQ.html#Why-doesn_0027t-R-think-these-numbers-are-equal_003f > > Welcome to the first circle of Patrick Burns' R Inferno! > > Also, unname() is a more intuitive way of removing names. > > And I think your code is quite inefficient, because you calculate > quantiles many times, which involves repeated ordering of x, and you may > use a inefficient size of bin (either to small and therefore calculating > the same split many times or to large and then missing some splits). > I'm a bit puzzled what is x and y in your code, so any further advise is > vague but you might have a look at any package that calculates > ROC-curves such as ROCR or pROC (and many more). > > Hth > > -- > Eik Vettorazzi > > Department of Medical Biometry and Epidemiology > University Medical Center Hamburg-Eppendorf > > Martinistr. 52 > 20246 Hamburg > > T ++49/40/7410-58243 > F ++49/40/7410-57790 > -- Lyndon Estes Research Associate Woodrow Wilson School Princeton University les...@princeton.edu ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.