Well, yes to all -- that's why I pointed this out. Sorting is (depending on details) >= O(n log(n)) while the quantile algorithm uses R's partial sort option, which I think would be O(n) for a single quantile. For n of a few thousand or so, the difference shouldn't be noticeable. For large n, of course it will be ...
Ties can inherently make any way of doing it ambiguous. order() is a different kettle of fish, as it can simultaneously sort on many columns; so "surprise" that it takes longer seems inappropriate (but perhaps I shouldn't say that, as my comment presumes insight into the psychology of "surprise" that I don't have). -- Bert -----Original Message----- From: macra...@gmail.com [mailto:macra...@gmail.com] On Behalf Of Stavros Macrakis Sent: Tuesday, February 24, 2009 2:12 PM To: Peterko Cc: Bert Gunter; r-help@r-project.org Subject: Re: [R] bigest part of vector On Tue, Feb 24, 2009 at 3:01 PM, Bert Gunter <gunter.ber...@gene.com> wrote: > Nothing wrong with prior suggestions, but strictly speaking, (fully) sorting > the vector is unnecessary. > > y[y > quantile(y, 1- p/length(y))] > > will do it without the (complete) sort. (But sorting is so efficient anyway, > I don't think you could notice any difference). R uses an efficient quantile calculation, so it is significantly faster for large data sets: > big <- rnorm(1e7) > system.time(res<-big[big>=quantile(big,(length(big)-1)/length(big))]) user system elapsed 0.56 0.14 0.70 > system.time(res<-big[big>=quantile(big,(length(big)-100)/length(big))]) user system elapsed 0.75 0.10 0.84 > system.time(res<-big[big>=quantile(big,(length(big)-10000)/length(big))]) user system elapsed 0.61 0.08 0.68 > system.time(res<-big[big>=quantile(big,1/2)]) user system elapsed 1.08 0.08 1.17 > system.time(res<-sort(big)) user system elapsed 4.67 0.03 4.72 > system.time(res<-sort(big)[round(length(big)/2):length(big)]) user system elapsed 4.71 0.10 4.82 Surprisingly, perhaps, "order" is much slower than "sort": > big <- rnorm(1e7) > system.time(res<-order(big)) user system elapsed 21.07 0.05 21.14 And you do need to be careful about your handling of ties: > test <- c(1,2,3,4,4,4) > test[test>=quantile(test,5/6)] [1] 4 4 4 > test[test>=quantile(test,6/6)] [1] 4 4 4 Hope this helps. -s ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.