If x is a (long) vector and n << length(x), what is a fast way of finding the top-n values of x?
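For the general top-n question, here is a minimal sketch that uses base R's partial sorting (the `partial` argument of `sort()`, the same mechanism `median()` relies on). It is not one of the approaches benchmarked below and has not been timed here. The function name `top_n_values` is hypothetical.

```r
# Hypothetical helper: the n largest non-NA values of x, in decreasing order,
# using sort()'s 'partial' argument instead of a full sort.
top_n_values <- function(x, n) {
  x <- x[!is.na(x)]                 # drop NAs so positions refer to the clean vector
  len <- length(x)
  n <- min(n, len)
  if (n < 1L) return(x[0L])         # nothing to return
  idx <- (len - n + 1L):len         # positions of the n largest values once sorted
  # Partial sorting only guarantees that the positions in 'idx' hold their sorted
  # values; because idx covers the whole tail, that tail is in increasing order.
  rev(sort(x, partial = idx)[idx])
}

set.seed(1); x <- runif(1e6, max = 1e7); x[1] <- NA
top_n_values(x, 2)
```

Partial sorting still touches every element, but it does less work than a full sort when n is small relative to length(x).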

Some suggestions, each calculating the ratio of the two largest values:


library("rbenchmark")
set.seed(1); x <- runif(1e6, max=1e7); x[1] <- NA;
benchmark(
replications=20,
columns=c("test","elapsed"),
order="elapsed"
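# sort: full decreasing sort, then ratio of the two largest values
, sort = {a<-sort(x, decreasing=TRUE, na.last=NA)[1:2]; a[1]/a[2];}
# max: find the maximum, drop one occurrence of it, then take the max of the rest
, max = {m<-max(x, na.rm=TRUE); w<-which(x==m)[1]; m/max(x[-w], na.rm=TRUE);}
# max2: like max, but locate the maximum directly with which.max()
, max2 = {w<-which.max(x); max(x, na.rm=TRUE)/max(x[-w], na.rm=TRUE);}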
)
#   test elapsed
# 3 max2   0.772
# 2  max   1.732
# 1 sort   4.958


I want to apply this code to a few tens of thousands of vectors, so speed does matter. In C or similar I would of course compute the result in a single pass through x, rather than with the three passes used in 'max2'.
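The single-pass logic can be sketched in R as below, tracking the largest and second-largest non-NA values as the loop goes. As an interpreted loop it is expected to be slower than the vectorised 'max2'; it is shown only to make the one-pass algorithm concrete, as the kind of code one would port to C. The function name `top2_ratio_onepass` is hypothetical.

```r
# Hypothetical helper: ratio of the largest to the second-largest non-NA value,
# computed in a single pass over x (same result as 'max2', including ties).
top2_ratio_onepass <- function(x) {
  first  <- -Inf   # largest value seen so far
  second <- -Inf   # second-largest value seen so far
  n_seen <- 0L     # number of non-NA values seen
  for (v in x) {
    if (is.na(v)) next             # skip NAs, as na.rm=TRUE does
    n_seen <- n_seen + 1L
    if (v > first) {
      second <- first              # old maximum becomes the runner-up
      first  <- v
    } else if (v > second) {
      second <- v
    }
  }
  if (n_seen < 2L) return(NA_real_)  # ratio undefined with fewer than two values
  first / second
}

set.seed(1); x <- runif(1e6, max = 1e7); x[1] <- NA
top2_ratio_onepass(x)
```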


Allan.

PS: I know na.last=NA is the default for sort, but there is no harm in being explicit about how NAs should be handled.

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
