Try adding a version that uses sort with the partial argument, that should be 
faster than regular sort (for long enough test vectors) and possibly faster 
than the max solutions when finding more than just the largest 2.

Also, for your max solutions, what happens when the 2 largest values are 
identical?

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111


> -----Original Message-----
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-
> project.org] On Behalf Of Allan Engelhardt
> Sent: Thursday, June 04, 2009 2:18 AM
> To: r-help@r-project.org
> Subject: [R] Fast way of finding top-n values of a long vector
> 
> If x is a (long) vector and n << length(x), what is a fast way of
> finding the top-n values of x?
> 
> Some suggestions (calculating the ratio of the two top values):
> 
> 
> library("rbenchmark")
> set.seed(1); x <- runif(1e6, max=1e7); x[1] <- NA;
> benchmark(
>  replications=20,
>  columns=c("test","elapsed"),
>  order="elapsed"
>  , sort = {a<-sort(x, decreasing=TRUE, na.last=NA)[1:2]; a[1]/a[2];}
>  , max  = {m<-max(x, na.rm=TRUE); w<-which(x==m)[1]; m/max(x[-w],
> na.rm=TRUE);}
>  , max2 = {w<-which.max(x); max(x, na.rm=TRUE)/max(x[-w], na.rm=TRUE);}
> )
> #   test elapsed
> # 3 max2   0.772
> # 2  max   1.732
> # 1 sort   4.958
> 
> 
> I want to apply this code to a few tens of thousands of vectors so
> speed
> does matter.  In C or similar I would of course calculate the result
> with a single pass through x, and not with three passes as in 'max2'.
> 
> 
> Allan.
> 
> PS: I know na.last=NA is the default for sort, but there is no harm in
> being explicit in how you want NA's handled.
> 
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-
> guide.html
> and provide commented, minimal, self-contained, reproducible code.

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Reply via email to