On Fri, Jun 5, 2009 at 4:09 AM, Allan Engelhardt <all...@cybaea.com> wrote:
> I'm all done now. The "max2" version below is what I went with in the end > for my proposed change to caret::nearZeroVar (which used the "sort" method). > Max Kuhn will make it available on CRAN soon. It speeds up that routine by > a factor 2-5 on my test cases and uses much less memory. You can save a little in max2 like this: max2a = {w<-which.max(x); x[w]/max(x[-w], na.rm=TRUE);} If you don't need to handle NA's (or if you know a priori how many there are), you can also speed up part: parta = {sel <- length(x)+c(-1,0); a<-sort.int(x, partial=sel, na.last=NA)[2:1]; a[1]/a[2];} which becomes about as fast as max2. > library("rbenchmark") set.seed(1); x <- runif(1e7, max=1e8); benchmark( replications=20, columns=c("test","elapsed"), order="elapsed" , sort = {a<-sort(x, decreasing=TRUE, na.last=NA)[1:2]; a[1]/a[2];} , qsrt = {a<-sort(x, decreasing=TRUE, na.last=NA, method="quick")[1:2]; a[1]/a[2];} , part = {a<-sort.int(-x, partial=1:2, na.last=NA)[1:2]; a[1]/a[2];} , parta = {end<-length(x)+c(-1,0); a<-sort.int(x, partial=end, na.last=FALSE)[end]; a[1]/a[2]; } , max1 = {m<-max(x, na.rm=TRUE); w<-which(x==m)[1]; m/max(x[-w],na.rm=TRUE);} , max2 = {w<-which.max(x); max(x, na.rm=TRUE)/max(x[-w], na.rm=TRUE);} , max2a = {w<-which.max(x); x[w]/max(x[-w], na.rm=TRUE);} ) test elapsed 7 max2a 7.80 6 max2 8.94 4 parta 9.05 3 part 10.72 5 max1 20.21 2 qsrt 49.33 1 sort 94.18 For what it is worth, I also made a C version ("cmax" below) which of > course is faster yet again and scales nicely for returning the top n values > of the array: > > cmax <- function (v) {max <- vector("double",2); max <- .C("test", > as.double(v), as.integer(length(v)), max, NAOK=TRUE)[[3]]; > return(max[1]/max[2]);} > > library("rbenchmark") > set.seed(1); x <- runif(1e7, max=1e8); x[1] <- NA; > benchmark( > replications=20, > columns=c("test","elapsed"), > order="elapsed" > , sort = {a<-sort(x, decreasing=TRUE, na.last=NA)[1:2]; a[1]/a[2];} > , qsrt = {a<-sort(x, decreasing=TRUE, na.last=NA, method="quick")[1:2]; > a[1]/a[2];} > , part = {a<-sort.int(-x, partial=1:2, na.last=NA)[1:2]; a[1]/a[2];} > , max1 = {m<-max(x, na.rm=TRUE); w<-which(x==m)[1]; > m/max(x[-w],na.rm=TRUE);} > , max2 = {w<-which.max(x); max(x, na.rm=TRUE)/max(x[-w], na.rm=TRUE);} > , cmax = {cmax(x);} > ) > # test elapsed > # 6 cmax 4.394 > # 5 max2 8.954 > # 4 max1 18.835 > # 3 part 21.749 > # 2 qsrt 46.692 > # 1 sort 77.679 > > Thanks for all the suggestions and comments. > > Allan. > > > PS: Slightly off-topic but is there a way within the syntax of R to set up > things so that 'sort' (or any function) would know it is called in a partial > list context in sort(x)[1:2] and it therefore could choose to use the > "partial" argument automatically for small [] lists? The R interpreter of > course knows full well that it is going to drop all but the first two values > of the result before it calls 'sort'. Perl has 'use Want' where howmany() > and want(n) provides a subset of this functionality (essentially for [] > lists of the form 1:n). > > ______________________________________________ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > [[alternative HTML version deleted]] ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.