On Sun, 18 Apr 2010, zerdna wrote:
Gabor, Charles, Whit -- i've been walking the woods of R alone so far, and i got to say that your replies to that trivial question are an eye-opening experience for me. Gentlemen, what i am trying to say in a roundabout way is that i am extremely grateful and that you guys are frigging awesome.

Let me outline the times i am getting for the different proposed solutions on the same machine, same data, same version of R.

x <- rnorm(50000); len <- 100

1. my naive roll.rank

system.time(x.rank.1 <- roll.rank(x, len))
   user  system elapsed
  6.405   0.488   6.94

2. Gabor's zoo

z <- zoo(x)
system.time(rollapply(z, len, function(x) rank(x)[len]))
   user  system elapsed
  6.195   0.361   6.554

3. Charles embed

system.time(x.rank <- rowSums(x[ -(1:(len-1)) ] >= embed(x,len) ))
   user  system elapsed
  0.181   0.055   0.236

4. Whit's fts

dat <- fts(x)
system.time(x.rank <- moving.rank(dat, len))
   user  system elapsed
  0.036   0       0.036

5. Charles suggestion with inline, my crude implementation

sig <- signature(x="numeric", rank="integer", n="integer", len="integer")
code <- "
  int k = 0;
  /* i is the newest point of each window; len and n arrive as pointers under .C */
  for (int i = *len - 1; i < *n; i++) {
    int r = 1;
    /* count how many of the previous len-1 values are below x[i] */
    for (int j = i - 1; j > i - *len; j--)
      r += (x[i] > x[j] ? 1 : 0);
    rank[k++] = r;
  }"
fns <- cfunction(sig, code, convention=".C")
# one rank per full window, i.e. length(x)-len+1 values
system.time(x.rank <- fns(x, integer(length(x)-len+1), length(x), len))
   user  system elapsed
  0.011   0       0.011

I guess i could speed it up from time proportional to length(x)*len to time proportional to length(x)*log(len) if i use a slightly more intelligent algo, but this works fine for my requirements. The only thing i really wonder about is why exactly R takes 640 times longer than this C code. It would be immensely enlightening if someone could point to an explanation of how execution in R works and where and when it slows down like this.
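For anyone who wants to check that the approaches above compute the same thing, here is a minimal sketch. roll.rank itself is not shown in the thread, so roll_rank_loop below is only a hypothetical stand-in, and both function names are made up for illustration. It assumes len >= 2 and length(x) >= len. Ties are counted as "less than or equal", which is what the embed() expression does; the C version uses a strict comparison, so the two can differ when values repeat.

```r
# Rank of the newest value within each trailing window of `len` points.
# Hypothetical stand-in for the naive version (the original is not shown).
roll_rank_loop <- function(x, len) {
  n <- length(x)
  out <- integer(n - len + 1L)
  for (i in len:n) {
    window <- x[(i - len + 1L):i]
    # count window values <= the newest one (the newest value counts itself)
    out[i - len + 1L] <- sum(window <= x[i])
  }
  out
}

# The embed() one-liner from the post, wrapped in a function.
# Row i of embed(x, len) holds x[i+len-1], ..., x[i], i.e. one window per row.
roll_rank_embed <- function(x, len) {
  as.integer(rowSums(x[-(1:(len - 1))] >= embed(x, len)))
}

set.seed(1)
x <- rnorm(2000)
len <- 100
r_loop <- roll_rank_loop(x, len)
r_embed <- roll_rank_embed(x, len)
stopifnot(identical(r_loop, r_embed))
```

The loop version makes one pass through the interpreter per window, while the embed() version does all the comparisons in a single vectorized call, which is where most of the gap between solutions 1 and 3 appears to come from.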
Well, you can always read the source code. But short of that see ?Rprof then try stuff like this:
x <- rnorm(50000)
len <- 100
Rprof()
x.rank <- rowSums(x[ -(1:(len-1)) ] >= embed(x,len) )
Rprof(NULL)
summaryRprof()
$by.self
              self.time self.pct total.time total.pct
embed              0.10     31.2       0.22      68.8
>=                 0.08     25.0       0.08      25.0
+                  0.06     18.8       0.06      18.8
-                  0.04     12.5       0.04      12.5
rowSums            0.02      6.2       0.32     100.0
rep.int            0.02      6.2       0.02       6.2
inherits           0.00      0.0       0.30      93.8
is.data.frame      0.00      0.0       0.30      93.8

$by.total
              total.time total.pct self.time self.pct
rowSums             0.32     100.0      0.02      6.2
inherits            0.30      93.8      0.00      0.0
is.data.frame       0.30      93.8      0.00      0.0
embed               0.22      68.8      0.10     31.2
>=                  0.08      25.0      0.08     25.0
+                   0.06      18.8      0.06     18.8
-                   0.04      12.5      0.04     12.5
rep.int             0.02       6.2      0.02      6.2

$sampling.time
[1] 0.32

HTH,

Chuck
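One practical wrinkle with the recipe above: at the default sampling interval of 0.02 s, a call that takes 0.32 s yields only 16 samples, so each row of the table moves in steps of about 6%. A rough sketch of one way around that is to repeat the expression a few times and write the profile to a temporary file. The names reps and prof_file are made up for illustration, and the number of repeats is an arbitrary assumption.

```r
set.seed(1)
x <- rnorm(50000)
len <- 100
reps <- 10   # assumption: arbitrary number of repeats to collect more samples

prof_file <- tempfile(fileext = ".out")
Rprof(prof_file, interval = 0.02)   # 0.02 s is the documented default
for (k in seq_len(reps)) {
  x.rank <- rowSums(x[-(1:(len - 1))] >= embed(x, len))
}
Rprof(NULL)

# summaryRprof() signals an error if the file holds no samples
prof <- tryCatch(summaryRprof(prof_file), error = function(e) NULL)
unlink(prof_file)

if (!is.null(prof)) {
  self <- prof$by.self
  # self.time: time spent in the function's own code;
  # total.time (not printed here) also includes the functions it called
  print(self[order(-self$self.time), c("self.time", "self.pct")])
  print(prof$sampling.time)
}
```

The numbers will differ between machines and R versions, so treat the output as a guide to where the time goes, not as a benchmark.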
--
View this message in context: http://n4.nabble.com/efficient-rolling-rank-tp2013535p2014922.html
Sent from the R help mailing list archive at Nabble.com.

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Charles C. Berry                            (858) 534-2098
                                            Dept of Family/Preventive Medicine
E mailto:cbe...@tajo.ucsd.edu               UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901