On Tue, Jun 29, 2010 at 8:02 AM, Matthew Dowle <mdo...@mdowle.plus.com> wrote:
>
>> dt = data.table(d,key="grp1,grp2")
>> system.time(ans1 <- dt[ , list(mean(x),mean(y)) , by=list(grp1,grp2)])
>    user  system elapsed
>    3.89    0.00    3.91     # your 7.064 is 12.23 for me though, so this
>                             # 3.9 should be faster for you
>
> However, Rprof() shows that 3.9 is mostly dispatch of mean to mean.default
> which then calls .Internal. Because there are so many groups here, dispatch
> bites.
>
> So ...
>
>> system.time(ans2 <- dt[ , list(.Internal(mean(x)),.Internal(mean(y))),
>>                        by=list(grp1,grp2)])
>    user  system elapsed
>    0.20    0.00    0.21
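
To look at the dispatch cost in isolation, here is a minimal sketch in base R. The data, the group count and the helper plain_mean are made up for illustration and are not the data from this thread. Since .Internal is not meant for user code, the sketch swaps in sum(x) / length(x), which calls only primitives and so skips the trip through mean.default. Timings will vary by machine and R version, so none are shown.

```r
# Hypothetical toy data: many small groups, so per-call overhead matters.
set.seed(1)
n <- 1e5
g <- sample(2e4, n, replace = TRUE)  # assumed group ids, not the thread's data
x <- runif(n)

# Assumed stand-in for .Internal(mean(x)): sum() and length() are primitives,
# so each call avoids S3 dispatch from mean() to mean.default().
plain_mean <- function(v) sum(v) / length(v)

# Time the per-group mean with and without dispatch.
t_dispatch <- system.time(m1 <- tapply(x, g, mean))
t_plain    <- system.time(m2 <- tapply(x, g, plain_mean))

# Both approaches agree up to floating-point rounding.
stopifnot(isTRUE(all.equal(m1, m2)))

print(t_dispatch)
print(t_plain)
```

Any function that avoids the generic can be passed as FUN in the same way, whether to tapply, ave or a data.table j expression.
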
Of course, we can perform the same optimisation with ave:

fast_mean <- function(x) .Internal(mean(x))
system.time({
  d$avx <- ave(d$x, interaction(d$grp1, d$grp2, drop = TRUE), FUN = fast_mean)
  d$avy <- ave(d$y, interaction(d$grp1, d$grp2, drop = TRUE), FUN = fast_mean)
})
#    user  system elapsed
#   3.109   0.188   3.302

Regardless, my point is that there's a simple fix available to make ave
much faster, not that it's the fastest thing out there.

Hadley

--
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/