On Tue, Jun 29, 2010 at 8:02 AM, Matthew Dowle <mdo...@mdowle.plus.com> wrote:
>
>> dt = data.table(d,key="grp1,grp2")
>> system.time(ans1 <- dt[ , list(mean(x),mean(y)) , by=list(grp1,grp2)])
>   user  system elapsed
>   3.89    0.00    3.91
> # your 7.064 is 12.23 for me though, so this 3.9 should be faster for you
>
> However, Rprof() shows that 3.9 is mostly dispatch of mean to mean.default
> which then calls .Internal.  Because there are so many groups here, dispatch
> bites.
>
> So ...
>
>> system.time(ans2 <- dt[ , list(.Internal(mean(x)),.Internal(mean(y))),
>> by=list(grp1,grp2)])
>   user  system elapsed
>   0.20    0.00    0.21

Of course, we can perform the same optimisation with ave:

fast_mean <- function(x) .Internal(mean(x))
system.time({
  d$avx <- ave(d$x, interaction(d$grp1, d$grp2, drop = TRUE), FUN = fast_mean)
  d$avy <- ave(d$y, interaction(d$grp1, d$grp2, drop = TRUE), FUN = fast_mean)
})
#  user  system elapsed
# 3.109   0.188   3.302
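
For anyone who wants to try this without the original d (which isn't shown in this message), here is a rough, self-contained sketch. The data frame below is made up purely for illustration, so its size and number of groups are assumptions and timings will differ from those above. It only checks that the dispatch-free version gives the same group means as plain mean; note that calling .Internal directly is unsupported outside base R and may break in future versions.

```r
# Hypothetical stand-in for `d`; the real data is not part of this message.
set.seed(1)
n <- 1e4
d <- data.frame(
  grp1 = sample(1:50, n, replace = TRUE),
  grp2 = sample(1:50, n, replace = TRUE),
  x = rnorm(n),
  y = rnorm(n)
)

# Skip S3 dispatch by calling the internal mean directly (unsupported API).
fast_mean <- function(x) .Internal(mean(x))

# Build the grouping factor once and reuse it for both columns.
g <- interaction(d$grp1, d$grp2, drop = TRUE)

d$avx <- ave(d$x, g, FUN = fast_mean)
d$avy <- ave(d$y, g, FUN = fast_mean)

# Same answer as the dispatching version, just with less overhead per group.
stopifnot(isTRUE(all.equal(d$avx, ave(d$x, g, FUN = mean))))
stopifnot(isTRUE(all.equal(d$avy, ave(d$y, g, FUN = mean))))
```

Wrapping either version in system.time() on your own data should show how much of the cost is dispatch.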

Regardless, my point is that there's a simple fix available to make
ave much faster, not that it's the fastest thing out there.

Hadley

-- 
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/
