Hi:

Here are a few informal timings on my machine for the example below. The data.table package is worth investigating, particularly for problems where its advantages grow with the size of the data.
library(data.table)
dt <- data.table(x = sample(1:50, 1000000, replace = TRUE),
                 y = sample(letters[1:26], 1000000, replace = TRUE),
                 key = 'y')

system.time(dt[, list(count = sum(x)), by = 'y'])
   user  system elapsed
   0.02    0.00    0.02

# Data tables are also data frames, so we can use them as such:
system.time(with(dt, tapply(x, y, sum)))
   user  system elapsed
   0.39    0.00    0.39
system.time(with(dt, rowsum(x, y)))
   user  system elapsed
   0.04    0.00    0.03
system.time(aggregate(x ~ y, data = dt, FUN = sum))
   user  system elapsed
   1.87    0.00    1.87

So rowsum() is good, but data.table is a little better for this task.
Increasing the size of the problem is to the advantage of both data.table
and rowsum(), but tapply() takes a fair bit longer, relatively speaking
(approx. 10x rowsum() in the first example, 20x in the second example).
The ratio of rowsum() to data.table stays about the same (approx. 2x).

# 10M observations, 1000 groups
> dt <- data.table(x = sample(1:100, 10000000, replace = TRUE),
+                  y = sample(1:1000, 10000000, replace = TRUE),
+                  key = 'y')
> system.time(dt[, list(count = sum(x)), by = 'y'])
   user  system elapsed
   0.16    0.03    0.18
> system.time(with(dt, rowsum(x, y)))
   user  system elapsed
   0.36    0.04    0.40
> system.time(with(dt, tapply(x, y, sum)))
   user  system elapsed
   8.77    0.33    9.11

HTH,
Dennis

On Wed, Sep 7, 2011 at 6:18 PM, zhenjiang xu <zhenjiang...@gmail.com> wrote:
> Thanks for all your replies. I am using rowsum() and it looks efficient. I
> hope I could do some benchmark sometime in near future and let people know.
> Or is there any benchmark result available?
>
> On Wed, Aug 31, 2011 at 12:58 PM, Bert Gunter <gunter.ber...@gene.com> wrote:
>
>> Inline below:
>>
>> On Wed, Aug 31, 2011 at 9:50 AM, Jorge I Velez <jorgeivanve...@gmail.com>
>> wrote:
>> > Hi Zhenjiang,
>> >
>> > Try
>> >
>> > table(unlist(mapply(function(x, y) rep(x, y), y, x)))
>>
>> Yikes! How about simply tapply(x,y,sum) ??
>> ?tapply
>>
>> -- Bert
>> >
>> > HTH,
>> > Jorge
>> >
>> > On Wed, Aug 31, 2011 at 12:45 PM, zhenjiang xu <> wrote:
>> >
>> >> Hi R users,
>> >>
>> >> suppose I have two vectors,
>> >> > x=c(1,2,3,4,5)
>> >> > y=c('a','b','c','a','c')
>> >> How can I get a data.frame like this?
>> >> > xy
>> >>   count
>> >> a     5
>> >> b     2
>> >> c     8
>> >>
>> >> I know a few ways to fulfill the task. However, I have a huge number
>> >> of this kind calculations, so I'd like an efficient solution. Thanks
>> >>
>> >> --
>> >> Best,
>> >> Zhenjiang
>> >>
>> >> ______________________________________________
>> >> R-help@r-project.org mailing list
>> >> https://stat.ethz.ch/mailman/listinfo/r-help
>> >> PLEASE do read the posting guide
>> >> http://www.R-project.org/posting-guide.html
>> >> and provide commented, minimal, self-contained, reproducible code.
>
> --
> Best,
> Zhenjiang
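
For anyone who wants to check the base-R approaches from this thread against the original question, here is a minimal sketch on the toy vectors quoted above. It uses base R only and is not timed. The object names (by_tapply, by_rowsum, by_aggregate) are my own, chosen for illustration. The last step shows one possible way to get the data.frame layout that was asked for.

```r
# Toy vectors from the original question in this thread
x <- c(1, 2, 3, 4, 5)
y <- c("a", "b", "c", "a", "c")

# Three base-R ways to sum x within each level of y
by_tapply    <- tapply(x, y, sum)   # 1-d array named by group
by_rowsum    <- rowsum(x, y)        # one-column matrix, rownames are the groups
by_aggregate <- aggregate(x ~ y, data = data.frame(x = x, y = y), FUN = sum)

# Reshape the rowsum() result into a data.frame with a single 'count'
# column; the group labels carry over as row names
xy <- data.frame(count = by_rowsum[, 1])
print(xy)
```

All three calls should agree on the group sums (5 for a, 2 for b, 8 for c). On data this small the choice of method makes no practical difference; the differences only show up at the sizes timed earlier in this message.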