Thanks for benchmarking them. data.table is indeed worth looking at.

On Wed, Sep 7, 2011 at 9:55 PM, Dennis Murphy <djmu...@gmail.com> wrote:
> Hi:
>
> Here are a few informal timings on my machine with the following
> example. The data.table package is worth investigating, particularly
> in problems where its advantages can scale with size.
>
> library(data.table)
> dt <- data.table(x = sample(1:50, 1000000, replace = TRUE),
>                  y = sample(letters[1:26], 1000000, replace = TRUE),
>                  key = 'y')
> system.time(dt[, list(count = sum(x)), by = 'y'])
>    user  system elapsed
>    0.02    0.00    0.02
>
> # Data tables are also data frames, so we can use them as such:
>
> system.time(with(dt, tapply(x, y, sum)))
>    user  system elapsed
>    0.39    0.00    0.39
> system.time(with(dt, rowsum(x, y)))
>    user  system elapsed
>    0.04    0.00    0.03
> system.time(aggregate(x ~ y, data = dt, FUN = sum))
>    user  system elapsed
>    1.87    0.00    1.87
>
> So rowsum() is good, but data.table is a little better for this task.
> Increasing the size of the problem is to the advantage of both
> data.table and rowsum(), but tapply() takes a fair bit longer,
> relatively speaking (appx. 10x rowsum() in the first example, 20x in
> the second example). The ratios of rowsum() to data.table are about
> the same (appx. 2x).
>
> # 10M observations, 1000 groups
>
> dt <- data.table(x = sample(1:100, 10000000, replace = TRUE),
>                  y = sample(1:1000, 10000000, replace = TRUE),
>                  key = 'y')
>
> system.time(dt[, list(count = sum(x)), by = 'y'])
>    user  system elapsed
>    0.16    0.03    0.18
>
> system.time(with(dt, rowsum(x, y)))
>    user  system elapsed
>    0.36    0.04    0.40
>
> system.time(with(dt, tapply(x, y, sum)))
>    user  system elapsed
>    8.77    0.33    9.11
>
> HTH,
> Dennis
>
>
> On Wed, Sep 7, 2011 at 6:18 PM, zhenjiang xu <zhenjiang...@gmail.com>
> wrote:
> > Thanks for all your replies. I am using rowsum() and it looks efficient. I
> > hope I could do some benchmark sometime in near future and let people know.
> > Or is there any benchmark result available?
> >
> > On Wed, Aug 31, 2011 at 12:58 PM, Bert Gunter <gunter.ber...@gene.com> wrote:
> >
> >> Inline below:
> >>
> >> On Wed, Aug 31, 2011 at 9:50 AM, Jorge I Velez <jorgeivanve...@gmail.com>
> >> wrote:
> >> > Hi Zhenjiang,
> >> >
> >> > Try
> >> >
> >> > table(unlist(mapply(function(x, y) rep(x, y), y, x)))
> >>
> >> Yikes! How about simply tapply(x,y,sum) ??
> >> ?tapply
> >>
> >> -- Bert
> >> >
> >> > HTH,
> >> > Jorge
> >> >
> >> >
> >> > On Wed, Aug 31, 2011 at 12:45 PM, zhenjiang xu <> wrote:
> >> >
> >> >> Hi R users,
> >> >>
> >> >> suppose I have two vectors,
> >> >> > x=c(1,2,3,4,5)
> >> >> > y=c('a','b','c','a','c')
> >> >> How can I get a data.frame like this?
> >> >> > xy
> >> >>   count
> >> >> a     5
> >> >> b     2
> >> >> c     8
> >> >>
> >> >> I know a few ways to fulfill the task. However, I have a huge number
> >> >> of this kind calculations, so I'd like an efficient solution. Thanks
> >> >>
> >> >> --
> >> >> Best,
> >> >> Zhenjiang
> >> >>
> >> >> ______________________________________________
> >> >> R-help@r-project.org mailing list
> >> >> https://stat.ethz.ch/mailman/listinfo/r-help
> >> >> PLEASE do read the posting guide
> >> >> http://www.R-project.org/posting-guide.html
> >> >> and provide commented, minimal, self-contained, reproducible code.
> >> >>
> >> >
> >> > [[alternative HTML version deleted]]
> >
> > --
> > Best,
> > Zhenjiang

--
Best,
Zhenjiang
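
For anyone reading this thread later, here is a minimal base-R sketch of the approaches discussed above, applied to the toy vectors from the original question. The variable names (d, sums_tapply, sums_rowsum, sums_aggregate, n, big_x, big_y) and the problem size in the timing part are my own choices for illustration, not something from the thread. Timings depend on the machine and R version, so none are shown here. data.table is left out only so the snippet runs without extra packages; the dt[, list(count = sum(x)), by = 'y'] call quoted above is the one to try for that.

```r
# Toy vectors from the original question
x <- c(1, 2, 3, 4, 5)
y <- c("a", "b", "c", "a", "c")
d <- data.frame(x = x, y = y)  # illustrative helper, not from the thread

# tapply(): per-group sums, returned as a named 1-d array
sums_tapply <- tapply(x, y, sum)

# rowsum(): one-column matrix, group labels as row names
sums_rowsum <- rowsum(x, y)

# aggregate(): data frame with one row per group (columns y and x)
sums_aggregate <- aggregate(x ~ y, data = d, FUN = sum)

# Shape the rowsum() result into the data frame asked for in the question
xy <- data.frame(count = sums_rowsum[, 1], row.names = rownames(sums_rowsum))
print(xy)

# Rough timing harness on a larger, arbitrary-size problem (assumed size)
set.seed(1)
n <- 1e5
big_x <- sample(1:50, n, replace = TRUE)
big_y <- sample(letters, n, replace = TRUE)
print(system.time(tapply(big_x, big_y, sum)))
print(system.time(rowsum(big_x, big_y)))
```

The three base functions give the same sums and differ only in the shape of the result, so the choice mostly comes down to speed and to which return type is convenient downstream.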