Thanks for benchmarking them. data.table is indeed worth looking at.
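
For the archive, here is a minimal sketch of the rowsum() approach on the toy vectors from my original question. It uses base R only. The intermediate names `sums` and `via_tapply` are mine, chosen for illustration; the printed result should match the table I asked for.

```r
# Toy vectors from the original question.
x <- c(1, 2, 3, 4, 5)
y <- c('a', 'b', 'c', 'a', 'c')

# rowsum() returns a one-column matrix with the group labels as row names.
sums <- rowsum(x, y)

# Wrap it in a data.frame with a single 'count' column, as requested.
xy <- data.frame(count = sums[, 1], row.names = rownames(sums))
print(xy)

# tapply() gives the same sums as a named 1-d array, so the two should agree.
via_tapply <- tapply(x, y, sum)
stopifnot(isTRUE(all.equal(unname(xy$count), as.vector(via_tapply))))
```

The only extra step over a bare rowsum() call is the data.frame() wrapper, which should cost little next to the aggregation itself.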

On Wed, Sep 7, 2011 at 9:55 PM, Dennis Murphy <djmu...@gmail.com> wrote:

> Hi:
>
> Here are a few informal timings on my machine with the following
> example. The data.table package is worth investigating, particularly
> in problems where its advantages can scale with size.
>
> library(data.table)
> dt <- data.table(x = sample(1:50, 1000000, replace = TRUE),
>                  y = sample(letters[1:26], 1000000, replace = TRUE),
>                  key = 'y')
> system.time(dt[, list(count = sum(x)), by = 'y'])
>   user  system elapsed
>   0.02    0.00    0.02
>
> # Data tables are also data frames, so we can use them as such:
>
> system.time(with(dt, tapply(x, y, sum)))
>   user  system elapsed
>   0.39    0.00    0.39
> system.time(with(dt, rowsum(x, y)))
>   user  system elapsed
>   0.04    0.00    0.03
> system.time(aggregate(x ~ y, data = dt, FUN = sum))
>   user  system elapsed
>   1.87    0.00    1.87
>
> So rowsum() is good, but data.table is a little faster for this task.
> Increasing the size of the problem favors both data.table and
> rowsum(), while tapply() falls further behind: it takes approx. 10x
> as long as rowsum() in the first example and approx. 20x in the
> second. The ratio of rowsum() to data.table stays about the same
> (approx. 2x).
>
> # 10M observations, 1000 groups
> dt <- data.table(x = sample(1:100, 10000000, replace = TRUE),
>                  y = sample(1:1000, 10000000, replace = TRUE),
>                  key = 'y')
> system.time(dt[, list(count = sum(x)), by = 'y'])
>   user  system elapsed
>   0.16    0.03    0.18
> system.time(with(dt, rowsum(x, y)))
>   user  system elapsed
>   0.36    0.04    0.40
> system.time(with(dt, tapply(x, y, sum)))
>   user  system elapsed
>   8.77    0.33    9.11
>
> HTH,
> Dennis
>
>
> On Wed, Sep 7, 2011 at 6:18 PM, zhenjiang xu <zhenjiang...@gmail.com> wrote:
> > Thanks for all your replies. I am using rowsum() and it looks efficient. I
> > hope I can run some benchmarks in the near future and let people know.
> > Or are any benchmark results already available?
> >
> > On Wed, Aug 31, 2011 at 12:58 PM, Bert Gunter <gunter.ber...@gene.com> wrote:
> >
> >> Inline below:
> >>
> >> On Wed, Aug 31, 2011 at 9:50 AM, Jorge I Velez <jorgeivanve...@gmail.com> wrote:
> >> > Hi Zhenjiang,
> >> >
> >> > Try
> >> >
> >> > table(unlist(mapply(function(x, y) rep(x, y), y, x)))
> >>
> >> Yikes! How about simply tapply(x,y,sum) ??
> >> ?tapply
> >>
> >> -- Bert
> >> >
> >> > HTH,
> >> > Jorge
> >> >
> >> >
> >> > On Wed, Aug 31, 2011 at 12:45 PM, zhenjiang xu <> wrote:
> >> >
> >> >> Hi R users,
> >> >>
> >> >> suppose I have two vectors,
> >> >>  > x=c(1,2,3,4,5)
> >> >>  > y=c('a','b','c','a','c')
> >> >> How can I get a data.frame like this?
> >> >> > xy
> >> >>      count
> >> >> a     5
> >> >> b     2
> >> >> c     8
> >> >>
> >> >> I know a few ways to accomplish the task. However, I have a huge number
> >> >> of calculations of this kind, so I'd like an efficient solution. Thanks.
> >> >>
> >> >> --
> >> >> Best,
> >> >> Zhenjiang
> >> >>
> >> >> ______________________________________________
> >> >> R-help@r-project.org mailing list
> >> >> https://stat.ethz.ch/mailman/listinfo/r-help
> >> >> PLEASE do read the posting guide
> >> >> http://www.R-project.org/posting-guide.html
> >> >> and provide commented, minimal, self-contained, reproducible code.
> >> >>
> >> >
> >> >
> >>
> >
> >
> >
> > --
> > Best,
> > Zhenjiang
> >
> >
>



-- 
Best,
Zhenjiang
