Re: [R] Grouping data in a data frame: is there an efficient way to do it?

milton ruser Wed, 02 Sep 2009 18:12:09 -0700

Hi there,

I think 30 seconds is OK, because it is less than the time each of us
spent reading the messages :-) Just kidding...


bests

milton

On Wed, Sep 2, 2009 at 8:01 PM, Leo Alekseyev <dnqu...@gmail.com> wrote:

> Thanks everyone for the useful suggestions.  The bottleneck might be
> memory limitations of my machine (3.2GHz, 2 GB) and the fact that I am
> aggregating on a field that is a string.  Using the suggested
> as.data.frame(table(my.df$my.field)) I do get a speedup, but the
> computation still takes 30 seconds.  For the sake of comparison, I did
> write the "counting up rows with common values" function using a Perl
> hash (it's only 5 lines of Perl) and it takes 15 seconds to run -- a
> 2x speedup.  Not yet sure if it's worth the hassle.
>
> --Leo
>
> On Wed, Sep 2, 2009 at 4:28 PM, David M Smith <da...@revolution-computing.com> wrote:
> > You may want to try using isplit (from the iterators package). Combined
> > with foreach, it's an efficient way of iterating through a data frame by
> > groups of rows defined by common values of a column (which I think is what
> > you're after). On a multiprocessor system, you can speed things up further
> > with the doMC package, which runs the iterations in parallel. There's an
> > example here:
> > http://blog.revolution-computing.com/2009/08/blockprocessing-a-data-frame-with-isplit.html
> > Hope this helps,
> > # David Smith
> > On Wed, Sep 2, 2009 at 3:39 PM, Leo Alekseyev <dnqu...@gmail.com> wrote:
> >>
> >> I have a data frame with about 10^6 rows; I want to group the data
> >> according to entries in one of the columns and do something with it.
> >> For instance, suppose I want to count up the number of elements in
> >> each group.  I tried something like aggregate(my.df$my.field,
> >> list(my.df$my.field), length) but it seems to be very slow.  Likewise,
> >> the split() function was slow (I killed it before it completed).  Is
> >> there a way to efficiently accomplish this in R?..  I am almost
> >> tempted to write an external Perl/Python script entering every row
> >> into a hashtable keyed by my.field and iterating over the keys...
> >> Might this be faster?..
> >>
> >> ______________________________________________
> >> R-help@r-project.org mailing list
> >> https://stat.ethz.ch/mailman/listinfo/r-help
> >> PLEASE do read the posting guide
> >> http://www.R-project.org/posting-guide.html
> >> and provide commented, minimal, self-contained, reproducible code.
> >
> >
> >
> > --
> > David M Smith <da...@revolution-computing.com>
> > Director of Community, REvolution Computing www.revolution-computing.com
> > Tel: +1 (206) 577-4778 x3203 (San Francisco, USA)
> >
> > Check out our upcoming events schedule at
> > www.revolution-computing.com/events
> >
> >
>

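For anyone who wants to try the counting task from the thread, here is a
minimal base-R sketch. It is only an illustration under stated assumptions:
the data are made up, the column name my.field follows Leo's message, and
count.with.hash is a hypothetical helper (not from any package) that mimics
the Perl hash approach with an R environment. Timings will depend on the
machine and the data, so none are claimed here; wrap each step in
system.time() to compare them yourself.

```r
# Made-up data standing in for my.df: one string column with repeated keys.
set.seed(1)
keys <- paste0("key", 1:1000)
my.df <- data.frame(my.field = sample(keys, 1e5, replace = TRUE),
                    stringsAsFactors = FALSE)

# 1. table(), as suggested earlier in the thread.
counts.table <- as.data.frame(table(my.df$my.field), stringsAsFactors = FALSE)
names(counts.table) <- c("my.field", "n")

# 2. Convert the strings to integer codes once, then count the codes.
f <- factor(my.df$my.field)
counts.tab <- data.frame(my.field = levels(f),
                         n = tabulate(as.integer(f), nbins = nlevels(f)),
                         stringsAsFactors = FALSE)

# 3. An environment used as a hash table, mirroring the Perl idea:
#    one counter per distinct key, incremented row by row.
#    (Hypothetical helper, written for this example only.)
count.with.hash <- function(x) {
  h <- new.env(hash = TRUE)
  for (k in x) {
    old <- h[[k]]                      # NULL if the key is not present yet
    h[[k]] <- if (is.null(old)) 1L else old + 1L
  }
  ks <- ls(h)
  data.frame(my.field = ks,
             n = unlist(mget(ks, envir = h), use.names = FALSE),
             stringsAsFactors = FALSE)
}
counts.hash <- count.with.hash(my.df$my.field)
```

All three produce one row per distinct value of my.field with its count.
The explicit loop in (3) is interpreted R code, so it is shown mainly to
make the hash idea concrete rather than as a recommendation.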

