Hi there, I think the option of 30 seconds is OK because it is less than each of us spent reading the messages :-) Just kidding...
Bests,
Milton

On Wed, Sep 2, 2009 at 8:01 PM, Leo Alekseyev <dnqu...@gmail.com> wrote:

> Thanks everyone for the useful suggestions. The bottleneck might be
> memory limitations of my machine (3.2GHz, 2 GB) and the fact that I am
> aggregating on a field that is a string. Using the suggested
> as.data.frame(table(my.df$my.field)) I do get a speedup, but the
> computation still takes 30 seconds. For the sake of comparison, I did
> write the "counting up rows with common values" function using a Perl
> hash (it's only 5 lines of Perl) and it takes 15 seconds to run -- a
> 2x speedup. Not yet sure if it's worth the hassle.
>
> --Leo
>
> On Wed, Sep 2, 2009 at 4:28 PM, David M Smith
> <da...@revolution-computing.com> wrote:
> > You may want to try using isplit (from the iterators package). Combined
> > with foreach, it's an efficient way of iterating through a data frame by
> > groups of rows defined by common values of a column (which I think is
> > what you're after). You can speed things up further if you have a
> > multiprocessor system with the doMC package to run iterations in
> > parallel. There's an example here:
> >
> > http://blog.revolution-computing.com/2009/08/blockprocessing-a-data-frame-with-isplit.html
> >
> > Hope this helps,
> > # David Smith
> >
> > On Wed, Sep 2, 2009 at 3:39 PM, Leo Alekseyev <dnqu...@gmail.com> wrote:
> >>
> >> I have a data frame with about 10^6 rows; I want to group the data
> >> according to entries in one of the columns and do something with it.
> >> For instance, suppose I want to count up the number of elements in
> >> each group. I tried something like aggregate(my.df$my.field,
> >> list(my.df$my.field), length) but it seems to be very slow. Likewise,
> >> the split() function was slow (I killed it before it completed). Is
> >> there a way to efficiently accomplish this in R?.. I am almost
> >> tempted to write an external Perl/Python script entering every row
> >> into a hashtable keyed by my.field and iterating over the keys...
> >> Might this be faster?..
> >>
> >> ______________________________________________
> >> R-help@r-project.org mailing list
> >> https://stat.ethz.ch/mailman/listinfo/r-help
> >> PLEASE do read the posting guide
> >> http://www.R-project.org/posting-guide.html
> >> and provide commented, minimal, self-contained, reproducible code.
> >
> > --
> > David M Smith <da...@revolution-computing.com>
> > Director of Community, REvolution Computing www.revolution-computing.com
> > Tel: +1 (206) 577-4778 x3203 (San Francisco, USA)
> >
> > Check out our upcoming events schedule at
> > www.revolution-computing.com/events
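
For anyone reading this thread later, here is a minimal sketch of the counting approaches discussed above. The data is made up for illustration (the key format and the 5000 distinct values are assumptions, not Leo's actual data); only my.df and my.field are names from the thread. The second variant, converting the string column to a factor once and counting with tabulate(), was not suggested in the thread and is shown only as one possible alternative; timings will vary by machine.

```r
# Hypothetical data: 10^6 rows with a string key column, as in the question.
# The key format and number of distinct keys are assumptions for illustration.
set.seed(1)
my.df <- data.frame(
  my.field = sample(sprintf("key%04d", 1:5000), 1e6, replace = TRUE),
  stringsAsFactors = FALSE
)

# Approach mentioned in the thread: tabulate the column directly.
counts <- as.data.frame(table(my.df$my.field), stringsAsFactors = FALSE)
names(counts) <- c("my.field", "n")

# Alternative (not from the thread): build the factor once, then count its
# integer codes with tabulate().
f <- factor(my.df$my.field)
counts2 <- data.frame(
  my.field = levels(f),
  n = tabulate(f, nbins = nlevels(f)),
  stringsAsFactors = FALSE
)

# Both approaches should agree group by group.
stopifnot(all(counts$my.field == counts2$my.field))
stopifnot(all(counts$n == counts2$n))
```

Wrapping either call in system.time() gives a rough comparison against the aggregate() version from the original question.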