I have a data frame with about 10^6 rows. I want to group the rows by the values in one of the columns and compute something for each group; for instance, count the number of elements in each group. I tried aggregate(my.df$my.field, list(my.df$my.field), length), but it is very slow. split() was also slow (I killed it before it completed). Is there an efficient way to do this in R? I am almost tempted to write an external Perl/Python script that enters every row into a hashtable keyed by my.field and then iterates over the keys. Might that be faster?
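To make the task concrete, here is a minimal sketch of the per-group count on a small made-up data frame. The toy data is invented for illustration, and my.df and my.field are only placeholders for the real names. It uses the base R functions table(), tapply(), split() and vapply(). Splitting the row indices rather than the data frame itself is the closest R analogue I can think of to the hashtable idea, and it may be cheaper than splitting the whole data frame, but none of this has been timed on the full 10^6 rows.

```r
# Toy stand-in for the real data frame (invented data, for illustration only)
set.seed(1)
my.df <- data.frame(my.field = sample(c("a", "b", "c"), 1000, replace = TRUE),
                    stringsAsFactors = FALSE)

# Count the number of rows in each group directly
counts.table <- table(my.df$my.field)

# Same counts via tapply(); length can be swapped for another summary function
counts.tapply <- tapply(my.df$my.field, my.df$my.field, length)

# Hashtable-style grouping: a list of row indices keyed by my.field,
# then one pass over the keys to get the size of each group
idx.by.key <- split(seq_len(nrow(my.df)), my.df$my.field)
counts.split <- vapply(idx.by.key, length, integer(1))
```

All three results should hold the same counts, named by the distinct values of my.field. idx.by.key can also be reused to pull out the rows of one group, e.g. my.df[idx.by.key[["a"]], , drop = FALSE].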
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.