Hi
I'm trying to manipulate a data frame (about 10 million rows) by "grouping" it on multiple columns. For example, say the data set looks like:

Area  Sex  Year  y
Bob   F    2011  1
Bob   F    2011  2
Bob   F    2012  3
Bob   M    2012  3
Bob   M    2012  2
Fred  F    2011  1
Fred  F    2011  1
Fred  F    2012  2
Fred  M    2012  3
Fred  M    2012  1

And I want it to look like:

Area  Sex  Year  Sum of y
Bob   F    2011  3
Bob   F    2012  3
Bob   M    2012  5
Fred  F    2011  2
Fred  F    2012  2
Fred  M    2012  4

I think I can use something like:

    tmp <- aggregate(y ~ ., data = dat, FUN = sum)

(where dat stands for the data frame above). But due to the size it's really putting a strain on the computer (even with 64-bit R on a machine with 16 GB RAM that, yes, unfortunately runs Windows).

The reason I want the data set in this form is that I then want to apply the population information to get the "rate" from the "sum of y" column, and then fit a Poisson regression model.

I'm wondering (and would appreciate comments) whether there is a more efficient way to do the process I described.

Cheers
Michael

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
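
For reference, the grouping described in the post can be sketched on the ten example rows as follows. This is only a minimal sketch: the data frame name dat and the column name sum_y are made up for illustration, and the optional data.table variant assumes that package is installed. It is often suggested for data of this size, but no timings are claimed here.

```r
# Example data copied from the post; "dat" is a hypothetical name.
dat <- data.frame(
  Area = rep(c("Bob", "Fred"), each = 5),
  Sex  = rep(c("F", "F", "F", "M", "M"), times = 2),
  Year = rep(c(2011, 2011, 2012, 2012, 2012), times = 2),
  y    = c(1, 2, 3, 3, 2, 1, 1, 2, 3, 1)
)

# Base R: sum y within each Area/Sex/Year combination.
tmp <- aggregate(y ~ Area + Sex + Year, data = dat, FUN = sum)

# Sort so the rows appear in the same order as the desired table.
tmp <- tmp[order(tmp$Area, tmp$Sex, tmp$Year), ]
print(tmp)

# Optional: the same grouping with data.table, if it is installed.
# "sum_y" is an illustrative column name.
if (requireNamespace("data.table", quietly = TRUE)) {
  dt <- data.table::as.data.table(dat)
  tmp_dt <- dt[, list(sum_y = sum(y)), by = list(Area, Sex, Year)]
  print(tmp_dt)
}
```

Either result has one row per Area/Sex/Year combination, which is the shape needed before attaching population counts for a rate or a Poisson model.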