Hi

 

I'm trying to manipulate a data frame (with about 10 million rows) by
"grouping" it on multiple columns. For example, say the data set looks
like:


Area  Sex  Year  y
Bob   F    2011  1
Bob   F    2011  2
Bob   F    2012  3
Bob   M    2012  3
Bob   M    2012  2
Fred  F    2011  1
Fred  F    2011  1
Fred  F    2012  2
Fred  M    2012  3
Fred  M    2012  1

 

And I want it to look like:


Area  Sex  Year  Sum of y
Bob   F    2011  3
Bob   F    2012  3
Bob   M    2012  5
Fred  F    2011  2
Fred  F    2012  2
Fred  M    2012  4

 

I think I can use something like the following, where dat stands for the
data frame above:

tmp <- aggregate(y ~ ., data = dat, FUN = sum)
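
To make this reproducible, here is a minimal sketch on the ten example rows only. The name dat is just a placeholder for my real data frame, and I have spelled out the grouping columns instead of using the dot so the intent is explicit.

```r
# Toy data frame holding the ten example rows shown above.
# 'dat' is a placeholder name, not the real 10-million-row object.
dat <- data.frame(
  Area = rep(c("Bob", "Fred"), each = 5),
  Sex  = rep(c("F", "F", "F", "M", "M"), times = 2),
  Year = rep(c(2011, 2011, 2012, 2012, 2012), times = 2),
  y    = c(1, 2, 3, 3, 2, 1, 1, 2, 3, 1)
)

# Sum y within each Area/Sex/Year combination.
tmp <- aggregate(y ~ Area + Sex + Year, data = dat, FUN = sum)

# Re-sort by Area, then Sex, then Year so the rows match the layout
# of the desired table.
tmp <- tmp[order(tmp$Area, tmp$Sex, tmp$Year), ]
print(tmp)
```

On this small example the result has six rows, one per Area/Sex/Year combination that actually occurs in the data.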

 

But because of the size, this puts a real strain on the computer (even with
64-bit R on a machine with 16 GB of RAM; yes, unfortunately, it's Windows :().
The reason I want the data set in this form is that I then want to attach the
population information, compute the "rate" from the "sum of y" column, and
fit a Poisson regression model.
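
To show what I mean by the downstream step, here is a rough sketch on the aggregated example. The population table pop and its values are invented purely for illustration, and the model formula is only an assumption about what the final model might look like; log(pop) enters as an offset so that the counts are modelled as rates.

```r
# Aggregated counts from the example (the "Sum of y" table above).
agg <- data.frame(
  Area = c("Bob", "Bob", "Bob", "Fred", "Fred", "Fred"),
  Sex  = c("F", "F", "M", "F", "F", "M"),
  Year = c(2011, 2012, 2012, 2011, 2012, 2012),
  y    = c(3, 3, 5, 2, 2, 4)
)

# HYPOTHETICAL population table: these numbers are made up for
# illustration and do not come from any real data.
pop <- data.frame(
  Area = c("Bob", "Bob", "Bob", "Fred", "Fred", "Fred"),
  Sex  = c("F", "F", "M", "F", "F", "M"),
  Year = c(2011, 2012, 2012, 2011, 2012, 2012),
  pop  = c(100, 110, 120, 90, 95, 105)
)

# Attach the population to each Area/Sex/Year cell and compute the crude rate.
agg <- merge(agg, pop, by = c("Area", "Sex", "Year"))
agg$rate <- agg$y / agg$pop

# Poisson regression on the counts with log(population) as an offset.
# The choice of covariates here is an assumption, not a recommendation.
fit <- glm(y ~ Sex + factor(Year) + offset(log(pop)),
           family = poisson, data = agg)
print(coef(fit))
```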

 

I'm wondering (and would appreciate comments) whether there is a more
efficient way to do the process I described.

 

Cheers

Michael

 

 

 

 

 


