Here is an approach I have used when the data get big. The 'trick' is to create a key (in your case by concatenating columns 1:8) and then build a list of the row indices of the data frame that correspond to each group (using split). Then you have lapply operate on the list of indices, indexing into the data to perform the operations.
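As a minimal illustration of that key/split/lapply pattern (the data frame and column names below are invented for this sketch, not taken from the thread):

```r
# Toy sketch of the key/split/lapply pattern described above.
# The data frame and the column names g1, g2, v1, v2 are invented here.
df <- data.frame(g1 = c(1, 1, 2, 2),
                 g2 = c("a", "a", "a", "b"),
                 v1 = 1:4,
                 v2 = 5:8)

key  <- do.call("paste", df[c("g1", "g2")])   # concatenate the key columns
rows <- split(seq_len(nrow(df)), key)         # list of row indices per group

# Sum the value columns within each group, then stack the results;
# the row names of the result are the concatenated key values.
ans <- do.call("rbind",
               lapply(rows, function(.rows) colSums(df[.rows, c("v1", "v2")])))
ans
```

Only the grouped rows are indexed at each step, which is what keeps the memory footprint small compared with building one large intermediate object.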
I created some test data of your size and here was the result:

> row <- 16468
> col <- 217
> x <- matrix(sample(1:4, row * col, TRUE), row, col)
> x.df <- as.data.frame(x)
> # create the key by concatenating the fields
> y <- do.call('paste', x.df[1:8])
> # create a list of the indices for each group
> z <- split(seq(nrow(x.df)), y)
> system.time({
+     ans <- lapply(z, function(.rows){
+         colSums(x.df[.rows, c(11, 12, 17:217)])
+     })
+ })
   user  system elapsed
 147.57    1.15  197.42
>
> # combine back into a data frame
> ans <- do.call('rbind', ans)
> ans[1:10, 1:7]
                V11 V12 V17 V18 V19 V20 V21
1 1 1 1 1 1 2 1   1   4   2   2   2   3   3
1 1 1 1 1 1 2 3   4   4   4   2   1   2   2
1 1 1 1 1 1 3 1   1   3   2   2   1   3   2
1 1 1 1 1 1 4 2   1   1   2   4   4   4   2
1 1 1 1 1 1 4 3   1   4   4   3   3   2   4
1 1 1 1 1 2 4 1   2   3   2   1   3   4   1
1 1 1 1 1 2 4 2   2   2   4   3   4   4   3
1 1 1 1 1 2 4 4   2   4   4   3   2   2   3
1 1 1 1 1 3 4 1   3   3   4   4   1   2   1
1 1 1 1 1 3 4 3   1   3   4   3   1   3   2
>

That printed the first ten rows; the row labels are the concatenated key values.

On 9/14/07, Tobin, Jared <[EMAIL PROTECTED]> wrote:
> Thanks for the quick reply Jim.
>
> I haven't had any success when I whittle down the 'by' list even
> further, though. I believe I'm using the right command, but now it's
> just a matter of clear memory issues.
>
> > test <- aggregate(lf1.turbot[, 17:217], list(lf1.turbot$vessel,
> > lf1.turbot$trip, lf1.turbot$set), sum)
> Error: cannot allocate vector of size 237.4 Mb
> In addition: Warning messages:
> 1: Reached total allocation of 734Mb: see help(memory.size)
> 2: Reached total allocation of 734Mb: see help(memory.size)
> 3: Reached total allocation of 734Mb: see help(memory.size)
> 4: Reached total allocation of 734Mb: see help(memory.size)
>
> A fellow kindly emailed me directly and suggested trying Wickham's
> 'reshape' package, but again, when using the melt() function in that
> package, I run into memory problems.
> A colleague suggested I 'create factors using as.factor() and feed
> this directly into the appropriate apply function', but I've had no
> success with this when using tapply().
>
> Any suggestions as to a less memory-intensive procedure would be
> greatly appreciated.
>
> Thanks,
>
> --
> jared tobin, student research assistant
> fisheries and oceans canada
> [EMAIL PROTECTED]
>
> -----Original Message-----
> From: jim holtman [mailto:[EMAIL PROTECTED]]
> Sent: Thursday, September 13, 2007 6:49 PM
> To: Tobin, Jared
> Cc: [EMAIL PROTECTED]
> Subject: Re: [R] Collapsing data frame; aggregate() or better function?
>
> The second argument for aggregate() is supposed to be a list, so try
> (notice the missing comma before "1:8"):
>
> test <- aggregate(lf1.turbot[, c(11, 12, 17:217)], lf1.turbot[1:8], sum)
>
> On 9/13/07, Tobin, Jared <[EMAIL PROTECTED]> wrote:
> > Hello r-help,
> >
> > I am trying to collapse or aggregate 'some' of a data frame. A very
> > simplified version of my data frame looks like:
> >
> > > tester
> >   trip set num sex lfs1 lfs2
> > 1  313  15   5   M    2    3
> > 2  313  15   3   F    1    2
> > 3  313  17   1   M    0    1
> > 4  313  17   2   F    1    1
> > 5  313  17   1   U    1    0
> >
> > And I want to omit sex from the picture and just get an addition of
> > num, lfs1, and lfs2 for each unique trip/set combination. Using
> > aggregate() works fine here:
> >
> > > test <- aggregate(tester[, c(3, 5:6)], tester[, 1:2], sum)
> > > test
> >   trip set num lfs1 lfs2
> > 1  313  15   8    3    5
> > 2  313  17   4    2    2
> >
> > But I'm having trouble getting the same function to work on my
> > actual data frame, which is considerably larger.
> >
> > > dim(lf1.turbot)
> > [1] 16468   217
> > > test <- aggregate(lf1.turbot[, c(11, 12, 17:217)],
> > lf1.turbot[, 1:8], sum)
> > Error in vector("list", prod(extent)) : vector size specified is
> > too large
> > In addition: Warning messages:
> > 1: NAs produced by integer overflow in: ngroup * (as.integer(index) - one)
> > 2: NAs produced by integer overflow in: group + ngroup * (as.integer(index) - one)
> > 3: NAs produced by integer overflow in: ngroup * nlevels(index)
> >
> > I'm guessing that either aggregate() can't handle a data frame of
> > this size OR that there is an issue with 'omitting' more than one
> > variable (in the same way I've omitted sex in the above example).
> > Can anyone clarify and/or recommend any relatively simple
> > alternative procedure to accomplish this?
> >
> > I plan on trying variants of by() and tapply() tomorrow morning,
> > but I'm about to head home for the day.
> >
> > Thanks,
> >
> > --
> > jared tobin, student research assistant
> > fisheries and oceans canada
> > [EMAIL PROTECTED]
> >
> > ______________________________________________
> > R-help@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> > http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
>
> --
> Jim Holtman
> Cincinnati, OH
> +1 513 646 9390
>
> What is the problem you are trying to solve?

--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?
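The aggregate()-with-a-list call that Jim corrects, and the by() route Jared planned to try, can be sketched on the small 'tester' example (a hedged sketch: the data frame is reconstructed here from the values posted in the thread, not the original session):

```r
# Reconstruction of the small 'tester' example posted in the thread.
tester <- data.frame(trip = rep(313, 5),
                     set  = c(15, 15, 17, 17, 17),
                     num  = c(5, 3, 1, 2, 1),
                     sex  = c("M", "F", "M", "F", "U"),
                     lfs1 = c(2, 1, 0, 1, 1),
                     lfs2 = c(3, 2, 1, 1, 0))

# aggregate()'s second argument must be a list; a sub-data.frame is a
# list, so this sums num/lfs1/lfs2 within each trip/set combination.
agg <- aggregate(tester[, c("num", "lfs1", "lfs2")],
                 tester[, c("trip", "set")], sum)
agg

# The by() route gives the same sums: one named vector per trip/set
# group, which rbind() stacks into a matrix (one row per group).
res <- do.call("rbind",
               by(tester[, c("num", "lfs1", "lfs2")],
                  tester[, c("trip", "set")], colSums))
res
```

On the full 16468 x 217 data frame, the memory behaviour of these calls will differ from aggregate() with a single concatenated key, which is why the split/lapply approach above was suggested for the large case.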