Thanks for the quick reply, Jim. I haven't had any success even when I whittle the 'by' list down further, though. I believe I'm using the right command, but now it's clearly just a matter of memory issues.

> test <- aggregate(lf1.turbot[,17:217], list(lf1.turbot$vessel, lf1.turbot$trip, lf1.turbot$set), sum)
Error: cannot allocate vector of size 237.4 Mb
In addition: Warning messages:
1: Reached total allocation of 734Mb: see help(memory.size)
2: Reached total allocation of 734Mb: see help(memory.size)
3: Reached total allocation of 734Mb: see help(memory.size)
4: Reached total allocation of 734Mb: see help(memory.size)

A fellow kindly emailed me directly and suggested trying Wickham's 'reshape' package, but again when using the melt() function in that package I run into memory problems.

A colleague suggested I 'create factors using as.factor() and feed this directly into the appropriate apply function', but I've had no success with this when using tapply().

Any suggestions as to a less memory-intensive procedure would be greatly appreciated.

Thanks,

--

jared tobin, student research assistant
fisheries and oceans canada
[EMAIL PROTECTED]

-----Original Message-----
From: jim holtman [mailto:[EMAIL PROTECTED]
Sent: Thursday, September 13, 2007 6:49 PM
To: Tobin, Jared
Cc: [EMAIL PROTECTED]
Subject: Re: [R] Collapsing data frame; aggregate() or better function?

The second argument for aggregate is supposed to be a list, so try (notice the missing comma before "1:8"):

test <- aggregate(lf1.turbot[,c(11, 12, 17:217)], lf1.turbot[1:8],sum)

On 9/13/07, Tobin, Jared <[EMAIL PROTECTED]> wrote:
> Hello r-help,
>
> I am trying to collapse or aggregate 'some' of a data frame. A very
> simplified version of my data frame looks like:
>
> > tester
>   trip set num sex lfs1 lfs2
> 1  313  15   5   M    2    3
> 2  313  15   3   F    1    2
> 3  313  17   1   M    0    1
> 4  313  17   2   F    1    1
> 5  313  17   1   U    1    0
>
> And I want to omit sex from the picture and just get an addition of
> num, lfs1, and lfs2 for each unique trip/set combination.
> Using aggregate() works fine here,
>
> > test <- aggregate(tester[,c(3,5:6)], tester[,1:2], sum)
> > test
>   trip set num lfs1 lfs2
> 1  313  15   8    3    5
> 2  313  17   4    2    2
>
> But I'm having trouble getting the same function to work on my actual
> data frame which is considerably larger.
>
> > dim(lf1.turbot)
> [1] 16468   217
> > test <- aggregate(lf1.turbot[,c(11, 12, 17:217)], lf1.turbot[,1:8],
> sum)
> Error in vector("list", prod(extent)) : vector size specified is too
> large
> In addition: Warning messages:
> 1: NAs produced by integer overflow in: ngroup * (as.integer(index) -
> one)
> 2: NAs produced by integer overflow in: group + ngroup *
> (as.integer(index) - one)
> 3: NAs produced by integer overflow in: ngroup * nlevels(index)
>
> I'm guessing that either aggregate() can't handle a data frame of this
> size OR that there is an issue with 'omitting' more than one variable
> (in the same way I've omitted sex in the above example). Can anyone
> clarify and/or recommend any relatively simple alternative procedure
> to accomplish this?
>
> I plan on trying variants of by() and tapply() tomorrow morning, but
> I'm about to head home for the day.
>
> Thanks,
>
> --
>
> jared tobin, student research assistant
> fisheries and oceans canada
> [EMAIL PROTECTED]
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?
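
For reference, the collapse described in the quoted message (summing num, lfs1 and lfs2 for each trip/set combination) can also be written with a single combined grouping factor and base R's rowsum(). The vector("list", prod(extent)) error quoted above suggests the allocation grows with the product of the level counts of every 'by' variable; a factor built with interaction(..., drop = TRUE) keeps only the combinations that actually occur. This is only a sketch on the toy 'tester' data from the thread, rebuilt by hand here. Whether the same approach fits within the 734Mb limit on lf1.turbot is untested, and the column selection would need adapting to the real data.

```r
# Toy data copied from the 'tester' example in the quoted message
tester <- data.frame(
  trip = c(313, 313, 313, 313, 313),
  set  = c(15, 15, 17, 17, 17),
  num  = c(5, 3, 1, 2, 1),
  sex  = c("M", "F", "M", "F", "U"),
  lfs1 = c(2, 1, 0, 1, 1),
  lfs2 = c(3, 2, 1, 1, 0)
)

# One grouping factor holding only the trip/set combinations that occur
grp <- interaction(tester$trip, tester$set, drop = TRUE)

# rowsum() sums the rows of a numeric matrix within each group
sums <- rowsum(as.matrix(tester[, c("num", "lfs1", "lfs2")]), grp)

# Recover the key columns from the first row belonging to each group
first  <- match(rownames(sums), as.character(grp))
result <- data.frame(tester[first, c("trip", "set")], sums)
rownames(result) <- NULL

print(result)
```

On the toy data this gives the same sums as the aggregate() call in the quoted message (8, 3, 5 for set 15 and 4, 2, 2 for set 17). Because the sum columns go in as one numeric matrix, rowsum() handles them in a single pass rather than one column at a time.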