Thanks Steve. What is the analogue of .N for min and max? I.e., what is the data.table version of

aggregate(infl$delay, by=list(infl$share.id), FUN=min)
aggregate(infl$delay, by=list(infl$share.id), FUN=max)

Thanks!
Sam.
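A minimal sketch of one common data.table idiom, assuming `infl` has the columns `share.id` and `delay` used in the aggregate() calls above. As far as I know there is no special symbol for min or max the way .N exists for counts. Instead, any ordinary expression in j is evaluated once per group, so min(delay) and max(delay) play the same role, and all three statistics can come from one call. The toy values in `infl` and the output column names `min.delay`/`max.delay` are made up for illustration.

```r
library(data.table)

# Hypothetical toy stand-in for infl (values invented for illustration)
infl <- data.frame(share.id = c("a", "a", "b", "b", "b"),
                   delay    = c(3, 1, 7, 2, 5),
                   stringsAsFactors = FALSE)

# Base R: one aggregate() call per statistic
base.min <- aggregate(infl$delay, by = list(infl$share.id), FUN = min)
base.max <- aggregate(infl$delay, by = list(infl$share.id), FUN = max)

# data.table: expressions in j are evaluated within each group,
# so the count, minimum and maximum come out of a single grouped call
dt <- as.data.table(infl)
stats <- dt[, list(count     = .N,
                   min.delay = min(delay),
                   max.delay = max(delay)),
            by = "share.id"]
print(stats)
```

Here `stats` has one row per share.id, in order of first appearance, with the same minima and maxima as the two aggregate() results.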
On Fri, Sep 14, 2012 at 3:40 PM, Steve Lianoglou <mailinglist.honey...@gmail.com> wrote:
> Hi,
>
> On Fri, Sep 14, 2012 at 3:26 PM, Sam Steingold <s...@gnu.org> wrote:
>> I have a large data.frame Z (2,424,185,944 bytes, 10,256,441 rows, 17
>> columns).
>> I want to get the result of
>> table(aggregate(Z$V1, FUN = length, by = list(id=Z$V2))$x)
>> Alas, aggregate has been running for ~30 minutes, RSS is 14G, VIRT is
>> 24.3G, and there is no end in sight.
>> Both V1 and V2 are characters (not factors).
>> Is there anything I could do to speed this up?
>> Thanks.
>
> You might find you'll get a lot of mileage out of data.table when
> working with such large data.frames.
>
> To get something close to what you're after, you can try:
>
> R> library(data.table)
> R> Z <- as.data.table(Z)
> R> setkeyv(Z, 'V2')
> R> agg <- Z[, list(count=.N), by='V2']
>
> From here you might do:
>
> R> tab1 <- table(agg$count)
>
> I think that'll get you where you want to be. I'm ashamed to say
> that I haven't really done much with aggregate, since I mostly use
> plyr and data.table-like tools, so I might be missing your end goal.
> A reproducible example with a small data.frame from you would
> help here (for me at least).
>
> HTH,
> -steve
>
> --
> Steve Lianoglou
> Graduate Student: Computational Systems Biology
>  | Memorial Sloan-Kettering Cancer Center
>  | Weill Medical College of Cornell University
> Contact Info: http://cbio.mskcc.org/~lianos/contact

--
Sam Steingold <http://sds.podval.org> <http://www.childpsy.net/>

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
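Steve's reply asks for a small reproducible example. Here is a sketch of one, using a made-up stand-in for Z (the values of V1 and V2 are hypothetical). It checks that his data.table recipe produces the same distribution of group sizes as the original table(aggregate(...)) call. The data.table version is bound to a separate name, `Zdt`, because setkeyv() sorts the table by reference.

```r
library(data.table)

# Hypothetical small stand-in for Z: V2 is the grouping id, V1 any column
Z <- data.frame(V1 = c("x", "y", "z", "w", "v", "u"),
                V2 = c("id1", "id1", "id2", "id3", "id3", "id3"),
                stringsAsFactors = FALSE)

# Original approach: size of each V2 group, then how often each size occurs
tab0 <- table(aggregate(Z$V1, FUN = length, by = list(id = Z$V2))$x)

# data.table approach from the reply: .N is the number of rows per group
Zdt <- as.data.table(Z)
setkeyv(Zdt, "V2")   # sorts Zdt by V2 in place
agg <- Zdt[, list(count = .N), by = "V2"]
tab1 <- table(agg$count)

print(tab0)
print(tab1)
```

With these toy values the groups have sizes 2, 1 and 3, so both tables report one group each of size 1, 2 and 3.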