I usually get better results with data.table, except in this situation. Here is another example, unrelated to the current topic:
set.seed(1254)
name <- sample(letters, 1e6, replace = TRUE)
number <- sample(1:10, 1e6, replace = TRUE)
datTest <- data.frame(name, number, stringsAsFactors = FALSE)
system.time(res1 <- aggregate(number ~ name, data = datTest, sum))
#   user  system elapsed
#  1.332   0.004   1.384

library(data.table)
dtTest <- data.table(datTest)
system.time(res3 <- dtTest[, list(Sum_Number = sum(number)), by = name])
#   user  system elapsed
#  0.052   0.000   0.051

res3New <- res3[order(name), ]
names(res1) <- names(res3New)
identical(res1, as.data.frame(res3New))
#[1] TRUE

A.K.

----- Original Message -----
From: Steve Lianoglou <lianoglou.st...@gene.com>
To: arun <smartpink...@yahoo.com>
Cc: R help <r-help@r-project.org>
Sent: Thursday, August 15, 2013 4:48 PM
Subject: Re: [R] How to extract last value in each group

Hi,

On Thu, Aug 15, 2013 at 1:38 PM, arun <smartpink...@yahoo.com> wrote:
> I tried it again on a fresh start using the data.table alone:
> Now.
>
> dt1 <- data.table(dat2, key=c('Date', 'Time'))
> system.time(ans <- dt1[, .SD[.N], by='Date'])
> #   user  system elapsed
> # 40.908   0.000  40.981
> #Then tried:
> system.time(res7<- dat2[cumsum(rle(dat2[,1])$lengths),])
> #   user  system elapsed
> #  0.148   0.000   0.151 #same time as before

Amazing.

This is what I get on my MacBook Pro, i7 @ 3GHz (very close specs to your machine):

R> dt1 <- data.table(dat2, key=c('Date', 'Time'))
R> system.time(ans <- dt1[, .SD[.N], by='Date'])
   user  system elapsed
  0.064   0.009   0.073

R> system.time(res7<- dat2[cumsum(rle(dat2[,1])$lengths),])
   user  system elapsed
  0.148   0.016   0.165

On one of our compute servers, running who knows what processor on some version of Linux (which shouldn't really matter, since we're talking about the two approaches' times relative to each other):

R> system.time(ans <- dt1[, .SD[.N], by='Date'])
   user  system elapsed
  0.160   0.012   0.170

R> system.time(res7<- dat2[cumsum(rle(dat2[,1])$lengths),])
   user  system elapsed
  0.292   0.004   0.294

There's got to be some other explanation for the heavily degraded performance you're observing... our R & data.table versions also match.
-steve

--
Steve Lianoglou
Computational Biologist
Bioinformatics and Computational Biology
Genentech

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
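For anyone following the thread, here is a minimal base-R sketch of why the `cumsum(rle(...)$lengths)` trick returns the last row of each group, and a caveat: it only works when rows of the same group are contiguous. That is why the data are sorted first, much as `key=c('Date', 'Time')` sorts `dt1`. The toy data frame `dat` and its values are hypothetical, and `duplicated(fromLast = TRUE)` is shown as an order-independent alternative, not as anything proposed earlier in the thread.

```r
# Hypothetical toy data, deliberately NOT sorted by Date/Time.
dat <- data.frame(
  Date  = c("2013-08-03", "2013-08-01", "2013-08-03",
            "2013-08-02", "2013-08-01", "2013-08-03"),
  Time  = c("08:00", "09:00", "17:00", "12:00", "16:00", "10:00"),
  Value = c(4, 1, 6, 3, 2, 5),
  stringsAsFactors = FALSE
)

# Sort by Date, then Time, so each Date forms one contiguous run
# (zero-padded "HH:MM" strings sort correctly as text).
sorted <- dat[order(dat$Date, dat$Time), ]

# rle() returns the length of each run of identical Dates; the cumulative
# sum of those lengths is the row position of the last row in each run.
last_rle <- sorted[cumsum(rle(sorted$Date)$lengths), ]

# Alternative: keep rows whose Date does not occur again further down.
# This does not need contiguous groups, but "last" still means last in
# the current row order, so the sort above is kept.
last_dup <- sorted[!duplicated(sorted$Date, fromLast = TRUE), ]

print(last_rle)
print(last_dup)
```

On the unsorted `dat`, the `rle` version would treat each of the separated "2013-08-03" rows as its own run and return too many rows. So the speed comparison in the thread only holds once `dat2` is already ordered by Date.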