This is the inverse of the problem just discussed in the "performance of do.call("rbind")" thread, i.e. merging a list of data.frames into one large data.frame.
I would like to split a data.frame into a list of data.frames according to its first column. This SEEMS to be easily possible with the function base::by. However, as soon as the data.frame has a few million rows, this function becomes impractical (unless you have plenty of time). For 'by', the runtime grows roughly as nrow^2, or formally O(n^2) (see benchmark below). So basically I am looking for a similar function with better complexity.

> nrows <- c(1e5,1e6,2e6,3e6,5e6)
> timing <- list()
> for(i in nrows){
+ dum <- peaks[1:i,]
+ timing[[length(timing)+1]] <- system.time(x<- by(dum[,2:3], INDICES=list(dum[,1]), FUN=function(x){x}, simplify = FALSE))
+ }
> names(timing)<- nrows
> timing
$`1e+05`
   user  system elapsed
   0.05    0.00    0.05

$`1e+06`
   user  system elapsed
   1.48    2.98    4.46

$`2e+06`
   user  system elapsed
   7.25   11.39   18.65

$`3e+06`
   user  system elapsed
  16.15   25.81   41.99

$`5e+06`
   user  system elapsed
  43.22   74.72  118.09

-- 
Witold Eryk Wolski
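
For reference, one candidate for the grouping described above is base::split, which has a data.frame method and returns a named list of data.frames. The snippet below is only a minimal sketch. The 'peaks' data is not shown in this post, so 'peaks_demo' and its column names (id, mz, intensity) are made-up stand-ins. I have not timed this on the real data, so whether it actually scales better than 'by' is an open question, not a result.

```r
# Hypothetical stand-in for 'peaks' (the real data is not shown in the post):
# a grouping column followed by two value columns. All names are assumptions.
set.seed(1)
peaks_demo <- data.frame(
  id        = rep(1:100, times = 100),  # 100 groups, 100 rows each
  mz        = runif(1e4),
  intensity = runif(1e4)
)

# split() groups the rows by the values of the first column and returns
# one data.frame per distinct value, named after that value.
parts <- split(peaks_demo[, 2:3], peaks_demo[, 1])

length(parts)      # number of groups
names(parts)[1:3]  # list names are the group ids, as character strings
head(parts[["1"]]) # rows belonging to id == 1
```

The same timing loop as in the benchmark could be rerun with split(dum[,2:3], dum[,1]) in place of the 'by' call to compare the two directly.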