There is substantial overhead in rbind.data.frame() because of the need to check the column types. Converting to matrices first makes a huge difference in speed, but be careful of type coercion.
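
As a small sketch of the coercion caveat (the data frames `mixed` and `num` are made up for illustration and are not part of the original problem): as.matrix() on a data frame with any non-numeric column returns a character matrix, so the matrix route is only safe when every column has the same type.

```r
# Hypothetical data frame with one numeric and one character column
mixed <- data.frame(x = c(1.5, 2.5), label = c("a", "b"),
                    stringsAsFactors = FALSE)
m <- as.matrix(mixed)
typeof(m)                  # "character": the numeric column became strings

# A hypothetical all-numeric data frame keeps its type
num <- data.frame(x = c(1.5, 2.5), y = c(3, 4))
typeof(as.matrix(num))     # "double"

# Bind as matrices, then convert back to a data frame if one is needed
bound.m  <- do.call("rbind", list(as.matrix(num), as.matrix(num)))
bound.df <- as.data.frame(bound.m)
```

If the columns are all numeric, as in the timing test that follows, nothing is lost by going through matrices.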

testdf <- data.frame(matrix(runif(300), nrow=100, ncol=3))
testdf.list <- lapply(1:10000, function(x)testdf)

system.time(r.df <- do.call("rbind", testdf.list))

system.time({
  testm.list <- lapply(testdf.list, as.matrix)
  r.m <- do.call("rbind", testm.list)
})

> testdf <- data.frame(matrix(runif(300), nrow=100, ncol=3))
> testdf.list <- lapply(1:10000, function(x)testdf)
>
> system.time(r.df <- do.call("rbind", testdf.list))
   user  system elapsed
195.105  36.419 231.930
>
> system.time({
+   testm.list <- lapply(testdf.list, as.matrix)
+   r.m <- do.call("rbind", testm.list)
+ })
   user  system elapsed
  0.603   0.009   0.612

Sarah

On Mon, Jun 27, 2016 at 11:51 AM, Witold E Wolski <wewol...@gmail.com> wrote:
> I have a list (variable name data.list) with approx 200k data.frames
> with dim(data.frame) approx 100x3.
>
> a call
>
> data <-do.call("rbind", data.list)
>
> does not complete - run time is prohibitive (I killed the rsession
> after 5 minutes).
>
> I would think that merging data.frame's is a common operation. Is
> there a better function (more performant) that I could use?
>
> Thank you.
> Witold