Re: [R] performance of do.call("rbind")

Jeff Newmiller Mon, 27 Jun 2016 10:03:43 -0700

Describing the data frames only as "approx" makes any solution considerably 
harder to write and slower to run. If you want better performance you 
need a better handle on the data you are working with.

For example, if you knew that every data frame had exactly three columns named 
identically and exactly 100 rows, then you could preallocate the result data 
frame and loop through the input data copying values directly to the 
appropriate destination locations in the result. 
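
A minimal sketch of that idea, assuming the exact shape described above (three identically named columns, exactly 100 rows each). The column names a, b, c and the generated data are made up for illustration. It preallocates plain column vectors rather than a data frame, on the assumption that assigning into vectors in the loop is cheaper than assigning into a data frame, and builds the data frame once at the end.

```r
# Hypothetical stand-in for data.list: 50 data frames, each exactly 100 x 3
# with identical column names (a, b, c are made-up names).
set.seed(1)
n_df <- 50L
n_row <- 100L
data.list <- lapply(seq_len(n_df), function(i) {
  data.frame(a = rnorm(n_row), b = runif(n_row), c = rep(i, n_row))
})

# Preallocate one vector per column at its final length.
total <- n_df * n_row
a <- numeric(total)
b <- numeric(total)
cc <- integer(total)

# Copy each input into its own slot; nothing is reallocated in the loop.
for (i in seq_len(n_df)) {
  idx <- ((i - 1L) * n_row + 1L):(i * n_row)
  d <- data.list[[i]]
  a[idx] <- d$a
  b[idx] <- d$b
  cc[idx] <- d$c
}

# Build the result data frame once, after all copying is done.
result <- data.frame(a = a, b = b, c = cc)
```

With 200k inputs the loop still runs 200k times, but each iteration only writes into memory that already exists.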

To the extent that you can figure out things like the union of all column names 
or the total number of rows prior to starting copying data, you can adapt the 
above approach even if the input data frames are not identical. The key is not 
having to restructure/reallocate your result data frame as you go. 
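
One way this could look when row counts and column sets vary is a two-pass approach: a first pass collects the union of column names and the row counts, a second pass copies. This is only a sketch with made-up inputs, and it assumes every column is an atomic, non-factor vector whose type is the same in every data frame that has it.

```r
# Hypothetical inputs: row counts and column sets differ between data frames.
data.list <- list(
  data.frame(x = 1:3, y = c(0.5, 1.5, 2.5)),
  data.frame(x = 4:5, z = c("a", "b"), stringsAsFactors = FALSE),
  data.frame(y = 9.5, z = "c", stringsAsFactors = FALSE)
)

# First pass: gather what is needed to size the result.
all_cols <- unique(unlist(lapply(data.list, names)))
n_rows <- vapply(data.list, nrow, integer(1))
ends <- cumsum(n_rows)
starts <- ends - n_rows + 1L
total <- sum(n_rows)

# Preallocate one NA-filled vector per column, typed from the first
# data frame that contains that column.
cols <- lapply(all_cols, function(nm) {
  for (d in data.list) {
    if (nm %in% names(d)) return(rep(d[[nm]][NA_integer_], total))
  }
})
names(cols) <- all_cols

# Second pass: copy each input into its precomputed row range.
# Columns an input lacks keep their NA fill.
for (i in seq_along(data.list)) {
  if (n_rows[i] == 0L) next
  idx <- starts[i]:ends[i]
  d <- data.list[[i]]
  for (nm in names(d)) cols[[nm]][idx] <- d[[nm]]
}

result <- data.frame(cols, stringsAsFactors = FALSE)
```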

The bind_rows function in the dplyr package can do a lot of this for you... but 
being a general-purpose function it may not be as fast as code you write 
yourself with better knowledge of your data. 
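
For reference, a small sketch of the bind_rows call on a made-up list. So that the snippet runs even where dplyr is not installed, it falls back to the original do.call("rbind") form; that fallback is an addition for illustration only.

```r
# Hypothetical stand-in for data.list.
data.list <- list(
  data.frame(x = 1:3, y = c(0.5, 1.5, 2.5)),
  data.frame(x = 4:5, y = c(3.5, 4.5))
)

if (requireNamespace("dplyr", quietly = TRUE)) {
  # bind_rows accepts a list of data frames directly.
  result <- dplyr::bind_rows(data.list)
} else {
  # Base R fallback, used only when dplyr is unavailable.
  result <- do.call("rbind", data.list)
}
```

bind_rows matches columns by name and fills columns missing from an input with NA, so it also covers inputs whose column sets differ.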
-- 
Sent from my phone. Please excuse my brevity.

On June 27, 2016 8:51:17 AM PDT, Witold E Wolski <wewol...@gmail.com> wrote:
>I have a list (variable name data.list) with approx 200k data.frames
>with dim(data.frame) approx 100x3.
>
>a call
>
>data <- do.call("rbind", data.list)
>
>does not complete - run time is prohibitive (I killed the rsession
>after 5 minutes).
>
>I would think that merging data.frames is a common operation. Is
>there a better function (more performant) that I could use?
>
>Thank you.
>Witold

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
