Hi,

Just to add my tuppence, which might not even be worth that these days...

I found the following blog post from 2013, which is likely dated to some 
extent, but it provides benchmarks for a few methods:

  
http://rcrastinate.blogspot.com/2013/05/the-rbinding-race-for-vs-docall-vs.html

There is also a comment there referencing the data.table package, which I 
don't use myself, but it may be worth evaluating.
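For reference, a minimal sketch of what that might look like (the example data 
are made up for illustration; rbindlist() computes the size of the result and 
allocates it once, rather than growing it piecemeal):

```r
library(data.table)

## Toy stand-in for the poster's list: many small, identically
## structured data frames
data.list <- replicate(
  1000,
  data.frame(a = 1:100, b = runif(100), c = sample(letters, 100, TRUE)),
  simplify = FALSE
)

## rbindlist() allocates the full result up front, then fills it
data <- rbindlist(data.list)
dim(data)
```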

As Bert and Sarah hinted at, there is overhead in taking the repetitive 
piecemeal approach.

If all of your data frames have exactly the same column structure (column 
order, column types), it may be prudent to pre-allocate a data frame of the 
target total row count yourself and then "insert" each "sub" data frame by 
row indexing into the target structure.
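A rough base R sketch of that approach (the input list here is made up for 
illustration; it assumes every data frame has identical column names, order, 
and types):

```r
## Toy input: 1000 data frames of 100 rows x 3 columns each
data.list <- replicate(
  1000,
  data.frame(a = 1:100, b = runif(100), c = sample(letters, 100, TRUE),
             stringsAsFactors = FALSE),
  simplify = FALSE
)

rows  <- vapply(data.list, nrow, integer(1))
total <- sum(rows)

## Pre-allocate the target structure at the final row count,
## reusing the first frame's column names and types
out <- data.frame(lapply(data.list[[1]], function(col) rep(col[1], total)),
                  stringsAsFactors = FALSE)

## Insert each "sub" data frame by row indexing into the target
start <- cumsum(c(1, rows[-length(rows)]))
for (i in seq_along(data.list)) {
  idx <- start[i]:(start[i] + rows[i] - 1)
  out[idx, ] <- data.list[[i]]
}
dim(out)
```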

Regards,

Marc Schwartz


> On Jun 27, 2016, at 11:54 AM, Witold E Wolski <wewol...@gmail.com> wrote:
> 
> Hi Bert,
> 
> You are most likely right. I just thought that do.call("rbind", ...) is
> somehow more clever and allocates the memory up front. My error. After
> more searching I did find rbind.fill from plyr which seems to do the
> job (it computes the size of the result data.frame and allocates it
> first).
> 
> best
> 
> On 27 June 2016 at 18:49, Bert Gunter <bgunter.4...@gmail.com> wrote:
>> The following might be nonsense, as I have no understanding of R
>> internals; but ....
>> 
>> "Growing" structures in R by iteratively adding new pieces is often
>> warned to be inefficient when the number of iterations is large, and
>> your rbind() invocation might fall under this rubric. If so, you might
>> try issuing the call, say, 20 times over 10k disjoint subsets of the
>> list, and then rbinding up the 20 large frames.
>> 
>> Again, caveat emptor.
>> 
>> Cheers,
>> Bert
>> 
>> 
>> Bert Gunter
>> 
>> "The trouble with having an open mind is that people keep coming along
>> and sticking things into it."
>> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>> 
>> 
>> On Mon, Jun 27, 2016 at 8:51 AM, Witold E Wolski <wewol...@gmail.com> wrote:
>>> I have a list (variable name data.list) with approx 200k data.frames
>>> with dim(data.frame) approx 100x3.
>>> 
>>> a call
>>> 
>>> data <- do.call("rbind", data.list)
>>> 
>>> does not complete - run time is prohibitive (I killed the rsession
>>> after 5 minutes).
>>> 
>>> I would think that merging data.frame's is a common operation. Is
>>> there a better function (more performant) that I could use?
>>> 
>>> Thank you.
>>> Witold
>>> 
>>> 
>>> 
>>> 
>>> --
>>> Witold Eryk Wolski
>>> 
>>> ______________________________________________
>>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
> 
> 
> 
> -- 
> Witold Eryk Wolski
> 
