On Dec 14, 2009, at 12:45 , rfa...@tzi.de wrote:

Full_Name: Raimar Falke
Version: R version 2.10.0 (2009-10-26)
OS: Linux 2.6.27-16-generic #1 SMP Tue Dec 1 19:26:23 UTC 2009 x86_64 GNU/Linux
Submission from: (NULL) (134.102.222.56)


The construction of a data frame in the way shown below requires
much more memory than expected. If we assume each cell value takes 8 bytes,
the total amount of data is 128 MB. However, the process takes about
920 MB and not the expected 256 MB (twice the size of the data set).

With the real data sets (~35000 observations with ~33000 attributes), the
conversion to a data frame had to be killed at 60 GB of memory usage,
while it should only require 17.6 GB (2 * 8.8 GB).

 dfn <- rep(list(rep(0, 4096)), 4096)
 test <- as.data.frame.list(dfn)
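For scale, the 128 MB figure is 4096 columns * 4096 rows * 8 bytes per double. A quick sketch to confirm the nominal size of the input list (note that object.size() does not detect that rep() initially shares the inner vector, so it reports the fully materialized size):

```r
dfn <- rep(list(rep(0, 4096)), 4096)

# 4096 columns x 4096 rows x 8 bytes per double:
expected_bytes <- 4096 * 4096 * 8   # 134217728 bytes = 128 MB

# Reported size is the expected 128 MB plus per-vector and list overhead
print(object.size(dfn), units = "MB")
```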

I also tried constructing the data frame incrementally, one column at a
time: df$colN <- dataForColN. While I currently can't say much about the
memory usage, it takes a very long time.

After the construction, the saved-and-loaded data frame has the expected size.

What is the recommended way to construct larger data-frames?


Please use R-help for questions, and not the bug tracking system!



There are a few issues with your example -- mainly that it has no row names and no column names, so R will try to create them from the expression, which is inherently slow and memory-consuming. So first, make sure you set the names on the list, e.g.:

names(dfn) <- paste("V", seq.int(length(dfn)), sep = "")
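As a small illustration (shrunk to 4 columns of 3 zeros so it runs instantly), the line above produces syntactically valid, unique names, after which as.data.frame no longer has to deparse the call expression to invent them:

```r
dfn <- rep(list(rep(0, 3)), 4)

# Generate "V1", "V2", ... so R does not have to deparse the expression
names(dfn) <- paste("V", seq.int(length(dfn)), sep = "")

test <- as.data.frame(dfn)   # the names are taken as-is now
names(test)                  # "V1" "V2" "V3" "V4"
```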

That will reduce the overhead due to column names. Then, what as.data.frame does is simply call data.frame on the elements of the list. That ensures everything is consistent, but if you know for sure that your list is valid (correct lengths, valid names, no need for row names, etc.), then you can simply assert that it is a data frame:

class(dfn) <- "data.frame"
row.names(dfn) <- NULL
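A minimal sketch of the whole shortcut on a shrunk example. It assumes every list element already has the same length and that the names are valid; here the row count is supplied explicitly via .set_row_names(), which stores the compact automatic form c(NA, -n) instead of materializing the vector 1:n:

```r
# Shrunk example: 5 columns of 100 zeros
n   <- 100L
dfn <- rep(list(rep(0, n)), 5L)
names(dfn) <- paste("V", seq.int(length(dfn)), sep = "")

# Assert the class instead of rebuilding the object column by column.
# This assumes all elements have equal length and valid names.
class(dfn) <- "data.frame"

# Compact automatic row names for n rows (no 1:n vector is allocated)
attr(dfn, "row.names") <- .set_row_names(n)

dim(dfn)   # 100 5
```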

You'll still need double the memory, because the object has to be copied for the attribute modifications, but that's as low as it gets -- although in your exact example there is an even more efficient way:

dfn <- rep(data.frame(X=rep(0, 4096)), 4096)
dfn <- do.call("cbind", dfn)
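A shrunk sketch of the trick above. Note that in current R versions, rep() on a data frame drops the class and returns a plain list of the (replicated) columns, so do.call("cbind", ...) binds numeric vectors into a single numeric matrix -- one contiguous block rather than thousands of separate column vectors, which may well be the "entirely different reasons" in question:

```r
# Shrunk version: 4 replicated columns of 3 zeros
dfn <- rep(data.frame(X = rep(0, 3)), 4)   # rep() yields a plain list here
out <- do.call("cbind", dfn)

class(out)   # in current R: a numeric matrix, not a data frame
dim(out)     # 3 4
```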

It uses only a fraction more memory than the size of the entire object, but that's for entirely different reasons :). No, it's not good in general :P

Cheers,
Simon

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
