On Dec 14, 2009, at 12:45, rfa...@tzi.de wrote:
Full_Name: Raimar Falke
Version: R version 2.10.0 (2009-10-26)
OS: Linux 2.6.27-16-generic #1 SMP Tue Dec 1 19:26:23 UTC 2009
x86_64 GNU/Linux
Submission from: (NULL) (134.102.222.56)
The construction of a data frame in the way shown below requires
much more memory than expected. If we assume a cell value takes 8
bytes, the total amount of data is 128 MB. However, the process takes
about 920 MB and not the expected 256 MB (two times the data set).
With the real data sets (~35000 observations with ~33000 attributes),
the conversion to a data frame has to be killed at 60 GB of memory
usage, while it should only require 17.6 GB (2 * 8.8 GB).
dfn <- rep(list(rep(0, 4096)), 4096)
test <- as.data.frame.list(dfn)
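The 128 MB figure above can be reproduced with a few lines of arithmetic. This is only a sketch: the variable names are illustrative, and it assumes 8-byte doubles and ignores per-vector overhead.

```r
# Expected payload of a 4096 x 4096 table of doubles (8 bytes per cell).
n_rows <- 4096
n_cols <- 4096
bytes_per_cell <- 8

expected_bytes <- n_rows * n_cols * bytes_per_cell
expected_mib <- expected_bytes / 2^20

# Size of the data alone, and of "two times the data set".
print(expected_mib)
print(2 * expected_mib)

# object.size() reports the size of a single column including its header.
print(object.size(rep(0, n_rows)), units = "Kb")
```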
I also tried the incremental construction of the
data-frame: df$colN <- dataForColN. While I currently can't say much
about the memory usage, it takes a looong time.
After the construction the saved-and-loaded data-frame has the
expected size.
What is the recommended way to construct larger data-frames?
Please use R-help for questions, and not the bug tracking system!
There are a few issues with your example, mainly because it has no row
names and no column names, so R will try to create them from the
expression, which is inherently slow and memory-consuming. So first,
make sure you set the names on the list, e.g.:
names(dfn) <- paste("V",seq.int(length(dfn)),sep='')
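A scaled-down sketch of that step, with sizes reduced so it runs instantly. The names small and small_df are illustrative only.

```r
# A small list of equal-length columns, initially without names.
small <- rep(list(rep(0, 8)), 4)

# Name the columns up front so they need not be generated during conversion.
names(small) <- paste("V", seq.int(length(small)), sep = "")

# Convert once; the column names are taken from the list names.
small_df <- as.data.frame(small)
print(names(small_df))
print(dim(small_df))
```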
That will reduce the overhead due to column names. Then what
as.data.frame does is to simply call data.frame on the elements of the
list. That ensures that all is right, but if you know for sure that
your list is valid (correct lengths, valid names, no need for row
names etc.) then you can simply assert that it is a data frame:
class(dfn)<-"data.frame"
row.names(dfn)<-NULL
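A scaled-down variant of the same idea, assuming every list element has the same length. In this sketch the compact row-names attribute is set explicitly with .set_row_names() from base R, so the row count is stated rather than inferred, and it is checked afterwards with nrow(). The names n and dfn_small are illustrative only.

```r
n <- 8L
dfn_small <- rep(list(rep(0, n)), 4)
names(dfn_small) <- paste("V", seq.int(length(dfn_small)), sep = "")

# State the number of rows explicitly via the compact row-names form,
# then assert that the list is a data frame.
attr(dfn_small, "row.names") <- .set_row_names(n)
class(dfn_small) <- "data.frame"

# Sanity checks: the row count should match the column length.
print(is.data.frame(dfn_small))
print(dim(dfn_small))
```

It is worth running such a check on the real object as well, since a data frame whose row count disagrees with its column lengths will misbehave in subtle ways.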
You'll still need double the memory because the object needs to be
copied for the attribute modifications, but that's as low as it gets,
although in your exact example there is an even more efficient way:
dfn <- rep(data.frame(X=rep(0, 4096)), 4096)
dfn <- do.call("cbind", dfn)
it uses only a fraction more memory than the size of the entire
object, but that's for entirely different reasons :). No, it's not
good in general :P
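A small check hints at what the "different reasons" might be. This is a reading of the example rather than something stated above: rep() on a data frame appears to yield a plain list of its columns, and cbind() on plain vectors builds a matrix, not a data frame. Sizes are reduced and the names cols and res are illustrative.

```r
# rep() drops the data.frame class and returns a list of columns.
cols <- rep(data.frame(X = rep(0, 8)), 4)
print(is.data.frame(cols))
print(is.list(cols))

# cbind() on plain numeric vectors produces a numeric matrix.
res <- do.call("cbind", cols)
print(is.matrix(res))
print(is.data.frame(res))
print(dim(res))
```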
Cheers,
Simon
______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel