Dear R community,

I am still struggling to understand how R allocates memory and how to optimize my code to minimize 
its working memory load. Simon (thanks!) and others suggested calling gc() to free memory, 
which works quite nicely but seems to me more like a workaround than a solution to the 
underlying problem.
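
As a point of reference, here is a minimal sketch of how memory use can be inspected from within R itself rather than from a system monitor. The vector size and the names x, size_x and mem are only illustrative, and the figures gc() reports need not match what the operating system shows for the R process.

```r
# Sketch only: illustrative names and sizes, not the data from my project.
x <- numeric(1e6)            # one million doubles, roughly 8 MB
size_x <- object.size(x)     # size of the object itself, in bytes
print(size_x, units = "MB")

rm(x)                        # remove the only reference to the vector
mem <- gc()                  # run a garbage collection; returns a usage matrix
print(mem)                   # rows "Ncells" and "Vcells", with usage columns
```

The matrix returned by gc() describes what R itself has allocated, which may help separate the size of live objects from temporary copies that have not been collected yet.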

To give you an impression of what I am talking about, here is a short code example, 
together with a rough measurement (taken from a system monitoring tool) of the working 
memory in use after each computational step (64-bit R, latest version, on a 64-bit 
Windows 7 system with 2 cores and approx. 4 GB RAM):

##########################

# example 1:

y= matrix(rep(1,50000000), nrow = 50000000/2 , ncol = 2)

# used working memory increases from 1044 -->  1808 MB

# (same command again, i.e.)

y= matrix(rep(1,50000000), nrow = 50000000/2 , ncol = 2)

# 1808 MB -->  2178 MB Why does memory increase?

# (give the matrix column names)

colnames(y) = c("col1", "col2")

# 2178 MB -->  1781 MB Why does the size of an object decrease if I assign 
# column labels?

###

# example 2:

y= matrix(rep(1,50000000), nrow = 50000000/2 , ncol = 2)

# used working memory increases from 1016 -->  1780 MB

y = data.frame(y)

# increase from 1780 MB -->  3315 MB

##########################

Why does it take so much extra memory to store this matrix as a data.frame?

It cannot be the object itself (i.e. that data.frames inherently need more memory), 
because after I call gc() the memory use drops to 1387 MB. Does this mean that it may 
be more memory-efficient to avoid data.frames altogether and use only matrices?
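
A minimal sketch of how the two representations could be compared directly, assuming object.size() is an acceptable measure of the stored object. It uses a much smaller matrix than the one above so that it runs quickly, and the names n, m, d, size_m and size_d are only illustrative.

```r
# Sketch only: a smaller stand-in for the 50,000,000-element matrix above.
n <- 1e5
m <- matrix(rep(1, n), nrow = n / 2, ncol = 2)
d <- data.frame(m)           # same values, stored as two numeric columns

size_m <- as.numeric(object.size(m))   # bytes held by the matrix
size_d <- as.numeric(object.size(d))   # bytes held by the data.frame

print(c(matrix = size_m, data.frame = size_d))
print(gc())                  # R's own view of memory after the conversion
```

If the two sizes turn out to be close, the jump seen in the system monitor during the conversion would have to come from temporary copies rather than from the data.frame itself.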

This puzzles me a lot. In my experience, these effects become even more pronounced 
for larger objects.

As an anecdotal comparison: because of these memory problems I also used Stata in my 
last project, and I could do a lot of variable manipulations on the same (!) data 
using significantly less memory (I am talking about gigabytes).

Best,

Marc

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
