At this point R's serialization format only preserves sharing of
environments; any other sharing is lost. Changing this will require an
extensive rewrite of serialization. It would be useful to have this,
especially as we are trying to increase sharing and decrease copying, but
it isn't likely to happen any time soon.
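To make this concrete, here is a minimal sketch (added for illustration;
the sizes are approximate, not measurements from this thread) of how
save()/load() duplicates a vector that two bindings currently share:

x   <- rnorm(1e7)            # roughly 80 MB of doubles
lst <- list(a = x, b = x)    # both elements refer to the same vector, so
gc()                         # x and lst together add ~80 MB of Vcells, not 160

f <- tempfile()
save(x, lst, file = f)       # the stream writes the data behind x, lst$a and
                             # lst$b separately; random doubles compress poorly,
                             # so the file is close to three copies' worth
file.info(f)$size / 2^20     # file size in MB

load(f)                      # the reloaded copies no longer share memory,
gc()                         # so Vcells roughly triple relative to before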
Best,
luke
On Tue, 17 Sep 2013, Ross Boylan wrote:
On Tue, 2013-09-17 at 12:06 -0700, Ross Boylan wrote:
Saving and loading data is roughly doubling memory use. I'm trying to
understand and correct the problem.
Apparently I had the process memories mixed up: R1 below was the one
with 4G and R2 with 2G. So there's less of a mystery. However...
R1 was an R process using just over 2G of memory.
I did save(r3b, r4, sflist, file="r4.rdata")
and then, in a new process R2,
load(file="r4.rdata")
R2 used just under 4G of memory, i.e., almost double the original
process. The r4.rdata file was just under 2G, which seemed like very
little compression.
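One way to see how much the compression is actually buying (a sketch; the
"r4_uncompressed.rdata" name is just for this example, and simulation output
stored as doubles often compresses poorly) is to compare against an
uncompressed save:

save(r3b, r4, sflist, file = "r4_uncompressed.rdata", compress = FALSE)
file.info(c("r4.rdata", "r4_uncompressed.rdata"))$size / 2^30   # sizes in GB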
r4 was created by
r4 <- sflist2stanfit(sflist)
I presume that r4 and sflist shared most of their memory.
The save() apparently lost the information that the memory was shared,
doubling memory use.
Still wondering if this is what is going on.
R 2.15.1, 64-bit, on Linux.
First, does my diagnosis sound right? The reports of memory use in R2
are quite a bit lower than the process footprint; is that normal?
gc() # after loading data
            used   (Mb) gc trigger   (Mb)  max used   (Mb)
Ncells   1988691  106.3    3094291  165.3   2432643  130.0
Vcells 266976864 2036.9  282174979 2152.9 268661172 2049.8
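A rough cross-check (sketched here, not actually run in the thread): look at
the individual objects created by the load. Note that object.size() does not
detect sharing, so the per-object figures can add up to more than the Vcells
total above.

for (nm in c("r3b", "r4", "sflist"))
    print(object.size(get(nm)), units = "Mb")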
rm("r4")
gc()
            used   (Mb) gc trigger   (Mb)  max used   (Mb)
Ncells   1949626  104.2    3094291  165.3   2432643  130.0
Vcells 190689777 1454.9  282174979 2152.9 268661172 2049.8
r4 <- sflist2stanfit(sflist)
gc()
            used   (Mb) gc trigger   (Mb)  max used   (Mb)
Ncells   1970497  105.3    3094291  165.3   2432643  130.0
Vcells 228827252 1745.9  296363727 2261.1 268661172 2049.8
It seems the recreated r4 used about 300M less memory than the one read
in from disk. This suggests that some of the sharing was lost in the
save/load process.
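One way to check directly whether two bindings point at the same memory (a
small sketch using the undocumented internal inspect(); the printed addresses
vary from run to run, so treat it as a debugging aid only):

a <- rnorm(5)
b <- a                   # no copy is made here
.Internal(inspect(a))    # prints the address of the vector behind a
.Internal(inspect(b))    # the same @address here means a and b share memory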
Even weirder, R1 reports memory use well beyond the memory I see the
process using (2.2G).
Not a mystery once I matched up the right processes. Actually, I'm a little
surprised the process memory is less than the max used memory; I thought
giving memory back to the OS was not possible on Linux.
gc()
            used   (Mb) gc trigger   (Mb)  max used   (Mb)
Ncells   3640941  194.5    5543382  296.1   5543382  296.1
Vcells 418720281 3194.6  553125025 4220.1 526708090 4018.5
Second, what can I do to avoid the problem?
Now a more modest problem, though still a problem.
I guess in this case I could skip saving r4 and recreate it after loading,
but is there a more general solution?
If I did myboth <- list(r4, sflist) and
save(myboth, file="myfile")
would that be enough to keep the objects together? Judging from the
size of the file, it seems not.
Even if the myboth trick worked, it seems like a kludge.
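For what it's worth, the first idea (skip saving r4 and rebuild it after
loading) would look something like this sketch:

library(rstan)                        # sflist2stanfit() comes from rstan

## in the saving process: write only the objects r4 is derived from
save(r3b, sflist, file = "r4.rdata")

## in the new process: reload and rebuild r4 so it can again share
## memory with sflist instead of carrying its own copies
load(file = "r4.rdata")
r4 <- sflist2stanfit(sflist)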
Ross Boylan
--
Luke Tierney
Chair, Statistics and Actuarial Science
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa
Department of Statistics and Actuarial Science
241 Schaeffer Hall
Iowa City, IA 52242
Phone: 319-335-3386
Fax: 319-335-3017
Email: luke-tier...@uiowa.edu
WWW: http://www.stat.uiowa.edu