Hi everyone,

Thank you so much for your help.  I see that one can use tricks, such as double 
brackets and sparing use of gc(), to help with memory usage in R, but the fact 
remains that a new copy of an object is made every time a named object is 
assigned, and R's garbage collection should take care of "lost objects".  In 
some cases the memory may not be returned to the OS, depending on the R 
implementation.

I do think that there might be a memory leak, or some "memory release" problem, 
within nlme.  I ended up chopping the loop into sub-cases and running those as 
separate processes, so that each process's memory use would still grow, but not 
enough to run into the memory limits.
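
In case it helps anyone with a similar problem, the driver ended up looking 
roughly like the sketch below.  The N, the script name, the chunk size and the 
saveRDS() bookkeeping are placeholders for illustration, not the exact code I 
ran; generateDataset() and memoryHogFunction() are the functions from my 
original post.

## split the N iterations into chunks and run each chunk in a fresh
## R process, so its memory is returned to the OS when the process exits
N <- 1000
chunk_size <- 50
for (s in seq(1, N, by = chunk_size)) {
  e <- min(s + chunk_size - 1, N)
  system2("Rscript", args = c("run_chunk.R", s, e))
}

## run_chunk.R then holds the original loop body, e.g.:
##   args <- as.integer(commandArgs(trailingOnly = TRUE))
##   for (i in args[1]:args[2]) {
##     dataset <- generateDataset(i)
##     fit <- try(memoryHogFunction(dataset, otherParameters))
##     saveRDS(fit, sprintf("fit_%04d.rds", i))
##   }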

Thanks again to Drew, William and Liviu for the suggestions.

Ramiro


________________________________
From: Drew Tyre [aty...@unl.edu]
Sent: Tuesday, April 10, 2012 11:17 AM
To: William Dunlap
Cc: Ramiro Barrantes; r-help@r-project.org
Subject: Re: [R] reclaiming lost memory in R

A few days ago I responded to Ramiro with a suggestion that turns out to be 
incorrect.

> Ramiro
>
> I think the problem is the loop - R doesn't release memory allocated inside
> an expression until the expression completes. A for loop is an expression,
> so it duplicates fit and dataset on every iteration.

The above explanation is not true.

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com



My apologies for providing bad advice, and many thanks to Bill for prompting me 
to think more deeply about the problem. This sort of repeated simulation task 
is something I do a lot, and I hope I understand it better now. I struggled to 
find a good exposition of memory use in R in the documentation - Section 3.3 of 
the "Writing R Extensions" manual, "Profiling R Code for Memory Use", provides 
some hints, but it still presumes a lot of knowledge on the part of the reader. 
If there is a better description somewhere, I'd love to hear about it.

I found the source of my earlier assertion in section 7.1 of "S Programming" by 
Venables and Ripley, on managing loops: "A major issue is that S is designed to 
be able to back out from uncompleted calculations, so that the memory used in 
intermediate calculations is retained until they are committed. This applies 
to for, while, and repeat loops, for which none of the steps are committed 
until the whole loop is completed." That book was written in 2000, so it may be 
out of date, as the authors suggest later in the same section. In addition, it 
may have applied more to the S-PLUS engine than to R.  I also might have 
misunderstood what Venables and Ripley mean by "committed"; it may not have 
anything to do with excessive memory growth inside a loop caused by duplicating 
objects.

The best explanation I found of what is going on with memory allocation is in 
John Chambers's 2008 book "Software for Data Analysis: Programming with R", 
section 13.7, "Memory management for R objects". The key point is that 
assigning something to a named object, like fit in Ramiro's example, results in 
a new copy of fit. The reference to the old version of fit is lost, but its 
memory is not deallocated; that only happens once garbage collection is 
triggered, which does happen automatically during the loop. However, triggering 
garbage collection frequently costs a lot of time as well. The other thing I 
learned is that R has some clever internal programming that detects this 
situation and avoids the worst of the copying, but only under certain 
circumstances, such as assigning into a list element with double square 
brackets.
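
To make that concrete, here is a small illustration I put together (the objects 
are made up, not from Ramiro's simulation). tracemem() prints a message 
whenever R duplicates the marked object, which makes the copy-on-modify 
behaviour visible, and shows why pre-allocating a results list and filling it 
with [[ ]] avoids repeatedly copying the container. It needs an R build with 
memory profiling enabled, which the CRAN binaries are.

x <- runif(1e6)
tracemem(x)             # mark x; a message is printed whenever it is duplicated
y <- x                  # no copy yet - both names refer to the same data
y[1] <- 0               # copy-on-modify: tracemem reports a duplication here

## growing a result object rebuilds the whole container each iteration ...
results <- list()
for (i in 1:5) results <- c(results, list(sqrt(i)))

## ... whereas pre-allocating and filling with [[ ]] lets R modify the
## list in place (any duplication would show up in the tracemem output)
results <- vector("list", 5)
tracemem(results)
for (i in 1:5) results[[i]] <- sqrt(i)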

It is also possible that memory R has freed is not returned to the OS - see 
7.42 of the R FAQ. It is not clear whether this happens on Windows. This is 
what Liviu was referring to in his response, and it seems a likely candidate 
for the memory discrepancy in Ramiro's example.
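
A crude way to separate "memory R is holding" from "memory the process shows in 
the OS" is to reset gc()'s max-used counters around a single call and compare 
the numbers with what top or the Task Manager reports. This is purely a 
monitoring sketch using the function and object names from Ramiro's post, not a 
fix.

gc(reset = TRUE)      # reset the "max used" columns of the gc() report
fit <- try(memoryHogFunction(generateDataset(1), otherParameters))
print(gc())           # "max used" now reflects just this call; compare it
                      # with the process size reported by the OS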

Cheers



> >
> > I have the following situation
> >
> > basic loop which calls memoryHogFunction:
> >
> > for (i in 1:N) {
> >    dataset <- generateDataset(i)
> >    fit <- try(memoryHogFunction(dataset, otherParameters))
> > }
> >
> > and within
> >
> > memoryHogFunction <- function(dataset, params) {
> >
> >    fit <- try(nlme(someinitialValues))
> >    ...
> >    fit <- try(updatenlme(otherInitialValues))
> >    ...
> >    fit <- try(updatenlme(otherInitialValues))
> >    ...
> >    ret <- fit  # (and other things)
> >    return(ret)
> > }
> >
> > The problem is that, memoryHogFunction uses a lot of memory, and at the
> > end returns a result (which is not big) but the memory used by the
> > computation seems to be still occupied.  The original loop continues, but
> > the memory used by the program grows and grows after each call to
> > memoryHogFunction.
> >
> > I have been trying to do gc() after each run in the loop, and have even
> > done:
> >
> > in memoryHogFunction():
> >    ...
> >    ret <- fit  # (and other things)
> >    rm(list = ls()[-match("ret", ls())])
> >    return(ret)
> > }
> >
> > ???
> >
> > A typical results from gc() after each loop iteration says:
> >      used (Mb) gc trigger (Mb) max used (Mb)
> > Ncells  326953 17.5     597831 32.0   597831 32.0
> > Vcells 1645892 12.6    3048985 23.3  3048985 23.3
> >
> > Which doesn't reflect the 340 MB (and 400+ MB of virtual memory) that are
> > being used right now.
> >
> > Even when I do:
> >
> > print(sapply(ls(all.names=TRUE), function(x) object.size(get(x))))
> >
> > the largest object is 8179808, which is what it should be.
> >
> > The only thing that looked suspicious was the following within Rprof (with
> > the memory=stats option); the tot.duplications figure might be the problem:
> >
> > index: "with":"with.default"
> >     vsize.small  max.vsize.small      vsize.large  max.vsize.large
> >           30841            63378            20642           660787
> >           nodes        max.nodes     duplications tot.duplications
> >         3446132          8115016            12395         61431787
> >         samples
> >            4956
> >
> > Any suggestions?  Is it something about the use of loops in R?  Is it
> > maybe the try's???
> >
> > Thanks in advance for any help,
> >
> > Ramiro
> >



--
Drew Tyre

School of Natural Resources
University of Nebraska-Lincoln
416 Hardin Hall, East Campus
3310 Holdrege Street
Lincoln, NE 68583-0974

phone: +1 402 472 4054
fax: +1 402 472 2946
email: aty...@unl.edu
http://snr.unl.edu/tyre
http://aminpractice.blogspot.com
http://www.flickr.com/photos/atiretoo
