Hi everyone,

Thank you so much for your help. I see that one can use tricks, such as double brackets and sparing use of gc(), to help with memory usage in R. Still, the fact is that a new copy of the object is made every time a named object is assigned, and R's garbage collection should take care of "lost" objects. In some cases, memory might not go back to the OS, depending on the R implementation.
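A minimal sketch of the two tricks mentioned above, double-bracket assignment into a pre-allocated list and sparing use of gc(); simulateOnce() is just a hypothetical stand-in for the real fitting code, and tracemem() is included only to show when R actually duplicates an object:

    ## Hypothetical stand-in for an expensive fit; returns only a small summary.
    simulateOnce <- function(i) {
      x <- rnorm(1e5)
      mean(x)
    }

    results <- vector("list", 100)        # pre-allocate the result list
    for (i in seq_along(results)) {
      results[[i]] <- simulateOnce(i)     # double-bracket assignment into an existing slot
      if (i %% 25 == 0) invisible(gc())   # sparing, explicit garbage collection
    }

    ## tracemem() reports when a named object gets duplicated:
    x <- runif(1e6)
    tracemem(x)
    y <- x        # no copy yet: both names refer to the same memory
    y[1] <- 0     # copy-on-modify: tracemem() prints a duplication here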
I do think that there might be a memory leak or some "memory release" problem within nlme. I ended up just "chopping up" the loop into subcases and running those as separate processes, so that each process's memory use would still grow, but not enough to hit the memory limits.

Thanks again to Drew, William and Liviu for the suggestions.

Ramiro

________________________________
From: Drew Tyre [aty...@unl.edu]
Sent: Tuesday, April 10, 2012 11:17 AM
To: William Dunlap
Cc: Ramiro Barrantes; r-help@r-project.org
Subject: Re: [R] reclaiming lost memory in R

A few days ago I responded to Ramiro with a suggestion that turns out to be incorrect:

> Ramiro
>
> I think the problem is the loop - R doesn't release memory allocated inside
> an expression until the expression completes. A for loop is an expression,
> so it duplicates fit and dataset on every iteration.

The above explanation is not true.

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com

My apologies for providing bad advice, and many thanks to Bill for causing me to think more deeply about the problem. This sort of repeated simulation task is something I do a lot, and I hope I understand it better now.

I struggled to find a good exposition of memory use in R in the documentation. Section 3.3 of the "Writing R Extensions" manual, "Profiling R code for memory use", provides some hints, but still presumes a lot of knowledge on the part of the reader. If there is a better description somewhere, I'd love to hear about it.

I found the source of my earlier assertion in section 7.1 of "S Programming" by Venables and Ripley, on managing loops: "A major issue is that S is designed to be able to back out from uncompleted calculations, so that the memory used in intermediate calculations is retained until they are committed. This applies to for, while, and repeat loops, for which none of the steps are committed until the whole loop is completed." This book was written in 2000, so it may be out of date, as the authors themselves suggest later in the same section. In addition, this may have applied more to S-PLUS engines than to R. I also might have misunderstood what Venables and Ripley mean by "committed"; it may not have anything to do with excessive memory growth inside a loop caused by duplicating objects.

The best explanation of what is going on with memory allocation that I found is in John Chambers's 2008 book "Software for Data Analysis: Programming with R", in section 13.7, "Memory management for R objects". The key point is that assigning something to a named object, like fit in Ramiro's example, results in a new copy of fit. The reference to the old version of fit is lost, but the memory is not deallocated. That only happens once garbage collection is triggered, which will happen automatically during the loop. However, triggering garbage collection frequently uses up a lot of time as well.

The other thing that I learned is that R has some clever internal programming that detects this condition and avoids the worst problems, but only under certain circumstances, such as using the double square brackets in the assignment.

It is also possible that the OS is unable to release memory that R has given up - see 7.42 of the R FAQ. It's not clear whether this happens on Windows. This is what Liviu was referring to in his response, and it seems a likely candidate for the memory discrepancy in Ramiro's example.
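A rough sketch of the "chop the loop into separate processes" workaround described above, using the base parallel package. The per-iteration work here (a trivial lm() fit on simulated data) is only a placeholder for the real generateDataset()/nlme() calls; the chunk sizes are arbitrary. Each chunk runs in a fresh worker process, and stopping the cluster lets the operating system reclaim that worker's memory:

    library(parallel)

    chunks <- split(1:1000, ceiling((1:1000) / 100))   # 10 chunks of 100 iterations
    results <- list()

    for (ch in chunks) {
      cl <- makeCluster(1)                   # fresh worker process for this chunk
      res <- parLapply(cl, ch, function(i) {
        dataset <- data.frame(x = rnorm(100), y = rnorm(100))  # placeholder for generateDataset(i)
        fit <- try(lm(y ~ x, data = dataset))                  # placeholder for the nlme fit
        if (inherits(fit, "try-error")) NULL else coef(fit)    # keep only a small summary
      })
      results <- c(results, res)
      stopCluster(cl)                        # worker exits; its memory goes back to the OS
    }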
Cheers

> > I have the following situation:
> >
> > Basic loop, which calls memoryHogFunction:
> >
> > for (i in 1:N) {
> >   dataset <- generateDataset(i)
> >   fit <- try(memoryHogFunction(dataset, otherParameters))
> > }
> >
> > and within:
> >
> > memoryHogFunction <- function(dataset, params) {
> >   fit <- try(nlme(someinitialValues))
> >   ...
> >   fit <- try(updatenlme(otherInitialValues))
> >   ...
> >   fit <- try(updatenlme(otherInitialValues))
> >   ...
> >   ret <- fit (and other things)
> >   return a result "ret"
> > }
> >
> > The problem is that memoryHogFunction uses a lot of memory, and at the
> > end returns a result (which is not big), but the memory used by the
> > computation seems to be still occupied. The original loop continues, but
> > the memory used by the program grows and grows after each call to
> > memoryHogFunction.
> >
> > I have been trying to do gc() after each run in the loop, and have even
> > done, in memoryHogFunction():
> >
> >   ...
> >   ret <- fit (and other things)
> >   rm(list = ls()[-match("ret", ls())])
> >   return a result "ret"
> > }
> >
> > ???
> >
> > A typical result from gc() after each loop iteration says:
> >
> >            used (Mb) gc trigger (Mb) max used (Mb)
> > Ncells   326953 17.5     597831 32.0   597831 32.0
> > Vcells  1645892 12.6    3048985 23.3  3048985 23.3
> >
> > which doesn't reflect the 340 MB (and 400+ MB of virtual memory) that are
> > being used right now.
> >
> > Even when I do
> >
> >   print(sapply(ls(all.names = TRUE), function(x) object.size(get(x))))
> >
> > the largest object is 8179808 bytes, which is what it should be.
> >
> > The only thing that looked suspicious was the following within Rprof (with
> > the memory=stats option); might the tot.duplications be a problem?
> >
> > index: "with":"with.default"
> > vsize.small max.vsize.small vsize.large max.vsize.large
> >       30841           63378       20642          660787
> >       nodes       max.nodes duplications tot.duplications
> >     3446132         8115016        12395         61431787
> >     samples
> >        4956
> >
> > Any suggestions? Is it something about the use of loops in R? Is it
> > maybe the try()s?
> >
> > Thanks in advance for any help,
> >
> > Ramiro
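For anyone who wants to reproduce the kind of diagnostics quoted above, a minimal sketch of how the gc() table, the Rprof memory statistics, and the per-object sizes can be obtained. The profiled loop is just a placeholder, and memory profiling requires an R build with memory profiling enabled (the standard CRAN binaries have it):

    gc()                                             # Ncells/Vcells used, gc trigger, max used

    Rprof("memprof.out", memory.profiling = TRUE)    # collect memory statistics while profiling
    for (i in 1:50) {
      x <- rnorm(1e6)                                # placeholder for the real fitting loop
      m <- mean(x)
    }
    Rprof(NULL)                                      # stop profiling
    summaryRprof("memprof.out", memory = "stats")    # vsize / nodes / (tot.)duplications by call

    ## Sizes of the objects currently in the workspace, as in the message above:
    print(sapply(ls(all.names = TRUE), function(x) object.size(get(x))))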
--
Drew Tyre

School of Natural Resources
University of Nebraska-Lincoln
416 Hardin Hall, East Campus
3310 Holdrege Street
Lincoln, NE 68583-0974

phone: +1 402 472 4054
fax: +1 402 472 2946
email: aty...@unl.edu
http://snr.unl.edu/tyre
http://aminpractice.blogspot.com
http://www.flickr.com/photos/atiretoo

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.