I ran your code and did not see any growth:

         used (Mb) gc trigger (Mb) max used (Mb)
Ncells 463828 24.8     818163 43.7   818163 43.7
Vcells 546318  4.2    1031040  7.9   909905  7.0
1 (1) - eval : <33.6 376.6> 376.6 : 48.9MB
         used (Mb) gc trigger (Mb) max used (Mb)
Ncells 471049 25.2     818163 43.7   818163 43.7
Vcells 544105  4.2    1031040  7.9   909905  7.0
2 (1) - eval : <35.9 379.2> 379.2 : 48.7MB
         used (Mb) gc trigger (Mb) max used (Mb)
Ncells 479520 25.7     818163 43.7   818163 43.7
Vcells 543882  4.2    1031040  7.9   909905  7.0
3 (1) - eval : <38.0 381.4> 381.4 : 48.7MB
         used (Mb) gc trigger (Mb) max used (Mb)
Ncells 488376 26.1     818163 43.7   818163 43.7
Vcells 544191  4.2    1031040  7.9   909905  7.0
4 (1) - eval : <40.0 383.4> 383.4 : 48.8MB
         used (Mb) gc trigger (Mb) max used (Mb)
Ncells 496695 26.6     818163 43.7   818163 43.7
Vcells 543971  4.2    1031040  7.9   909905  7.0
5 (1) - eval : <42.0 385.4> 385.4 : 48.7MB
         used (Mb) gc trigger (Mb) max used (Mb)
Ncells 505562 27.0     899071 48.1   818163 43.7
Vcells 544034  4.2    1031040  7.9   909905  7.0
6 (1) - eval : <44.1 387.5> 387.5 : 48.8MB
         used (Mb) gc trigger (Mb) max used (Mb)
Ncells 513896 27.5     899071 48.1   899071 48.1
Vcells 543973  4.2    1031040  7.9   909905  7.0
7 (1) - eval : <46.2 389.8> 389.8 : 52.5MB
         used (Mb) gc trigger (Mb) max used (Mb)
Ncells 523203 28.0     899071 48.1   899071 48.1
Vcells 544751  4.2    1031040  7.9   909905  7.0
8 (1) - eval : <48.5 392.2> 392.2 : 46.7MB
         used (Mb) gc trigger (Mb) max used (Mb)
Ncells 531519 28.4     899071 48.1   899071 48.1
Vcells 544418  4.2    1031040  7.9   909905  7.0
9 (1) - eval : <50.6 394.5> 394.5 : 47.3MB
         used (Mb) gc trigger (Mb) max used (Mb)
Ncells 539556 28.9     899071 48.1   899071 48.1
Vcells 544057  4.2    1031040  7.9   909905  7.0
10 (1) - eval : <52.6 396.6> 396.6 : 47.8MB

It started out with 48 MB and ended with 47 MB. This is with:

R version 2.15.2 (2012-10-26) -- "Trick or Treat"
Copyright (C) 2012 The R Foundation for Statistical Computing
ISBN 3-900051-07-0
Platform: x86_64-w64-mingw32/x64 (64-bit)
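A per-chunk trace like the one above can be printed from inside the outer
loop. Here is a minimal sketch; it assumes the Windows-only memory.size()
available in this R build, and since the exact wrapper that produced the
"<user elapsed>" lines is not shown in the thread, the cat() line is only
an approximation of that format:

    ## Print gc() plus elapsed time and process memory at the top of
    ## each chunk cycle. 'chunky' is the list of filename chunks from
    ## Peter's example below.
    for (k in seq_along(chunky)) {
        print(gc())                    # cell usage, gc triggers, maxima
        tm <- proc.time()              # user / system / elapsed seconds
        cat(k, "(1) - eval : <", tm[1L], tm[3L], ">", tm[3L], ":",
            memory.size(), "MB\n")     # memory.size(): Windows-only, MB
        ## ... parse and process chunky[[k]] here ...
    }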
On Fri, Dec 21, 2012 at 10:27 AM, Peter Meißner
<peter.meiss...@uni-konstanz.de> wrote:
> Here is a working example that reproduces the behavior by creating 1000
> xml-files and then parsing them.
>
> On my PC, R starts out using about 90 MB of RAM; every cycle adds another
> 10-12 MB, so I end up with about 200 MB of RAM used.
>
> In the real code one chunk-cycle eats about 800 MB of RAM, which was one
> of the reasons I decided to split the process into separate chunks in the
> first place.
>
> ----------------
> 'Minimal' Example - START
> ----------------
>
> # the general problem
> require(XML)
>
> chunk <- function(x, chunksize){
>     # source: http://stackoverflow.com/a/3321659/1144966
>     x2 <- seq_along(x)
>     split(x, ceiling(x2/chunksize))
> }
>
> chunky <- chunk(paste("test", 1:1000, ".xml", sep=""), 100)
>
> # create 1000 small xml files to parse afterwards
> for(i in 1:1000){
>     writeLines(c(paste('<?xml version="1.0"?>\n <note>\n <to>Tove</to>\n',
>                        ' <nr>', i, '</nr>\n <from>Jani</from>\n',
>                        ' <heading>Reminder</heading>\n ', sep=""),
>                  paste(rep('<body>Do not forget me this weekend!</body>\n',
>                            sample(1:10, 1)), sep=""),
>                  ' </note>'),
>                paste("test", i, ".xml", sep=""))
> }
>
> # parse the files chunk by chunk
> for(k in 1:length(chunky)){
>     gc()
>     print(chunky[[k]])
>     xmlCatcher <- NULL
>
>     for(i in 1:length(chunky[[k]])){
>         filename <- chunky[[k]][i]
>         xml      <- xmlTreeParse(filename)
>         xml      <- xmlRoot(xml)
>         result   <- sapply(getNodeSet(xml, "//body"), xmlValue)
>         id       <- sapply(getNodeSet(xml, "//nr"), xmlValue)
>         dummy    <- cbind(id, result)
>         xmlCatcher <- rbind(xmlCatcher, dummy)
>     }
>     save(xmlCatcher, file=paste("xmlCatcher", k, ".RData", sep=""))
> }
>
> ----------------
> 'Minimal' Example - END
> ----------------
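An aside on the parsing loop above: the XML package also offers an
internal-node interface, in which the parsed document lives in libxml2's
memory until it is released with an explicit free(). Below is a sketch of
the inner loop rewritten that way; xmlParse() and free() are real
XML-package functions, but whether this addresses the growth seen here is
an assumption, not something established in the thread:

    ## Same parsing work, but with internal nodes plus an explicit free().
    ## Drop-in for the body of Peter's outer loop; assumes 'chunky' and
    ## 'k' from the example above.
    library(XML)

    xmlCatcher <- NULL
    for (i in seq_along(chunky[[k]])) {
        doc    <- xmlParse(chunky[[k]][i])   # C-level (libxml2) document
        result <- sapply(getNodeSet(doc, "//body"), xmlValue)
        id     <- sapply(getNodeSet(doc, "//nr"), xmlValue)
        free(doc)                            # release the C-level tree
        xmlCatcher <- rbind(xmlCatcher, cbind(id, result))
    }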
> On 21.12.2012 15:14, jim holtman wrote:
>> Can you send either your actual script or the console output so I can
>> get an idea of how fast memory is growing? Also, at the end, can you
>> list the sizes of the objects in the workspace? Here is a function I
>> use to get the space:
>>
>> my.ls <- function (pos = 1, sorted = FALSE, envir = as.environment(pos))
>> {
>>     .result <- sapply(ls(envir = envir, all.names = TRUE),
>>         function(..x) object.size(eval(as.symbol(..x), envir = envir)))
>>     if (length(.result) == 0)
>>         return("No objects to list")
>>     if (sorted) {
>>         .result <- rev(sort(.result))
>>     }
>>     .ls <- as.data.frame(rbind(as.matrix(.result),
>>         `**Total` = sum(.result)))
>>     names(.ls) <- "Size"
>>     .ls$Size <- formatC(.ls$Size, big.mark = ",", digits = 0,
>>         format = "f")
>>     .ls$Class <- c(unlist(lapply(rownames(.ls)[-nrow(.ls)],
>>         function(x) class(eval(as.symbol(x), envir = envir))[1L])),
>>         "-------")
>>     .ls$Length <- c(unlist(lapply(rownames(.ls)[-nrow(.ls)],
>>         function(x) length(eval(as.symbol(x), envir = envir)))),
>>         "-------")
>>     .ls$Dim <- c(unlist(lapply(rownames(.ls)[-nrow(.ls)],
>>         function(x) paste(dim(eval(as.symbol(x), envir = envir)),
>>             collapse = " x "))), "-------")
>>     .ls
>> }
>>
>> which gives output like this:
>>
>>> my.ls()
>>                  Size       Class  Length     Dim
>> .Last             736    function       1
>> .my.env.jph        28 environment      39
>> x                 424     integer     100
>> y              40,024     integer   10000
>> z           4,000,024     integer 1000000
>> **Total     4,041,236     ------- ------- -------
>>
>> On Fri, Dec 21, 2012 at 8:03 AM, Peter Meißner
>> <peter.meiss...@uni-konstanz.de> wrote:
>>> Thanks for your answer.
>>>
>>> Yes, I tried 'gc()'; it did not change the behavior.
>>>
>>> Best, Peter
>>>
>>> On 21.12.2012 13:37, jim holtman wrote:
>>>> Have you tried putting calls to 'gc' at the top of the first loop to
>>>> make sure memory is reclaimed? You can print the call to 'gc' to see
>>>> how fast it is growing.
>>>>
>>>> On Thu, Dec 20, 2012 at 6:26 PM, Peter Meissner
>>>> <peter.meiss...@uni-konstanz.de> wrote:
>>>>> Hey,
>>>>>
>>>>> I have a double loop like this:
>>>>>
>>>>> chunk <- list(1:10, 11:20, 21:30)
>>>>> for(k in 1:length(chunk)){
>>>>>     print(chunk[[k]])
>>>>>     DummyCatcher <- NULL
>>>>>     for(i in chunk[[k]]){
>>>>>         print("i load something")
>>>>>         dummy <- 1
>>>>>         print("i do something")
>>>>>         dummy <- dummy + 1
>>>>>         print("i do put it together")
>>>>>         DummyCatcher <- rbind(DummyCatcher, dummy)
>>>>>     }
>>>>>     print("i save a chunk and restart with another chunk of data")
>>>>> }
>>>>>
>>>>> The problem now is that with each 'chunk' cycle the memory used by R
>>>>> becomes bigger and bigger until it exceeds my RAM, although the RAM
>>>>> needed for any one chunk cycle alone is only a fifth of what I have
>>>>> overall.
>>>>>
>>>>> Does somebody have an idea why this behaviour might occur? Note that
>>>>> all the objects (like 'DummyCatcher') are reused every cycle, so I
>>>>> would assume that the RAM used should stay about the same after the
>>>>> first 'chunk' cycle.
>>>>>
>>>>> Best, Peter
>>>>>
>>>>> SystemInfo:
>>>>>
>>>>> R version 2.15.2 (2012-10-26)
>>>>> Platform: x86_64-w64-mingw32/x64 (64-bit)
>>>>> Win7 Enterprise, 8 GB RAM
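A general note on the accumulation idiom in the loop above, separate from
the leak question: DummyCatcher <- rbind(DummyCatcher, dummy) copies the
whole accumulated matrix on every iteration, so the total copying grows
quadratically with the number of iterations. A minimal sketch of the usual
alternative, collecting the pieces in a pre-allocated list and binding
once per chunk:

    ## Collect per-iteration results in a list, bind once per chunk.
    chunk <- list(1:10, 11:20, 21:30)
    for (k in seq_along(chunk)) {
        pieces <- vector("list", length(chunk[[k]]))  # pre-allocated list
        for (j in seq_along(chunk[[k]])) {
            dummy <- chunk[[k]][j] + 1     # stand-in for the real work
            pieces[[j]] <- dummy
        }
        DummyCatcher <- do.call(rbind, pieces)  # one rbind per chunk
        print(dim(DummyCatcher))                # 10 x 1 each cycle
        ## save(DummyCatcher, ...) here; the list is rebuilt next cycle
    }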
>>> --
>>> Peter Meißner
>>> Workgroup 'Comparative Parliamentary Politics'
>>> Department of Politics and Administration
>>> University of Konstanz
>>> Box 216
>>> 78457 Konstanz
>>> Germany
>>>
>>> +49 7531 88 5665
>>> http://www.polver.uni-konstanz.de/sieberer/home/
>
> --
> Peter Meißner
> Workgroup 'Comparative Parliamentary Politics'
> Department of Politics and Administration
> University of Konstanz
> Box 216
> 78457 Konstanz
> Germany
>
> +49 7531 88 5665
> http://www.polver.uni-konstanz.de/sieberer/home/

--
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.