I ran your code and did not see any growth:

         used (Mb) gc trigger (Mb) max used (Mb)
Ncells 463828 24.8     818163 43.7   818163 43.7
Vcells 546318  4.2    1031040  7.9   909905  7.0
1 (1) - eval : <33.6 376.6> 376.6 : 48.9MB
         used (Mb) gc trigger (Mb) max used (Mb)
Ncells 471049 25.2     818163 43.7   818163 43.7
Vcells 544105  4.2    1031040  7.9   909905  7.0
2 (1) - eval : <35.9 379.2> 379.2 : 48.7MB
         used (Mb) gc trigger (Mb) max used (Mb)
Ncells 479520 25.7     818163 43.7   818163 43.7
Vcells 543882  4.2    1031040  7.9   909905  7.0
3 (1) - eval : <38.0 381.4> 381.4 : 48.7MB
         used (Mb) gc trigger (Mb) max used (Mb)
Ncells 488376 26.1     818163 43.7   818163 43.7
Vcells 544191  4.2    1031040  7.9   909905  7.0
4 (1) - eval : <40.0 383.4> 383.4 : 48.8MB
         used (Mb) gc trigger (Mb) max used (Mb)
Ncells 496695 26.6     818163 43.7   818163 43.7
Vcells 543971  4.2    1031040  7.9   909905  7.0
5 (1) - eval : <42.0 385.4> 385.4 : 48.7MB
         used (Mb) gc trigger (Mb) max used (Mb)
Ncells 505562 27.0     899071 48.1   818163 43.7
Vcells 544034  4.2    1031040  7.9   909905  7.0
6 (1) - eval : <44.1 387.5> 387.5 : 48.8MB
         used (Mb) gc trigger (Mb) max used (Mb)
Ncells 513896 27.5     899071 48.1   899071 48.1
Vcells 543973  4.2    1031040  7.9   909905  7.0
7 (1) - eval : <46.2 389.8> 389.8 : 52.5MB
         used (Mb) gc trigger (Mb) max used (Mb)
Ncells 523203 28.0     899071 48.1   899071 48.1
Vcells 544751  4.2    1031040  7.9   909905  7.0
8 (1) - eval : <48.5 392.2> 392.2 : 46.7MB
         used (Mb) gc trigger (Mb) max used (Mb)
Ncells 531519 28.4     899071 48.1   899071 48.1
Vcells 544418  4.2    1031040  7.9   909905  7.0
9 (1) - eval : <50.6 394.5> 394.5 : 47.3MB
         used (Mb) gc trigger (Mb) max used (Mb)
Ncells 539556 28.9     899071 48.1   899071 48.1
Vcells 544057  4.2    1031040  7.9   909905  7.0
10 (1) - eval : <52.6 396.6> 396.6 : 47.8MB

It started out with 48 MB and ended with 47 MB. This is with:

R version 2.15.2 (2012-10-26) -- "Trick or Treat"
Copyright (C) 2012 The R Foundation for Statistical Computing
ISBN 3-900051-07-0
Platform: x86_64-w64-mingw32/x64 (64-bit)
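A per-chunk trace like the one above can be printed from inside the outer
loop. Here is a minimal sketch; it assumes the Windows-only memory.size()
available in this R build, and since the exact wrapper that produced the
"<user elapsed>" lines is not shown in the thread, the cat() line is only
an approximation of that format:

    ## Print gc() plus elapsed time and process memory at the top of
    ## each chunk cycle. 'chunky' is the list of filename chunks from
    ## Peter's example below.
    for (k in seq_along(chunky)) {
        print(gc())                    # cell usage, gc triggers, maxima
        tm <- proc.time()              # user / system / elapsed seconds
        cat(k, "(1) - eval : <", tm[1L], tm[3L], ">", tm[3L], ":",
            memory.size(), "MB\n")     # memory.size(): Windows-only, MB
        ## ... parse and process chunky[[k]] here ...
    }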
On Fri, Dec 21, 2012 at 10:27 AM, Peter Meißner
<peter.meiss...@uni-konstanz.de> wrote:
> Here is a working example that reproduces the behavior by creating 1000
> xml-files and then parsing them.
>
> On my PC, R starts out using about 90 MB of RAM; every cycle adds another
> 10-12 MB, so I end up with about 200 MB of RAM used.
>
> In the real code one chunk-cycle eats about 800 MB of RAM, which was one
> of the reasons I decided to split the process into separate chunks in the
> first place.
>
> ----------------
> 'Minimal' Example - START
> ----------------
>
> # the general problem
> require(XML)
>
> chunk <- function(x, chunksize){
>     # source: http://stackoverflow.com/a/3321659/1144966
>     x2 <- seq_along(x)
>     split(x, ceiling(x2/chunksize))
> }
>
> chunky <- chunk(paste("test", 1:1000, ".xml", sep=""), 100)
>
> # create 1000 small xml files to parse afterwards
> for(i in 1:1000){
>     writeLines(c(paste('<?xml version="1.0"?>\n <note>\n <to>Tove</to>\n',
>                        ' <nr>', i, '</nr>\n <from>Jani</from>\n',
>                        ' <heading>Reminder</heading>\n ', sep=""),
>                  paste(rep('<body>Do not forget me this weekend!</body>\n',
>                            sample(1:10, 1)), sep=""),
>                  ' </note>'),
>                paste("test", i, ".xml", sep=""))
> }
>
> # parse the files chunk by chunk
> for(k in 1:length(chunky)){
>     gc()
>     print(chunky[[k]])
>     xmlCatcher <- NULL
>
>     for(i in 1:length(chunky[[k]])){
>         filename <- chunky[[k]][i]
>         xml      <- xmlTreeParse(filename)
>         xml      <- xmlRoot(xml)
>         result   <- sapply(getNodeSet(xml, "//body"), xmlValue)
>         id       <- sapply(getNodeSet(xml, "//nr"), xmlValue)
>         dummy    <- cbind(id, result)
>         xmlCatcher <- rbind(xmlCatcher, dummy)
>     }
>     save(xmlCatcher, file=paste("xmlCatcher", k, ".RData", sep=""))
> }
>
> ----------------
> 'Minimal' Example - END
> ----------------
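An aside on the parsing loop above: the XML package also offers an
internal-node interface, in which the parsed document lives in libxml2's
memory until it is released with an explicit free(). Below is a sketch of
the inner loop rewritten that way; xmlParse() and free() are real
XML-package functions, but whether this addresses the growth seen here is
an assumption, not something established in the thread:

    ## Same parsing work, but with internal nodes plus an explicit free().
    ## Drop-in for the body of Peter's outer loop; assumes 'chunky' and
    ## 'k' from the example above.
    library(XML)

    xmlCatcher <- NULL
    for (i in seq_along(chunky[[k]])) {
        doc    <- xmlParse(chunky[[k]][i])   # C-level (libxml2) document
        result <- sapply(getNodeSet(doc, "//body"), xmlValue)
        id     <- sapply(getNodeSet(doc, "//nr"), xmlValue)
        free(doc)                            # release the C-level tree
        xmlCatcher <- rbind(xmlCatcher, cbind(id, result))
    }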
> On 21.12.2012 15:14, jim holtman wrote:
>> Can you send either your actual script or the console output so I can
>> get an idea of how fast memory is growing? Also, at the end, can you
>> list the sizes of the objects in the workspace? Here is a function I
>> use to get the space:
>>
>> my.ls <- function (pos = 1, sorted = FALSE, envir = as.environment(pos))
>> {
>>     .result <- sapply(ls(envir = envir, all.names = TRUE),
>>         function(..x) object.size(eval(as.symbol(..x), envir = envir)))
>>     if (length(.result) == 0)
>>         return("No objects to list")
>>     if (sorted) {
>>         .result <- rev(sort(.result))
>>     }
>>     .ls <- as.data.frame(rbind(as.matrix(.result),
>>         `**Total` = sum(.result)))
>>     names(.ls) <- "Size"
>>     .ls$Size <- formatC(.ls$Size, big.mark = ",", digits = 0,
>>         format = "f")
>>     .ls$Class <- c(unlist(lapply(rownames(.ls)[-nrow(.ls)],
>>         function(x) class(eval(as.symbol(x), envir = envir))[1L])),
>>         "-------")
>>     .ls$Length <- c(unlist(lapply(rownames(.ls)[-nrow(.ls)],
>>         function(x) length(eval(as.symbol(x), envir = envir)))),
>>         "-------")
>>     .ls$Dim <- c(unlist(lapply(rownames(.ls)[-nrow(.ls)],
>>         function(x) paste(dim(eval(as.symbol(x), envir = envir)),
>>             collapse = " x "))), "-------")
>>     .ls
>> }
>>
>> which gives output like this:
>>
>>> my.ls()
>>                  Size       Class  Length     Dim
>> .Last             736    function       1
>> .my.env.jph        28 environment      39
>> x                 424     integer     100
>> y              40,024     integer   10000
>> z           4,000,024     integer 1000000
>> **Total     4,041,236     ------- ------- -------
>>
>> On Fri, Dec 21, 2012 at 8:03 AM, Peter Meißner
>> <peter.meiss...@uni-konstanz.de> wrote:
>>> Thanks for your answer.
>>>
>>> Yes, I tried 'gc()'; it did not change the behavior.
>>>
>>> Best, Peter
>>>
>>> On 21.12.2012 13:37, jim holtman wrote:
>>>> Have you tried putting calls to 'gc' at the top of the first loop to
>>>> make sure memory is reclaimed? You can print the call to 'gc' to see
>>>> how fast it is growing.
>>>>
>>>> On Thu, Dec 20, 2012 at 6:26 PM, Peter Meissner
>>>> <peter.meiss...@uni-konstanz.de> wrote:
>>>>> Hey,
>>>>>
>>>>> I have a double loop like this:
>>>>>
>>>>> chunk <- list(1:10, 11:20, 21:30)
>>>>> for(k in 1:length(chunk)){
>>>>>     print(chunk[[k]])
>>>>>     DummyCatcher <- NULL
>>>>>     for(i in chunk[[k]]){
>>>>>         print("i load something")
>>>>>         dummy <- 1
>>>>>         print("i do something")
>>>>>         dummy <- dummy + 1
>>>>>         print("i do put it together")
>>>>>         DummyCatcher <- rbind(DummyCatcher, dummy)
>>>>>     }
>>>>>     print("i save a chunk and restart with another chunk of data")
>>>>> }
>>>>>
>>>>> The problem now is that with each 'chunk' cycle the memory used by R
>>>>> becomes bigger and bigger until it exceeds my RAM, although the RAM
>>>>> needed for any one chunk cycle alone is only a fifth of what I have
>>>>> overall.
>>>>>
>>>>> Does somebody have an idea why this behaviour might occur? Note that
>>>>> all the objects (like 'DummyCatcher') are reused every cycle, so I
>>>>> would assume that the RAM used should stay about the same after the
>>>>> first 'chunk' cycle.
>>>>>
>>>>> Best, Peter
>>>>>
>>>>> SystemInfo:
>>>>>
>>>>> R version 2.15.2 (2012-10-26)
>>>>> Platform: x86_64-w64-mingw32/x64 (64-bit)
>>>>> Win7 Enterprise, 8 GB RAM
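A general note on the accumulation idiom in the loop above, separate from
the leak question: DummyCatcher <- rbind(DummyCatcher, dummy) copies the
whole accumulated matrix on every iteration, so the total copying grows
quadratically with the number of iterations. A minimal sketch of the usual
alternative, collecting the pieces in a pre-allocated list and binding
once per chunk:

    ## Collect per-iteration results in a list, bind once per chunk.
    chunk <- list(1:10, 11:20, 21:30)
    for (k in seq_along(chunk)) {
        pieces <- vector("list", length(chunk[[k]]))  # pre-allocated list
        for (j in seq_along(chunk[[k]])) {
            dummy <- chunk[[k]][j] + 1     # stand-in for the real work
            pieces[[j]] <- dummy
        }
        DummyCatcher <- do.call(rbind, pieces)  # one rbind per chunk
        print(dim(DummyCatcher))                # 10 x 1 each cycle
        ## save(DummyCatcher, ...) here; the list is rebuilt next cycle
    }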
>>> --
>>> Peter Meißner
>>> Workgroup 'Comparative Parliamentary Politics'
>>> Department of Politics and Administration
>>> University of Konstanz
>>> Box 216
>>> 78457 Konstanz
>>> Germany
>>>
>>> +49 7531 88 5665
>>> http://www.polver.uni-konstanz.de/sieberer/home/
>
> --
> Peter Meißner
> Workgroup 'Comparative Parliamentary Politics'
> Department of Politics and Administration
> University of Konstanz
> Box 216
> 78457 Konstanz
> Germany
>
> +49 7531 88 5665
> http://www.polver.uni-konstanz.de/sieberer/home/

--
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.