Dear developers,

I'm working with a lot of textual data in R and need to process it batch by batch. I read the data in batches of 10 000 documents and run calculations (unigram, 2-gram and 3-gram counts) that produce objects consuming quite a lot of memory. Every iteration creates a new object of roughly 500 MB (I cannot control the size, so a new object has to be created each iteration). The computations slow down with every iteration: the first iteration takes about 7 seconds, but after 30 iterations each iteration takes 20-30 minutes.
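Schematically, the loop looks roughly like this (read_batch() and compute_ngrams() are placeholders for my actual reading and n-gram code, and the file names are just for illustration):

    n_batches <- 30
    for (i in seq_len(n_batches)) {
      docs   <- read_batch(i)                         # ~10 000 documents per batch
      ngrams <- compute_ngrams(docs)                  # unigram/2-gram/3-gram counts, ~500 MB
      saveRDS(ngrams, sprintf("ngrams_%03d.rds", i))  # write the batch result to disk
      rm(docs, ngrams)                                # drop references before the next batch
      gc()                                            # collect garbage between batches
    }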
I think I have localized the problem to R's memory handling: my approach seems to be fragmenting the memory. If I drive the batches from Bash and start a new R session for each batch, every batch takes ~7 seconds, so the problem is not with the individual batches. The garbage collector does not seem to handle this (potential) fragmentation.

Can the poor performance after a couple of iterations be caused by memory fragmentation? If so, is there a way to handle this within R, such as defragmenting the memory or restarting R from within R?

With kind regards,

Måns Magnusson
PhD Student, Statistics, Linköping University
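P.S. To make the workaround concrete: expressed from within R instead of Bash, the per-batch-session approach amounts to something like the sketch below, where process_batch.R is a placeholder for a script that reads one batch, computes the n-grams and writes the result to disk.

    # start a fresh R process for each batch so every batch begins with a clean heap
    n_batches <- 30
    for (i in seq_len(n_batches)) {
      system2("Rscript", args = c("process_batch.R", as.character(i)))
    }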