Bugs item #2814892, was opened at 2009-06-30 18:42
Message generated for change (Tracker Item Submitted) made by batripler
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=453021&aid=2814892&group_id=48422

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: Source
Group: rpy2
Status: Open
Resolution: None
Priority: 5
Private: Yes
Submitted By: batripler (batripler)
Assigned to: Nobody/Anonymous (nobody)
Summary: memory leak in SexpVector

Initial Comment:
Hi Laurent,

Thank you for creating a wonderfully useful piece of software. I've been 
using it for a few weeks now, and I think I have uncovered a relatively 
serious problem in rpy2.rinterface.SexpVector, which is at the heart of the 
system. Here is a manifestation of the problem. Perhaps I am doing something 
wrong.

Start a new python session and run:
{{{
import numpy
x = numpy.zeros(int(2e7))  # an integer shape works on old and new numpy alike
}}}

You can modify the size of the array. Also, depending on the numpy defaults on 
your machine, the memory consumption will vary. On my machine a double is 8 
bytes, times 2e7 = ~150MB.  I see the process at ~162MB due to the Python 
interpreter footprint.
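
As a quick sanity check on that arithmetic, the raw payload can be computed without allocating anything. This is only a sketch using the standard library; it assumes the elements are C doubles, and `payload_mib` is a hypothetical helper name, not part of numpy or rpy2.

```python
import struct

def payload_mib(n_elements, fmt="d"):
    """Raw size in MiB of n_elements items of struct format `fmt`.

    Counts only the packed data, with no per-object or container overhead.
    """
    return n_elements * struct.calcsize(fmt) / float(2 ** 20)

size = payload_mib(int(2e7))
print("%.1f MiB" % size)  # 152.6 MiB when a double is 8 bytes
```

The gap between this figure and the observed process size is the interpreter's own footprint.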

Now, kill this session, start a new one, and run the following:
{{{
import rpy2.robjects, rpy2.rinterface as rint
reval=rint.baseNameSpaceEnv['eval']
rparse=rint.baseNameSpaceEnv['parse']
x=reval(rparse(text=rint.StrSexpVector(["numeric(2e7)"])))
}}}

In this case, we are just creating a REALSXP vector on the R side. I see the 
process coming in at 203MB, which is reasonable given that both the Python and 
R interpreters are now running. Again, this is assuming that every element of 
the REALSXP vector is 8 bytes.

Now, finally, in a new Python process, let's create an array on the Python side 
and copy it over to the R side:
{{{
import numpy, rpy2.robjects, rpy2.rinterface as rint
x = numpy.zeros(int(2e7)); y = rint.SexpVector(x, rint.REALSXP)
}}}

I would expect the max size of this process to be roughly the sum of the 
previous two. It contains both the Python object and an equally large R 
object. Instead, it comes in at a whopping 950MB!!
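
For anyone who wants to reproduce these numbers from inside the process rather than from top, something along these lines should work on a Unix box. It is a rough sketch, not how the figures above were taken: `peak_rss_mib` is a hypothetical helper, and the list allocation is just a stand-in for the conversion being measured.

```python
import resource
import sys

def peak_rss_mib():
    """Peak resident set size of this process in MiB (Unix only)."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in KiB on Linux but in bytes on macOS.
    divisor = 2 ** 20 if sys.platform == "darwin" else 2 ** 10
    return peak / float(divisor)

before = peak_rss_mib()
data = [0.0] * 10 ** 6  # stand-in allocation; swap in the conversion under test
after = peak_rss_mib()
print("peak RSS grew by about %.1f MiB" % (after - before))
```

Note that this reports the peak, so it never goes down. It can show temporary bloat, but not whether memory was later returned to the OS.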

Incidentally, if I run the following code:
{{{
import numpy; x = numpy.zeros(int(2e7)); y = list(x)
}}}
... it weighs in at a hefty 985MB. 
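
A back-of-the-envelope estimate suggests why: a list holds one pointer per element, and each element is a separate heap object. The sketch below uses the size of a built-in Python float as a stand-in; list(x) on a numpy array actually produces numpy scalar objects, whose size may differ, so treat the result as a rough guide only. `boxed_list_mib` is a hypothetical helper name.

```python
import struct
import sys

def boxed_list_mib(n_elements):
    """Rough size in MiB of a list of n distinct float objects."""
    per_object = sys.getsizeof(1.0)   # one boxed Python float
    per_slot = struct.calcsize("P")   # one pointer in the list's item array
    return n_elements * (per_object + per_slot) / float(2 ** 20)

print("%.0f MiB" % boxed_list_mib(int(2e7)))
```

On a 64-bit build this comes out several times larger than the 8 bytes per element of the packed array, which is in the same ballpark as the bloat seen above.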

Now, I had a look at your code in rinterface.c:newSEXP, and noticed that you 
are iterating using the Python sequence protocol and creating an intermediate 
Python object for every element. I don't mind the temporary memory bloat 
(though it would be much faster and leaner to special-case numpy arrays and 
avoid the move to object space and back), but somehow these intermediaries 
also seem to be hanging around. Either that, or the allocator is, for some 
reason, not returning space back to the OS. A few of these conversions and 
our process is toast.
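
To illustrate the difference between the two copy strategies, here is a small standard-library sketch. It is not rpy2's code: `array.array` stands in for both the numpy array and the R vector, and the bulk path uses Python's buffer protocol where a C extension would presumably do a single memcpy from the array's data pointer.

```python
from array import array

n = 1000
src = array("d", [0.5]) * n  # stand-in for a numpy float64 array

# Per-element path: every item is boxed as a Python float, then unboxed again.
dst_slow = array("d", [0.0]) * n
for i in range(n):
    dst_slow[i] = src[i]

# Bulk path: copy the raw bytes through the buffer protocol,
# creating no per-item Python objects.
dst_fast = array("d", [0.0]) * n
memoryview(dst_fast)[:] = memoryview(src)

print(dst_slow == dst_fast)  # True
```

Both paths give the same result; only the second avoids creating one temporary object per element.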

Also, for proper 64-bit compatibility, the index variable "i" should be 
Py_ssize_t.

FYI - I'm compiling from source on a 64-bit Linux box. Running Python 2.5.4, 
with numpy 1.3.0, and rpy2 2.0.5.

Thanks again for a great tool.


----------------------------------------------------------------------

------------------------------------------------------------------------------
_______________________________________________
rpy-list mailing list
rpy-list@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rpy-list
