Bugs item #2814892, was opened at 2009-06-30 18:42
Message generated for change (Tracker Item Submitted) made by batripler
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=453021&aid=2814892&group_id=48422

Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: Source
Group: rpy2
Status: Open
Resolution: None
Priority: 5
Private: Yes
Submitted By: batripler (batripler)
Assigned to: Nobody/Anonymous (nobody)
Summary: memory leak in SexpVector

Initial Comment:

Hi Laurent,

Thank you for creating a wonderfully useful piece of software. I have been using it for a few weeks now, and I think I have uncovered a relatively serious problem in rpy2.rinterface.SexpVector, which is at the heart of the system.

Here is a manifestation of the problem. Perhaps I am doing something wrong. Start a new Python session and run:

{{{
import numpy; x=numpy.zeros(2e7)
}}}

You can modify the size of the array. Also, depending on the numpy defaults on your machine, the memory consumption will vary. On my machine a double is 8 bytes, so 2e7 elements take ~150MB. I see the process at ~162MB; the difference is the Python interpreter footprint.

Now, kill this session, start a new one, and run the following:

{{{
import rpy2.robjects, rpy2.rinterface as rint
reval=rint.baseNameSpaceEnv['eval']
rparse=rint.baseNameSpaceEnv['parse']
x=reval(rparse(text=rint.StrSexpVector(["numeric(2e7)"])))
}}}

In this case, we are just creating a REALSXP vector on the R side. I see the process coming in at 203MB, which is reasonable given that both the Python and R interpreters are now running. Again, this assumes that every element of the REALSXP vector is 8 bytes.

Now, finally, in a new Python process, let's create an array on the Python side and copy it over to the R side:

{{{
import numpy, rpy2.robjects, rpy2.rinterface as rint; x=numpy.zeros(2e7); y=rint.SexpVector(x, rint.REALSXP)
}}}

I would expect the maximum size of this process to be about the sum of the previous two, since it contains both the Python object and an equally large R object. It comes in at a whopping 950MB!!

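The expectation above can be checked with a little arithmetic. The following is a minimal sketch, not part of the original report: the helper names `expected_mib` and `peak_rss_mib` are made up for illustration, the 8-byte element size is the reporter's stated assumption, and the `resource` module is only available on Unix-like systems.

```python
import resource
import sys

BYTES_PER_DOUBLE = 8  # assumption from the report: one element is 8 bytes


def expected_mib(n_elements, bytes_per_element=BYTES_PER_DOUBLE):
    """Raw payload of a vector of n_elements, in MiB (object headers ignored)."""
    return n_elements * bytes_per_element / (1024.0 * 1024.0)


def peak_rss_mib():
    """Peak resident set size of this process, in MiB.

    ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.
    """
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    divisor = 1024.0 * 1024.0 if sys.platform == "darwin" else 1024.0
    return peak / divisor


if __name__ == "__main__":
    n = 20000000  # 2e7 elements, as in the report
    print("one vector:  %.1f MiB" % expected_mib(n))
    print("two vectors: %.1f MiB" % (2 * expected_mib(n)))
    print("peak RSS:    %.1f MiB" % peak_rss_mib())
```

Calling `peak_rss_mib()` at the end of each of the three sessions would give figures comparable to the ones quoted here, without having to watch the process in an external tool.
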
Incidentally, if I run the following code:

{{{
import numpy; x=numpy.zeros(2e7); y=list(x)
}}}

... it weighs in at a hefty 985MB.

Now, I had a look at your code in rinterface.c:newSEXP, and noticed that you are iterating using the Python sequence protocol and creating intermediate objects. I don't mind the temporary memory bloat (though it would be much faster and leaner to special-case numpy arrays and avoid the move to object space and back), but somehow these intermediaries are also hanging around. Either that, or the allocator is, for some reason, not returning space to the OS. A few of these conversions and our process is toast.

Also, for proper 64-bit compatibility, the index variable "i" should be Py_ssize_t.

FYI - I'm compiling from source on a 64-bit Linux box, running Python 2.5.4 with numpy 1.3.0 and rpy2 2.0.5.

Thanks again for a great tool.

----------------------------------------------------------------------
_______________________________________________
rpy-list mailing list
rpy-list@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rpy-list
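
To illustrate why going through the sequence protocol is so much heavier than the raw data, here is a rough sketch. It is not rpy2 code: the standard-library `array` type stands in for a numpy array, and the helper names (`packed_bytes`, `boxed_bytes_estimate`, `bulk_copy`) are hypothetical. On a typical 64-bit CPython build a float object takes 24 bytes and a list slot another 8, so element-wise conversion costs roughly four times the packed size before any allocator overhead.

```python
import struct
import sys
from array import array


def packed_bytes(n):
    """Bytes needed for n C doubles stored contiguously."""
    return n * struct.calcsize("d")


def boxed_bytes_estimate(n):
    """Rough cost of n separate Python float objects plus one pointer each.

    This ignores allocator overhead, so real usage can be higher.
    """
    return n * (sys.getsizeof(0.0) + struct.calcsize("P"))


def bulk_copy(src):
    """Copy a buffer-exporting object byte for byte.

    No Python object is created per element; the data is copied as one block.
    """
    return bytearray(memoryview(src))


if __name__ == "__main__":
    n = 20000000  # 2e7 elements, as in the report
    print("packed: %d bytes" % packed_bytes(n))
    print("boxed:  %d bytes (estimate)" % boxed_bytes_estimate(n))

    small = array("d", [0.0]) * 1000   # stand-in for a numpy array
    raw = bulk_copy(small)             # one block copy, no intermediaries
    print("copied %d bytes" % len(raw))
```

A special case for numpy arrays in newSEXP could follow the same idea as `bulk_copy`: read the array's buffer directly and copy it into the R vector in one pass. How that would fit into rinterface.c is left open here.
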