On Nov 14, 11:08 am, Artur Siekielski <artur.siekiel...@gmail.com> wrote:
> Hi.
> I'm using CPython 2.7 and Linux. In order to run parallel
> computations on a large list of objects I want to use multiple
> processes (via the multiprocessing module). In the first step I fill
> the list with objects, and then I fork() my worker processes that do
> the job.
>
> This should be optimal in terms of memory usage because Linux
> implements copy-on-write for forked processes, so I should end up with
> only one physical copy of the list of objects (the worker processes
> don't change the objects in the list). The problem is that after a
> short time the child processes use more and more memory (they don't
> create new objects - they only read objects from the list and write
> the computation results to the database).
>
> After investigating, I concluded that the cause must be the
> incrementing of a reference counter when an object is fetched from the
> list. It changes only one int, but the OS must copy the whole memory
> page into the child process. I reimplemented the function for getting
> a list element (in listobject.c) with the Py_INCREF call omitted, and
> that solved my problem with growing memory.
>
> The question is: are there any better ways to get a truly read-only
> list (in terms of the memory representation of the objects)? My
> solution is of course not safe. I thought about weakrefs, but it seems
> they cannot be used here because obtaining a real reference from a
> weakref increments the reference counter. Maybe another option would
> be to store the reference counters not in the objects themselves but
> in a separate array, to minimize the number of memory pages they
> occupy...
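For concreteness, here is roughly how I read your setup - a minimal sketch,
with placeholder names (the dict payloads, save_result()) standing in for
your real objects, computation, and database write:

import multiprocessing

objects = []          # filled in the parent before forking

def worker(start, stop):
    # The child only reads the shared list, yet each access Py_INCREFs the
    # object, dirtying its memory page and making the kernel copy it.
    for obj in objects[start:stop]:
        result = len(obj["payload"])      # stand-in for the real computation
        # save_result(result)             # e.g. write the result to the database

if __name__ == "__main__":
    objects.extend({"id": i, "payload": "x" * 100} for i in range(10 ** 6))
    nprocs = 4
    step = len(objects) // nprocs
    procs = [multiprocessing.Process(target=worker,
                                     args=(i * step, (i + 1) * step))
             for i in range(nprocs)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()

On Linux, multiprocessing starts the children with fork(), so they inherit
the objects list directly without any pickling; the only per-child copying
comes from the refcount writes dirtying the pages the objects live on.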
It might be interesting to try with Jython or PyPy. Neither of these Python
runtimes uses reference counting at all.

Jean-Paul