Hi. I'm using CPython 2.7 on Linux. To run parallel computations over a large list of objects, I want to use multiple processes (via the multiprocessing module). In the first step I fill the list with objects, and then I fork() my worker processes, which do the actual work.
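In outline, the setup looks like this (build_objects, worker, and the process count are simplified stand-ins for my real code):

    import multiprocessing

    def build_objects(n):
        # stand-in for the real object construction in the parent
        return [(i, 'payload-%d' % i) for i in range(n)]

    def worker(objects, lo, hi):
        # read-only pass over one slice of the inherited list;
        # the real code writes each result to the database
        for i in range(lo, hi):
            obj = objects[i]  # even this read bumps obj's refcount (see below)
            # ... compute with obj ...

    if __name__ == '__main__':
        objects = build_objects(1000000)
        n = 4
        step = len(objects) // n
        procs = [multiprocessing.Process(target=worker,
                                         args=(objects, i * step, (i + 1) * step))
                 for i in range(n)]
        for p in procs:
            p.start()
        for p in procs:
            p.join()

On Linux, multiprocessing.Process forks, so the children inherit `objects` directly rather than receiving a pickled copy.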
This should be optimal in terms of memory usage, because Linux implements copy-on-write for forked processes, so there should be only one physical copy of the list of objects (the worker processes never modify the objects in the list).

The problem is that after a short time the child processes use more and more memory, even though they don't create new objects: they only read objects from the list and write the computation results to the database. After investigating, I concluded that the cause must be the incrementing of an object's reference counter whenever it is fetched from the list. It changes only a single int, but the OS must copy the whole memory page into the child process. I reimplemented the function for getting an element (in listobject.c) without the Py_INCREF call, and that solved my problem with growing memory.

The question is: are there better ways to get a truly read-only list (in terms of the memory representation of the objects)? My solution is of course not safe. I thought about weakrefs, but it seems they can't be used here, because getting a real reference from a weakref also increments the reference counter. Maybe another option would be to store the reference counters not in the objects themselves but in a separate array, to minimize the number of memory pages they occupy...
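You can confirm the diagnosis from pure Python: sys.getrefcount shows that merely fetching an element writes to that object's header, which is exactly the write that forces the kernel to copy the page in a forked child:

    import sys

    x = object()
    lst = [x]

    before = sys.getrefcount(x)  # counts: x, lst[0], and getrefcount's argument
    y = lst[0]                   # a pure "read" of the list...
    after = sys.getrefcount(x)

    assert after == before + 1   # ...but it wrote to x's ob_refcnt field

In a forked child, that one-word write dirties the whole page holding the object, and the kernel duplicates the page.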
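One direction I'm considering, as a safe alternative: keep the payload in a single flat byte buffer that refcounting never touches, and deserialize items on demand. The children then only dirty the pages of the fresh copies they create, never the shared buffer. A rough sketch, assuming the objects are marshal-serializable (FrozenList and its internals are just placeholder names):

    import array
    import marshal

    class FrozenList(object):
        """Read-only sequence backed by one flat byte buffer.

        Items are serialized once in the parent; __getitem__
        deserializes a fresh copy on demand, so workers write only
        to the headers of the buffer and offset array themselves
        (a couple of pages) instead of to every object's page.
        """

        def __init__(self, objects):
            blobs = [marshal.dumps(obj) for obj in objects]
            self._offsets = array.array('l', [0])  # cumulative byte offsets
            for blob in blobs:
                self._offsets.append(self._offsets[-1] + len(blob))
            self._buf = b''.join(blobs)

        def __len__(self):
            return len(self._offsets) - 1

        def __getitem__(self, i):
            if not 0 <= i < len(self):
                raise IndexError(i)
            return marshal.loads(self._buf[self._offsets[i]:self._offsets[i + 1]])

Each __getitem__ pays the CPU cost of allocating a new object in the child's own heap, but the parent's pages stay clean, and only marshal-compatible objects (numbers, strings, tuples, lists, dicts, ...) can be stored.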