Hi folks, I'm developing some custom neural network code. I'm using Python 2.6, Numpy 1.5, and Ubuntu Linux 10.10. I have an AMD 1090T six-core CPU, and I want to take full advantage of it. I love to hear my CPU fan running, and watch my results come back faster.
When I'm training a neural network, I pass two numpy.ndarray objects to a function called evaluate. One array contains the weights for the neural network, and the other contains the input data. The evaluate function returns an array of output data.

I have been playing with multiprocessing for a while now, and I have some familiarity with Pool. Apparently, arguments passed to a Pool subprocess must be picklable. Pickling is still a pretty vague process to me, but I can see that you have to write custom __reduce__ and __setstate__ methods for your objects. An example of code which creates a pickle-friendly ndarray subclass is here:

http://www.mail-archive.com/numpy-discussion@scipy.org/msg02446.html

Now, I don't know that I actually HAVE to pass my neural network and input data as copies -- they're both READ-ONLY objects for the duration of an evaluate function (which can go on for quite a while). So, I have also started to investigate shared-memory approaches. I don't know how a shared-memory object is referenced by a subprocess yet, but presumably you pass a reference to the object, rather than the whole object. Also, it appears that subprocesses acquire a temporary lock on a shared-memory object, and thus one process may well spend time waiting for another (individual CPU caches may sidestep this problem?).

Anyway, an implementation of a shared-memory ndarray is here:

https://bitbucket.org/cleemesser/numpy-sharedmem/src/3fa526d11578/shmarray.py

I've added a few lines to this code which allow subclassing the shared-memory array, which I need (because my neural net objects are more than just the array, they also contain meta-data). But I've run into some trouble doing the actual sharing part. The shmarray class CANNOT be pickled.

I think that my understanding of multiprocessing needs to evolve beyond the use of Pool, but I'm not sure yet. This post suggests as much.
http://mail.scipy.org/pipermail/scipy-user/2009-February/019696.html

I don't believe that my questions are specific to numpy, which is why I'm posting here, in a more general Python forum.

When should one pickle and copy? When should one implement an object in shared memory? Why is pickling apparently such a non-trivial process anyway? And, given that multi-core CPUs are apparently here to stay, should it be so difficult to make use of them?
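
To make the two options above concrete, here is a minimal sketch of both, not a definitive implementation. It assumes a reasonably recent Python and NumPy rather than the 2.6/1.5 setup mentioned at the top. The one-layer tanh evaluate is only a stand-in for the real network, and the names to_shared, as_ndarray, init_worker, evaluate_rows and evaluate_copy are hypothetical helpers invented for illustration. The first approach relies on the fact that plain ndarrays pickle without any custom methods (a subclass that adds its own attributes is where __reduce__ and __setstate__ come in). The second copies each array once into a multiprocessing.RawArray, which carries no lock, and hands it to each worker through the Pool initializer, so each task only has to carry a pair of row indices.

```python
import ctypes
import multiprocessing as mp
import pickle

import numpy as np

_shared = {}  # per-worker storage, filled in by init_worker (hypothetical helper)


def evaluate(weights, inputs):
    """Stand-in for the real network: a single tanh layer (illustrative only)."""
    return np.tanh(inputs.dot(weights))


def to_shared(arr):
    """Copy a float64 array into a shared buffer that has no lock attached."""
    raw = mp.RawArray(ctypes.c_double, arr.size)
    np.frombuffer(raw, dtype=np.float64)[:] = arr.ravel()
    return raw


def as_ndarray(raw, shape):
    """Wrap a shared buffer as an ndarray without copying the data."""
    return np.frombuffer(raw, dtype=np.float64).reshape(shape)


def init_worker(raw_w, w_shape, raw_x, x_shape):
    # Runs once in each worker process; the buffers are handed over at
    # process start-up instead of being pickled with every task.
    _shared["w"] = as_ndarray(raw_w, w_shape)
    _shared["x"] = as_ndarray(raw_x, x_shape)


def evaluate_rows(bounds):
    """Shared-memory variant: the task payload is just (start, stop)."""
    start, stop = bounds
    return evaluate(_shared["w"], _shared["x"][start:stop])


def evaluate_copy(args):
    """Pickle-and-copy variant: both arrays travel with every task."""
    weights, chunk = args
    return evaluate(weights, chunk)


if __name__ == "__main__":
    rng = np.random.RandomState(0)
    w = rng.rand(4, 3)   # toy weights
    x = rng.rand(8, 4)   # toy input data
    bounds = [(0, 4), (4, 8)]

    # A plain ndarray survives a pickle round trip as-is.
    assert np.array_equal(pickle.loads(pickle.dumps(w)), w)

    # Approach 1: let Pool pickle and copy the arrays for each task.
    pool = mp.Pool(2)
    try:
        copied = pool.map(evaluate_copy, [(w, x[a:b]) for a, b in bounds])
    finally:
        pool.close()
        pool.join()

    # Approach 2: share the read-only arrays once, send only row ranges.
    pool = mp.Pool(2, initializer=init_worker,
                   initargs=(to_shared(w), w.shape, to_shared(x), x.shape))
    try:
        shared = pool.map(evaluate_rows, bounds)
    finally:
        pool.close()
        pool.join()

    # Both approaches should produce the same outputs.
    assert np.allclose(np.vstack(copied), np.vstack(shared))
```

Which one pays off presumably depends on how large the arrays are compared with the time a single evaluate call takes: when the call runs for a long while, the per-task cost of pickling may be negligible, and the shared-memory version mostly saves memory.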