Hi Lie,

Thank you for the reply.

On Monday, December 5, 2011, Lie Ryan wrote:
> On 11/30/2011 06:09 AM, DPalao wrote:
> > Hello,
> > I'm trying to use multiprocessing to parallelize a code. There are a
> > number of tasks (usually 12) that can be run independently. Each task
> > produces a numpy array, and at the end, those arrays must be combined.
> > I implemented this using Queues (multiprocessing.Queue): one for input
> > and another for output.
> > But the code blocks. And it must be related to the size of the item I
> > put on the Queue: if I put a small array, the code works well; if the
> > array is realistically large (in my case it can vary from 160kB to 1MB),
> > the code blocks apparently forever.
> > I have tried this:
> > http://www.bryceboe.com/2011/01/28/the-python-multiprocessing-queue-and-large-objects/
> > but it didn't work (specifically, I put a None sentinel at the end for
> > each worker).
> >
> > Before I change the implementation,
> > is there a way to bypass this problem with multiprocessing.Queue?
> > Should I post the code (or a sketchy version of it)?
>
> Transferring data over multiprocessing.Queue involves copying the whole
> object across an inter-process pipe, so you need to have a reasonably
> large workload in the processes to justify the cost of the copying to
> benefit from running the workload in parallel.
>
> You may try to avoid the cost of copying by using shared memory
> (http://docs.python.org/library/multiprocessing.html#sharing-state-between-processes);
> you can use Queue for communicating when new data comes in or when a
> task is done, but put the large data in shared memory. Be careful not
> to access the data from multiple processes concurrently.

Yep, that was my first thought, but the arrays' elements are complex64
(or complex in general), and I don't know how to easily convert a
multiprocessing.Array to/from a numpy array when the type is complex.
Doing that would require some extra conversions back and forth, which
makes the solution not very attractive to me. I tried with a Manager
too, but the array cannot be modified from within the worker processes.

In principle, the array I need to share is expected to be at most ~2MB
in size, and typically only <200kB, so there is no huge extra workload.
But that could change, and I'd like to be prepared for it, so any idea
about using an Array, a Manager, or some other shared-memory mechanism
would be great.
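To make it concrete, this is roughly what I would like to end up with.
It is only an untested sketch, assuming the shape (one row per task) and
the complex64 dtype are known up front; NTASKS, NCOLS and the body of
worker() are just placeholders. The idea is to allocate two c_float
slots per complex64 element in shared memory and re-view the same
buffer with numpy on both sides, using the Queue only for small
"task done" tokens:

import ctypes
import multiprocessing as mp

import numpy as np

NTASKS = 12     # number of independent tasks (placeholder)
NCOLS = 1000    # length of each task's result (placeholder)
SHAPE = (NTASKS, NCOLS)

def worker(task_id, shared, done_queue):
    # Re-view the shared buffer as a complex64 array (no copy).
    arr = np.frombuffer(shared, dtype=np.complex64).reshape(SHAPE)
    # Dummy computation: each worker writes only its own row, so no
    # locking is needed (hence lock=False below).
    arr[task_id, :] = (task_id + 1j) * np.ones(NCOLS, dtype=np.complex64)
    done_queue.put(task_id)   # only a tiny token goes through the Queue

if __name__ == '__main__':
    # Two c_float slots (re, im) per complex64 element.
    shared = mp.Array(ctypes.c_float, 2 * NTASKS * NCOLS, lock=False)
    done = mp.Queue()
    procs = [mp.Process(target=worker, args=(i, shared, done))
             for i in range(NTASKS)]
    for p in procs:
        p.start()
    finished = [done.get() for _ in range(NTASKS)]   # wait for all workers
    for p in procs:
        p.join()
    result = np.frombuffer(shared, dtype=np.complex64).reshape(SHAPE)
    print(result[:, 0])

Whether np.frombuffer on the raw shared array is really the right way
to get the complex64 view back is exactly the part I'm unsure about.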
> In any case, have you tried a multithreaded solution? numpy is a C
> extension, and I believe it releases the GIL when working, so it
> wouldn't be in your way to achieve parallelism.

That possibility I didn't know about. What exactly releases the GIL?
The sharing of a numpy array? And what if I also need to share some
other "standard" Python data (e.g. a dictionary)?

Best regards,

David
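P.S. Regarding the multithreaded suggestion: is the idea roughly the
sketch below? Again this is only an untested sketch, and run_task() is
a dummy stand-in for one of my real tasks. My understanding is that the
GIL is released inside numpy's own C loops (e.g. a BLAS-backed dot), so
threads can overlap there, while a plain Python dict shared between
threads would still need a lock if several threads modified it; here
the dict is filled only in the main thread from the map() results.

import numpy as np
from multiprocessing.pool import ThreadPool   # threads, not processes

def run_task(task_id):
    # Dummy heavy numpy work; the GIL should be released inside the
    # C-level parts (e.g. the BLAS call behind np.dot), letting several
    # of these make progress in parallel threads.
    a = np.random.rand(500, 500) + 1j * np.random.rand(500, 500)
    a = a.astype(np.complex64)
    return task_id, np.dot(a, a)

if __name__ == '__main__':
    pool = ThreadPool(4)                        # 4 worker threads
    results = dict(pool.map(run_task, range(12)))
    pool.close()
    pool.join()
    combined = sum(results.values())            # combine the 12 arrays
    print(combined.shape)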