On 11/30/2011 06:09 AM, DPalao wrote:
Hello,
I'm trying to use multiprocessing to parallelize some code. There are a number of
tasks (usually 12) that can be run independently. Each task produces a numpy
array, and at the end those arrays must be combined.
I implemented this using Queues (multiprocessing.Queue): one for input and
another for output.
But the code blocks. And it must be related to the size of the item I put on
the Queue: if I put a small array, the code works well; if the array is
realistically large (in my case it can vary from 160kB to 1MB), the code
apparently blocks forever.
I have tried this:
http://www.bryceboe.com/2011/01/28/the-python-multiprocessing-queue-and-large-objects/
but it didn't work (specifically, I put a None sentinel at the end for each
worker).

Before I change the implementation,
is there a way to work around this problem with multiprocessing.Queue?
Should I post the code (or a simplified version of it)?

Transferring data over a multiprocessing.Queue involves pickling the object and copying the whole thing across an inter-process pipe, so each task needs to do a reasonably large amount of work to justify the cost of that copying before you see any benefit from running the tasks in parallel.
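
For what it's worth, here is a minimal sketch of the kind of Queue-based setup I imagine you have (the task ids, array sizes, worker count and combination step are all made up). One thing worth noting: the parent should drain the result queue before join()ing the workers, because a child process cannot exit while its feeder thread is still flushing a large item into the pipe; with small arrays you get away with it, with large ones join() blocks, which sounds like what you are seeing.

    import multiprocessing as mp
    import numpy as np

    def worker(in_q, out_q):
        # Every put()/get() pickles the whole array and copies it through a pipe.
        for task_id in iter(in_q.get, None):        # None is the per-worker sentinel
            result = np.random.rand(200, 200)       # stand-in for the real task
            out_q.put((task_id, result))

    if __name__ == '__main__':
        n_workers, n_tasks = 4, 12
        in_q, out_q = mp.Queue(), mp.Queue()
        for i in range(n_tasks):
            in_q.put(i)
        for _ in range(n_workers):
            in_q.put(None)
        procs = [mp.Process(target=worker, args=(in_q, out_q))
                 for _ in range(n_workers)]
        for p in procs:
            p.start()
        # Drain the results *before* joining; otherwise join() can block
        # while a child is still flushing a large array into the pipe.
        results = dict(out_q.get() for _ in range(n_tasks))
        for p in procs:
            p.join()
        combined = sum(results.values())            # made-up combination step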

You may be able to avoid the copying by using shared memory (http://docs.python.org/library/multiprocessing.html#sharing-state-between-processes): keep using a Queue to signal when new data comes in or when a task is done, but put the large arrays themselves in shared memory. Be careful not to access the same data from multiple processes concurrently.
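
A rough sketch of that idea, assuming one fixed-size slot per task so that each worker writes only its own region (the shapes, dtype and the final combination step are placeholders, not your actual code):

    import ctypes
    import multiprocessing as mp
    import numpy as np

    N_TASKS, ROWS, COLS = 12, 200, 200              # made-up sizes

    def worker(task_id, shared, done_q):
        # Re-wrap the shared buffer as a numpy array; no data is copied.
        arr = np.frombuffer(shared.get_obj()).reshape(N_TASKS, ROWS, COLS)
        arr[task_id] = np.random.rand(ROWS, COLS)   # stand-in for the real task
        done_q.put(task_id)                         # only a tiny token crosses the pipe

    if __name__ == '__main__':
        # One double-precision slot per task; each worker touches a disjoint slot,
        # so no extra locking is needed for the writes themselves.
        shared = mp.Array(ctypes.c_double, N_TASKS * ROWS * COLS)
        done_q = mp.Queue()
        procs = [mp.Process(target=worker, args=(i, shared, done_q))
                 for i in range(N_TASKS)]
        for p in procs:
            p.start()
        for _ in range(N_TASKS):
            done_q.get()                            # wait until every task reports done
        for p in procs:
            p.join()
        result = np.frombuffer(shared.get_obj()).reshape(N_TASKS, ROWS, COLS)
        combined = result.sum(axis=0)               # made-up combination step

The Queue then only carries small task ids, so the pipe never has to move the arrays themselves.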

In any case, have you tried a multithreaded solution? numpy is a C extension, and I believe it releases the GIL during its heavy computations, so the GIL shouldn't stand in the way of parallelism.
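
Something along these lines, again with made-up shapes and a made-up combination step; the threads share the results list directly, so nothing is pickled:

    import threading
    import numpy as np

    def worker(task_id, results):
        # If numpy releases the GIL for the operations you use,
        # these calls can overlap on several cores.
        results[task_id] = np.random.rand(200, 200)  # stand-in for the real task

    if __name__ == '__main__':
        n_tasks = 12
        results = [None] * n_tasks
        threads = [threading.Thread(target=worker, args=(i, results))
                   for i in range(n_tasks)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        combined = np.sum(results, axis=0)           # made-up combination step

Whether this actually runs in parallel depends on your particular numpy operations releasing the GIL, so it's worth a quick benchmark before committing to it.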
