New submission from Nick Guenther <n...@kousu.ca>:

multiprocessing.Pool.imap() is supposed to be a lazy version of map. But it's 
not: it submits work to its workers eagerly. As a consequence, in a pipeline, 
all the work from earlier steps is queued, performed, and finished first, 
before starting later steps.

If you use Python 3's built-in map() -- aka the old itertools.imap() -- the 
operations are on-demand, so it surprised me that Pool.imap() isn't. It's 
basically no better than using Pool.map(). Maybe it saves memory by not 
materializing large iterables in every worker process? But it still spends 
the CPU time on every item of the iterable, even if the result is never needed.
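
A minimal sketch of the difference follows. It is not the attached multiprocessing-eager-imap.py. The names source, work, eager_demo and lazy_demo are made up for illustration, and it uses multiprocessing.pool.ThreadPool (a subclass of Pool, assumed here to share the same imap() code path) so that the bookkeeping lists can live in one process.

```python
from multiprocessing.pool import ThreadPool

pulled = []    # inputs taken from the source iterable
computed = []  # inputs that were actually processed


def source(n):
    """Yield 0..n-1, recording each item as it is pulled."""
    for i in range(n):
        pulled.append(i)
        yield i


def work(x):
    """Stand-in for an expensive step; records that it ran."""
    computed.append(x)
    return x * x


def eager_demo(n=20):
    """Ask imap() for ONE result, then count how much work was done anyway."""
    pulled.clear()
    computed.clear()
    pool = ThreadPool(2)
    results = pool.imap(work, source(n))
    first = next(results)  # only the first result is requested
    pool.close()
    pool.join()            # wait for everything that was submitted
    return first, len(pulled), len(computed)


def lazy_demo(n=20):
    """Same request against the built-in map(), for comparison."""
    pulled.clear()
    computed.clear()
    first = next(map(work, source(n)))
    return first, len(pulled), len(computed)


if __name__ == "__main__":
    print("Pool.imap:", eager_demo())
    print("built-in map:", lazy_demo())
```

With the built-in map() only one input is pulled and computed. With imap() the whole iterable is consumed and every item is processed by the time the pool has been joined, even though only one result was ever requested.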

This can be partially worked around by giving each step of the pipeline its own 
Pool -- then, at least the earlier steps of the pipeline don't block the later 
steps -- but the jobs are still done eagerly instead of waiting for their 
results to actually be requested.
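
The workaround can be sketched roughly as below, assuming a two-step pipeline. step_one, step_two and pipeline are hypothetical names, and ThreadPool again stands in for Pool to keep the sketch in a single process. Each step gets its own pool, and the second imap() consumes the iterator returned by the first.

```python
from multiprocessing.pool import ThreadPool


def step_one(x):
    """First pipeline step (placeholder work)."""
    return x + 1


def step_two(x):
    """Second pipeline step (placeholder work)."""
    return x * 10


def pipeline(items):
    """Run both steps, each in its own pool, and collect the results."""
    with ThreadPool(2) as pool_a, ThreadPool(2) as pool_b:
        stage_one = pool_a.imap(step_one, items)
        # pool_b pulls from stage_one as results arrive, so step two
        # does not have to wait for all of step one to finish.
        stage_two = pool_b.imap(step_two, stage_one)
        return list(stage_two)


if __name__ == "__main__":
    print(pipeline(range(3)))
```

Both pools still submit their work eagerly; this only stops step one from monopolizing the workers that step two needs.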

----------
files: multiprocessing-eager-imap.py
messages: 365295
nosy: kousu
priority: normal
severity: normal
status: open
title: multiprocessing.Pool.imap() should be lazy
versions: Python 2.7, Python 3.5, Python 3.6, Python 3.7, Python 3.8
Added file: https://bugs.python.org/file49010/multiprocessing-eager-imap.py

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue40110>
_______________________________________