Josh Rosenberg <shadowranger+pyt...@gmail.com> added the comment:

In response to Max's comments:

>But consider the case where input is produced slower than it can be processed 
>(`iterables` may fetch data from a database, but the callable `fn` may be a 
>fast in-memory transformation). Now suppose the `Executor.map` is called when 
>the pool is busy, so there'll be a delay before processing begins. In this 
>case, the most efficient approach is to get as much input as possible while 
>the pool is busy, since eventually (when the pool is freed up) it will become 
>the bottleneck. This is exactly what the current implementation does.

I'm not sure the "slow input iterable, fast task, competing tasks from other 
sources" case is all that interesting. Uses of Executor.map in the first place 
are usually a replacement for complex task submission; perhaps my viewpoint is 
blinkered, but I see the Executors used for *either* explicit use of submit 
*or* map, rather than mixing and matching (you might use it for both, but 
rarely interleave usages). Without a mix and match scenario (and importantly, a 
mix and match scenario where enough work is submitted before the map to occupy 
all workers, and very little work is submitted after the map begins to space 
out map tasks such that additional map input is requested while workers are 
idle), the smallish default prefetch is an improvement, simply by virtue of 
getting initial results more quickly.

The solution of making a dedicated input thread would introduce quite a lot of 
additional complexity, well beyond what I think is justifiable for a relatively 
niche use case, especially one with many available workarounds, e.g.

1. Raising the prefetch count explicitly

2. Having the caller listify the iterable (similar to passing an arbitrarily 
huge prefetch value, with the large prefetch value having the advantage of 
sending work to the workers immediately, while listifying has the advantage of 
allowing you to handle any input exceptions up front rather than receiving them 
lazily during processing)

3. Using cheaper inputs (e.g. the query string, not the results of the DB 
query) and performing the expensive work as part of the task (after all, the 
whole point is to parallelize the most expensive work)

4. Using separate Executors so the manually submitted work doesn't interfere 
with the mapped work, and vice versa

5. Making a separate ThreadPoolExecutor to generate the expensive input values 
via its own map function (optionally with a larger prefetch count), e.g. 
instead of

with SomeExecutor() as executor:
    for result in executor.map(func, (get_from_db(query) for query in queries)):
        ...  # consume result

do:

with SomeExecutor() as executor, ThreadPoolExecutor() as inputexec:
    inputs = inputexec.map(get_from_db, queries)
    for result in executor.map(func, inputs):
        ...  # consume result
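Fleshing out workaround 5 into a self-contained sketch (SomeExecutor swapped 
for a second ThreadPoolExecutor, and get_from_db stubbed as a hypothetical slow 
fetch, so this is illustrative rather than a real DB pipeline):

```python
import time
from concurrent.futures import ThreadPoolExecutor

def get_from_db(query):
    # Stand-in for a slow I/O-bound fetch; real code would hit the database.
    time.sleep(0.01)
    return len(query)

def func(row):
    # Stand-in for the fast in-memory transformation.
    return row * 2

queries = ["select a", "select bb", "select ccc"]

with ThreadPoolExecutor() as executor, ThreadPoolExecutor() as inputexec:
    # inputexec overlaps fetching the expensive inputs with executor
    # processing them, so neither side serializes the other.
    inputs = inputexec.map(get_from_db, queries)
    results = list(executor.map(func, inputs))
```

The input executor can be given a larger worker count (or a larger prefetch, 
if available) when input generation is the slower side.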

Point is, yes, there will still be niche cases where Executor.map isn't 
perfect, but this patch is intentionally minimal, to keep the Python code base 
simple (no marshaling of exceptions across thread boundaries) and to avoid 
extreme behavioral changes. It does involve some smaller changes, e.g. it 
necessarily means input-iterator-triggered exceptions can be raised after some 
results have been successfully produced, but it doesn't involve adding more 
implicit threading, marshaling exceptions across threads, etc.

Your proposed alternative, with a thread for prefetching inputs, a thread for 
sending tasks, and a thread for returning results creates a number of problems:

1. As you mentioned, if no prefetch limit is imposed, memory usage remains 
unbounded; if the input is cheap to generate and slow to process, memory 
exhaustion is nearly guaranteed for infinite inputs, and likely for "very 
large" inputs. I'd prefer the default arguments to be stable in (almost) all 
cases, rather than trying to maximize performance for rare cases at the 
expense of stability in many others.
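For what it's worth, a bounded-prefetch map can be sketched entirely in user 
code, without touching the stdlib (map_with_prefetch is a hypothetical helper, 
not an existing API); it keeps at most `prefetch` tasks in flight, so memory 
stays bounded even for an infinite input iterable:

```python
import collections
import itertools
from concurrent.futures import ThreadPoolExecutor

def map_with_prefetch(executor, fn, iterable, prefetch=4):
    """Yield results in order while keeping at most `prefetch` tasks
    submitted but unconsumed, bounding memory for huge/infinite inputs."""
    it = iter(iterable)
    # Prime the pipeline with the first `prefetch` tasks.
    pending = collections.deque(
        executor.submit(fn, x) for x in itertools.islice(it, prefetch))
    while pending:
        # Yield the oldest result (may raise the task's exception),
        # then replenish with one more input, if any remain.
        yield pending.popleft().result()
        for x in itertools.islice(it, 1):
            pending.append(executor.submit(fn, x))

with ThreadPoolExecutor(max_workers=2) as ex:
    # Works on an infinite input because only `prefetch` items are realized.
    squares = map_with_prefetch(ex, lambda n: n * n, itertools.count(),
                                prefetch=4)
    first_five = list(itertools.islice(squares, 5))
```

This is essentially what the small default prefetch buys: early results and 
bounded buffering, at some cost in throughput for the slow-input case.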

2. When input generation is CPU bound, you've just introduced an additional 
source of unavoidable GIL contention; granted, after the GIL fixes in 3.2, GIL 
contention tends to hurt less (before those fixes, I could easily occupy 1.9 
cores doing 0.5 cores worth of actual work with just two CPU bound threads). 
Particularly in the ProcessPoolExecutor case (where avoiding GIL contention is 
the goal), it's a little weird if you can end up with unavoidable GIL 
contention in the main process.

3. Exception handling from the input iterator just became a nightmare. In a 
"single thread performs input pulls and result yields" scenario, exceptions 
from the input iterator naturally bubble to the caller of Executor.map 
(possibly after several results have been produced, but eventually). If a 
separate thread is caching from the input iterator, we'd need to marshal the 
exception from that thread back to the thread running Executor.map so it's 
visible to the caller, and how to provide a traceback that is both accurate 
and useful is not obvious.
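To illustrate the single-thread case: because the input iterator is pulled in 
the caller's thread, its exception reaches the Executor.map caller directly, 
with an ordinary traceback (flaky_inputs here is a hypothetical failing 
generator standing in for a broken input source):

```python
from concurrent.futures import ThreadPoolExecutor

def flaky_inputs():
    # Hypothetical input source that fails partway through.
    yield 2
    raise KeyError("lost connection")

caught = None
with ThreadPoolExecutor() as ex:
    try:
        # The input iterator is consumed in this thread, so its exception
        # propagates to us directly; no cross-thread marshaling is needed.
        results = list(ex.map(abs, flaky_inputs()))
    except KeyError as e:
        caught = e
```

With a separate caching thread, that KeyError would instead surface inside 
the helper thread, and some re-raising machinery would be required to get it 
back here.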

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue29842>
_______________________________________