Kyle Stanley <aeros...@gmail.com> added the comment:

> What "ignores the max_workers argument" means?

From my understanding, their argument was that the parameter name 
"max_workers" and the documentation imply that the executor will spawn 
processes as needed, up to *max_workers*, based on the number of jobs scheduled.

> And would you create a simple reproducible example?

I can't speak directly for the OP, but this simple example may demonstrate what 
they're talking about:

Linux 5.4.8
Python 3.8.1

```
import concurrent.futures as cf
import os
import random

def get_rand_nums(ls, n):
    # Note: `ls` is accepted but never used.
    return [random.randint(1, 100) for _ in range(n)]

def show_processes():
    print("All python processes:")
    os.system("ps -C python")

def main():
    nums = []
    with cf.ProcessPoolExecutor(max_workers=6) as executor:
        futs = []
        show_processes()
        for _ in range(3):
            fut = executor.submit(get_rand_nums, nums, 10_000_000)
            futs.append(fut)
        show_processes()
        for fut in cf.as_completed(futs):
            nums.extend(fut.result())
        show_processes()

    assert len(nums) == 30_000_000

if __name__ == '__main__':
    main()
```

Output:

```
[aeros:~/programming/python]$ python ppe_max_workers.py
All python processes: # Main python process
    PID TTY          TIME CMD
  23683 pts/1    00:00:00 python
All python processes: # Main python process + 6 unused subprocesses
    PID TTY          TIME CMD
  23683 pts/1    00:00:00 python
  23685 pts/1    00:00:00 python
  23686 pts/1    00:00:00 python
  23687 pts/1    00:00:00 python
  23688 pts/1    00:00:00 python
  23689 pts/1    00:00:00 python
  23690 pts/1    00:00:00 python
All python processes: # Main python process + 3 used subprocesses + 3 unused subprocesses
    PID TTY          TIME CMD
  23683 pts/1    00:00:00 python
  23685 pts/1    00:00:07 python
  23686 pts/1    00:00:07 python
  23687 pts/1    00:00:07 python
  23688 pts/1    00:00:00 python
  23689 pts/1    00:00:00 python
  23690 pts/1    00:00:00 python
```

As seen above, all processes up to *max_workers* were spawned immediately after 
the jobs were submitted to ProcessPoolExecutor, regardless of the actual number 
of jobs (3). It is also apparent that only three of those spawned processes 
were utilized by the CPU, as indicated by the values in the TIME field. The 
other three processes were not used.
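
The same check can be done without `ps`. The following is only a minimal 
sketch, not part of the original report: `count_workers` is a hypothetical 
helper, and it assumes a POSIX system where the "fork" start method is 
available. It uses `os.getpid` as the job so that each result is the PID of 
the worker that ran it, and `multiprocessing.active_children()` to count the 
workers that exist right after the jobs are submitted. The exact numbers may 
differ between Python versions and start methods.

```python
import concurrent.futures as cf
import multiprocessing
import os


def count_workers(max_workers, n_jobs):
    """Return (workers_spawned, workers_used) for a small batch of jobs."""
    # Assumption: the "fork" start method is available (POSIX only).
    ctx = multiprocessing.get_context("fork")
    # Ignore any child processes that existed before the pool was created.
    already_running = {p.pid for p in multiprocessing.active_children()}
    with cf.ProcessPoolExecutor(max_workers=max_workers,
                                mp_context=ctx) as executor:
        # os.getpid runs in a worker, so each result is that worker's PID.
        futs = [executor.submit(os.getpid) for _ in range(n_jobs)]
        # Workers alive right after submission, whether busy or idle.
        now_running = {p.pid for p in multiprocessing.active_children()}
        spawned = now_running - already_running
        # Distinct workers that actually ran at least one job.
        used = {fut.result() for fut in futs}
    return len(spawned), len(used)


if __name__ == "__main__":
    spawned, used = count_workers(max_workers=6, n_jobs=3)
    print(f"spawned={spawned} used={used}")
```

If `spawned` comes out larger than `used`, the pool created workers that never 
ran a job, which is the behavior shown in the `ps` output above.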

If it weren't for this behavior, I think there would be a significant 
performance loss, as the executor would have to continuously calculate how many 
processes are needed and spawn them throughout its lifespan. AFAIK, it _seems_ 
more efficient to spawn *max_workers* processes when the jobs are scheduled 
and then use them as needed, rather than spawning the processes as needed.

As a result, I think the current behavior should remain the same, unless 
someone can come up with a backwards-compatible alternative and demonstrate 
its advantages over the current one.

However, I do think the current documentation could do a better job of 
explaining how max_workers actually behaves. See the current explanation: 
https://docs.python.org/3/library/concurrent.futures.html#concurrent.futures.ProcessPoolExecutor.

The current version does not address any of the above points. In fact, the 
first line seems like it might imply the opposite of what it's actually doing 
(at least based on my above example):

"An Executor subclass that executes calls asynchronously *using a pool of at 
most max_workers processes*." (asterisks added for emphasis)

"using a pool of at most max_workers processes" could imply to users that 
*max_workers* sets an upper bound on the number of processes in the pool, and 
that this bound is only reached if all of those processes are _needed_. 
Unless I'm misunderstanding something, that's not the case.

I would suggest converting this into a documentation issue, assuming that the 
experts for concurrent.futures confirm that the present behavior is 
intentional and that I'm correctly understanding the OP.

----------
nosy: +aeros

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue39207>
_______________________________________