New submission from Mouse <u...@ll.mit.edu>:

macOS Catalina 10.15.3 and 10.15.4. Python 3.8.2 (also tested with 3.7.7, which 
confirmed that the problem lies in the fix described in 
https://bugs.python.org/issue33725).

Trying to use "multiprocessing" with Python 3.8 and the new default of 
`set_start_method('spawn')` is nothing but a disaster.

Not calling join() leads to consistent crashes, as described here: 
https://bugs.python.org/issue33725#msg365249

Adding p.join() immediately after p.start() seems to work, but increases the 
total run time by a factor of two to four, user time by a factor of five, and 
system time by a factor of ten.

Occasionally, even with p.join(), some processes still crash as shown 
in https://bugs.python.org/issue33725#msg365249.

I found two workarounds:
1. Switch back to 'fork' by explicitly adding `set_start_method('fork')` to the 
__main__.
2. Drop the messy "multiprocessing" package and use "multiprocess" instead, 
which turns out to be a good and reliable fork of "multiprocessing".
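
For reference, workaround 1 can be sketched as below. This is only a minimal illustration, not the code from the report: the names `square_worker` and `run_with_method` are made up for the example, and it uses `mp.get_context('fork')` rather than the global `mp.set_start_method('fork')`, so that the choice stays local to one function. It assumes a platform where 'fork' is available (not Windows).

```python
import multiprocessing as mp


def square_worker(q, idx):
    # Hypothetical worker: report one result back through the queue.
    q.put(idx * idx)


def run_with_method(n_procs=4, method="fork"):
    # get_context() returns an object with the same API as the
    # multiprocessing module, but bound to a single start method.
    ctx = mp.get_context(method)
    q = ctx.Queue()
    procs = [ctx.Process(target=square_worker, args=(q, i))
             for i in range(n_procs)]
    for p in procs:
        p.start()
    # Drain the queue before joining; the timeout avoids hanging forever
    # if a worker dies before producing its result.
    results = sorted(q.get(timeout=10) for _ in procs)
    for p in procs:
        p.join()
    return results


if __name__ == "__main__":
    print(run_with_method())
```

Calling `mp.set_start_method('fork')` once under the `if __name__ == '__main__':` guard, as in the sample code further down, has the same effect for the whole program.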

If anybody cares to dig deeper into this problem, I'd be happy to provide 
whatever information could be helpful.

Here's the sample code (again):
```
#!/usr/bin/env python3
#
# Test "multiprocessing" package included with Python-3.6+
#
# Usage:
#    ./multi1.py [nElements [nProcesses [tSleep]]]
#
#        nElements  - total number of integers to put in the queue
#                     default: 100
#        nProcesses - total number of parallel processes/threads
#                     default: number of CPUs reported by mp.cpu_count()
#        tSleep     - number of milliseconds for a thread to sleep
#                     after it retrieved an element from the queue
#                     default: 17
#
# Algorithm:
#   1. Creates a queue and adds nElements integers to it,
#   2. Creates nProcesses worker processes
#   3. Each worker extracts an element from the queue and sleeps for tSleep
#      milliseconds
#

import sys, queue, time
import multiprocessing as mp


def getElements(q, tSleep, idx):
    l = []  # list of pulled numbers
    while True:
        try:
            l.append(q.get(True, .001))
            time.sleep(tSleep)
        except queue.Empty:
            if q.empty():
                print(f'worker {idx} done, got {len(l)} numbers')
                return


if __name__ == '__main__':
    nElements = int(sys.argv[1]) if len(sys.argv) > 1 else 100
    nProcesses = int(sys.argv[2]) if len(sys.argv) > 2 else mp.cpu_count()
    tSleep = float(sys.argv[3]) if len(sys.argv) > 3 else 17

    # To make this sample code work reliably and fast, uncomment the following line
    #mp.set_start_method('fork')

    # Fill the queue with numbers from 0 to nElements - 1
    q = mp.Queue()
    for k in range(nElements):
        q.put(k)

    # Keep track of worker processes
    workers = []

    # Start worker processes
    for m in range(nProcesses):
        p = mp.Process(target=getElements, args=(q, tSleep / 1000, m))
        workers.append(p)
        p.start()

    # Now do the joining
    for p in workers:
        p.join()
```

Here's the timing:
```
$ time python3 multi1.py
worker 9 done, got 5 numbers
worker 16 done, got 5 numbers
worker 6 done, got 5 numbers
worker 8 done, got 5 numbers
worker 17 done, got 5 numbers
worker 3 done, got 5 numbers
worker 14 done, got 5 numbers
worker 0 done, got 5 numbers
worker 15 done, got 4 numbers
worker 7 done, got 5 numbers
worker 5 done, got 5 numbers
worker 12 done, got 5 numbers
worker 4 done, got 5 numbers
worker 19 done, got 5 numbers
worker 18 done, got 5 numbers
worker 1 done, got 5 numbers
worker 10 done, got 5 numbers
worker 2 done, got 5 numbers
worker 11 done, got 6 numbers
worker 13 done, got 5 numbers

real    0m0.325s
user    0m1.375s
sys     0m0.692s
```

If I comment out the join() and uncomment set_start_method('fork'), the timing 
is:
```
$ time python3 multi1.py
worker 0 done, got 5 numbers
worker 3 done, got 5 numbers
worker 2 done, got 5 numbers
worker 1 done, got 5 numbers
worker 5 done, got 5 numbers
worker 10 done, got 5 numbers
worker 6 done, got 5 numbers
worker 4 done, got 5 numbers
worker 7 done, got 5 numbers
worker 9 done, got 5 numbers
worker 8 done, got 5 numbers
worker 14 done, got 5 numbers
worker 11 done, got 5 numbers
worker 12 done, got 5 numbers
worker 13 done, got 5 numbers
worker 16 done, got 5 numbers
worker 15 done, got 5 numbers
worker 17 done, got 5 numbers
worker 18 done, got 5 numbers
worker 19 done, got 5 numbers

real    0m0.175s
user    0m0.073s
sys     0m0.070s
```

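The difference is clear: with 'fork', real time drops from 0.325s to 0.175s, 
user time from 1.375s to 0.073s, and system time from 0.692s to 0.070s.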

Here's the timing if I don't bother with either join() or set_start_method(), 
but import "multiprocess" instead:
```
$ time python3 multi2.py 
worker 0 done, got 5 numbers
worker 1 done, got 5 numbers
worker 2 done, got 5 numbers
worker 4 done, got 5 numbers
worker 3 done, got 5 numbers
worker 5 done, got 5 numbers
worker 6 done, got 5 numbers
worker 8 done, got 5 numbers
worker 9 done, got 5 numbers
worker 7 done, got 5 numbers
worker 14 done, got 5 numbers
worker 11 done, got 5 numbers
worker 13 done, got 5 numbers
worker 16 done, got 5 numbers
worker 12 done, got 5 numbers
worker 10 done, got 5 numbers
worker 15 done, got 5 numbers
worker 17 done, got 5 numbers
worker 18 done, got 5 numbers
worker 19 done, got 5 numbers

real    0m0.192s
user    0m0.089s
sys     0m0.076s
```

Also, on a weaker machine with only 4 cores (rather than the 20 that ran the 
above example), the instability of the "multiprocessing"-based code shows:
```
$ time python3.8 multi1.py 
worker 3 done, got 33 numbers
worker 2 done, got 33 numbers
worker 1 done, got 34 numbers
worker 0 done, got 0 numbers

real    0m5.448s
user    0m0.339s
sys     0m0.196s
```
Observe how one process out of four got nothing from the queue. With 
"multiprocess" the code runs like clockwork, and each process gets exactly 1/N 
of the queue:
```
$ time python3.8 multi2.py 
worker 0 done, got 25 numbers
worker 1 done, got 25 numbers
worker 2 done, got 25 numbers
worker 3 done, got 25 numbers

real    0m0.551s
user    0m0.082s
sys     0m0.044s
```

I think that the best course for "multiprocessing" would be to revert the 
default to 'fork'. It also looks like the best course for users would be to 
switch to "multiprocess".

----------
components: macOS
messages: 365279
nosy: mouse07410, ned.deily, ronaldoussoren
priority: normal
severity: normal
status: open
title: multiprocessor spawn
type: crash
versions: Python 3.8

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue40106>
_______________________________________