[issue33725] Python crashes on macOS after fork with no exec
Mouse added the comment:

The fix applied for this problem actually broke multiprocessing on macOS. The change of the default start method from 'fork' to 'spawn' causes the program to crash in spawn.py with `FileNotFoundError: [Errno 2] No such file or directory`.

I've tested this on macOS Catalina 10.15.3 and 10.15.4, with Python-3.8.2 and Python-3.7.7. With Python-3.7.7 everything works as expected.

Here's the output:
{{{
$ python3.8 multi1.py
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/spawn.py", line 116, in spawn_main
    exitcode = _main(fd, parent_sentinel)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/spawn.py", line 126, in _main
    self = reduction.pickle.load(from_parent)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/synchronize.py", line 110, in __setstate__
    self._semlock = _multiprocessing.SemLock._rebuild(*state)
FileNotFoundError: [Errno 2] No such file or directory
}}}

Here's the program:
{{{
#!/usr/bin/env python3
#
# Test "multiprocessing" package included with Python-3.6+
#
# Usage:
#    ./multi1.py [nElements [nProcesses [tSleep]]]
#
#        nElements  - total number of integers to put in the queue
#                     default: 100
#        nProcesses - total number of parallel processes/threads
#                     default: number of physical cores available
#        tSleep     - number of milliseconds for a thread to sleep
#                     after it retrieved an element from the queue
#                     default: 17
#
# Algorithm:
#  1. Creates a queue and adds nElements integers to it,
#  2. Creates nProcesses threads
#  3. Each thread extracts an element from the queue and sleeps for tSleep milliseconds
#

import sys, queue, time
import multiprocessing as mp


def getElements(q, tSleep, idx):
    l = []  # list of pulled numbers
    while True:
        try:
            l.append(q.get(True, .001))
            time.sleep(tSleep)
        except queue.Empty:
            if q.empty():
                print(f'worker {idx} done, got {len(l)} numbers')
                return


if __name__ == '__main__':
    nElements = int(sys.argv[1]) if len(sys.argv) > 1 else 100
    nProcesses = int(sys.argv[2]) if len(sys.argv) > 2 else mp.cpu_count()
    tSleep = float(sys.argv[3]) if len(sys.argv) > 3 else 17

    # Uncomment the following line to make it work with Python-3.8+
    #mp.set_start_method('fork')

    # Fill the queue with numbers from 0 to nElements
    q = mp.Queue()
    for k in range(nElements):
        q.put(k)

    # Start worker processes
    for m in range(nProcesses):
        p = mp.Process(target=getElements, args=(q, tSleep / 1000, m))
        p.start()
}}}

--
nosy: +mouse07410
type: -> crash
___
Python tracker <https://bugs.python.org/issue33725>
___
___
Python-bugs-list mailing list
Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue33725] Python crashes on macOS after fork with no exec
Mouse added the comment:

Tried 'spawn', 'fork', 'forkserver'.
- 'spawn' causes consistent `FileNotFoundError: [Errno 2] No such file or directory`;
- 'fork' consistently works (tested on machines with 4 and 20 cores);
- 'forkserver' causes roughly half of the processes to crash with `FileNotFoundError`, while the other half succeed (weird!).
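For anyone repeating the comparison above, it can help to check which start methods the platform offers and whether one has already been fixed for the process. The following is a minimal sketch that uses only the standard `multiprocessing` API; the helper name `start_method_report` is made up for illustration.

```python
import multiprocessing as mp


def start_method_report():
    """Return the start methods this platform supports and the one in effect."""
    return {
        "available": mp.get_all_start_methods(),
        # allow_none=True reports the setting without fixing it as a side effect;
        # None means no method has been fixed yet, so the platform default applies.
        "current": mp.get_start_method(allow_none=True),
    }


if __name__ == "__main__":
    print(start_method_report())
```

To try each method in turn without touching the global setting, `mp.get_context("spawn")` (or `"fork"`, `"forkserver"`) returns a context object that exposes the same `Process` and `Queue` constructors.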
[issue33725] Python crashes on macOS after fork with no exec
Mouse added the comment:

@mark.dickinson, the issue you referred to did not show a working sample. Could you demonstrate on my example how it should be applied? Thanks!
[issue33725] Python crashes on macOS after fork with no exec
Mouse added the comment:

Also, adding `p.join()` immediately after `p.start()` in my sample code showed this timing:
```
$ time python3.8 multi1.py
worker 0 done, got 100 numbers
worker 1 done, got 0 numbers
worker 2 done, got 0 numbers
worker 3 done, got 0 numbers

real    0m2.342s
user    0m0.227s
sys     0m0.111s
$
```

Setting the start method to `fork` instead showed this timing:
```
$ time python3.8 multi1.py
worker 2 done, got 25 numbers
worker 0 done, got 25 numbers
worker 1 done, got 25 numbers
worker 3 done, got 25 numbers

real    0m0.537s
user    0m0.064s
sys     0m0.040s
$
```

The proposed fix is roughly four times slower than reverting the start method to `fork`.
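In the first timing above, worker 0 consumes all 100 numbers. That is what one would expect when each `p.join()` runs before the next `p.start()`: the workers execute one after another instead of in parallel. A sketch of the usual alternative, starting every worker first and joining afterwards, is below. It is not the reporter's program: the names `worker` and `run_workers`, the `None` sentinels, and the results queue are assumptions made for illustration.

```python
import multiprocessing as mp


def worker(tasks, results, idx):
    """Pull items until a None sentinel arrives, then report how many were seen."""
    count = 0
    while True:
        item = tasks.get()  # blocks until an item or a sentinel is available
        if item is None:
            break
        count += 1
    results.put((idx, count))


def run_workers(n_elements=100, n_processes=4, start_method=None):
    """Start all workers, collect their counts, then join them all."""
    ctx = mp.get_context(start_method)  # None selects the platform default
    tasks = ctx.Queue()
    results = ctx.Queue()
    for k in range(n_elements):
        tasks.put(k)
    for _ in range(n_processes):
        tasks.put(None)  # one sentinel per worker (illustrative choice)

    procs = [ctx.Process(target=worker, args=(tasks, results, m))
             for m in range(n_processes)]
    for p in procs:
        p.start()  # start everything before waiting on anything
    counts = dict(results.get() for _ in procs)
    for p in procs:
        p.join()  # the parent stays alive until every child has finished
    return counts
```

With 'spawn' or 'forkserver', `run_workers()` should be called from under an `if __name__ == '__main__':` guard in a script file, because the child processes import the main module to find `worker`.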
[issue40106] multiprocessor spawn
New submission from Mouse:

macOS Catalina 10.15.3 and 10.15.4. Python-3.8.2 (also tested with 3.7.7, which confirmed that the problem lies in the fix described in https://bugs.python.org/issue33725).

Trying to use "multiprocessing" with Python-3.8 and with the new default of `set_start_method('spawn')` is nothing but a disaster.

Not doing join() leads to consistent crashes, as described in https://bugs.python.org/issue33725#msg365249

Adding p.join() immediately after p.start() seems to work, but increases the total run-time by a factor of between two and four, user time by a factor of five, and system time by a factor of ten.

Occasionally, even with p.join(), I'm getting some processes crashing as shown in https://bugs.python.org/issue33725#msg365249.

I found two workarounds:
1. Switch back to 'fork' by explicitly adding `set_start_method('fork')` to the __main__.
2. Drop the messy "multiprocessing" package and use "multiprocess" instead, which turns out to be a good and reliable fork of "multiprocessing".

If anybody cares to dig deeper into this problem, I'd be happy to provide whatever information could be helpful.

Here's the sample code (again):
```
#!/usr/bin/env python3
#
# Test "multiprocessing" package included with Python-3.6+
#
# Usage:
#    ./multi1.py [nElements [nProcesses [tSleep]]]
#
#        nElements  - total number of integers to put in the queue
#                     default: 100
#        nProcesses - total number of parallel processes/threads
#                     default: number of physical cores available
#        tSleep     - number of milliseconds for a thread to sleep
#                     after it retrieved an element from the queue
#                     default: 17
#
# Algorithm:
#  1. Creates a queue and adds nElements integers to it,
#  2. Creates nProcesses threads
#  3. Each thread extracts an element from the queue and sleeps for tSleep milliseconds
#

import sys, queue, time
import multiprocessing as mp


def getElements(q, tSleep, idx):
    l = []  # list of pulled numbers
    while True:
        try:
            l.append(q.get(True, .001))
            time.sleep(tSleep)
        except queue.Empty:
            if q.empty():
                print(f'worker {idx} done, got {len(l)} numbers')
                return


if __name__ == '__main__':
    nElements = int(sys.argv[1]) if len(sys.argv) > 1 else 100
    nProcesses = int(sys.argv[2]) if len(sys.argv) > 2 else mp.cpu_count()
    tSleep = float(sys.argv[3]) if len(sys.argv) > 3 else 17

    # To make this sample code work reliably and fast, uncomment the following line
    #mp.set_start_method('fork')

    # Fill the queue with numbers from 0 to nElements
    q = mp.Queue()
    for k in range(nElements):
        q.put(k)

    # Keep track of worker processes
    workers = []

    # Start worker processes
    for m in range(nProcesses):
        p = mp.Process(target=getElements, args=(q, tSleep / 1000, m))
        workers.append(p)
        p.start()

    # Now do the joining
    for p in workers:
        p.join()
```

Here's the timing:
```
$ time python3 multi1.py
worker 9 done, got 5 numbers
worker 16 done, got 5 numbers
worker 6 done, got 5 numbers
worker 8 done, got 5 numbers
worker 17 done, got 5 numbers
worker 3 done, got 5 numbers
worker 14 done, got 5 numbers
worker 0 done, got 5 numbers
worker 15 done, got 4 numbers
worker 7 done, got 5 numbers
worker 5 done, got 5 numbers
worker 12 done, got 5 numbers
worker 4 done, got 5 numbers
worker 19 done, got 5 numbers
worker 18 done, got 5 numbers
worker 1 done, got 5 numbers
worker 10 done, got 5 numbers
worker 2 done, got 5 numbers
worker 11 done, got 6 numbers
worker 13 done, got 5 numbers

real    0m0.325s
user    0m1.375s
sys     0m0.692s
```

If I comment out the join() and uncomment set_start_method('fork'), the timing is
```
$ time python3 multi1.py
worker 0 done, got 5 numbers
worker 3 done, got 5 numbers
worker 2 done, got 5 numbers
worker 1 done, got 5 numbers
worker 5 done, got 5 numbers
worker 10 done, got 5 numbers
worker 6 done, got 5 numbers
worker 4 done, got 5 numbers
worker 7 done, got 5 numbers
worker 9 done, got 5 numbers
worker 8 done, got 5 numbers
worker 14 done, got 5 numbers
worker 11 done, got 5 numbers
worker 12 done, got 5 numbers
worker 13 done, got 5 numbers
worker 16 done, got 5 numbers
worker 15 done, got 5 numbers
worker 17 done, got 5 numbers
worker 18 done, got 5 numbers
worker 19 done, got 5 numbers

real    0m0.175s
user    0m0.073s
sys     0m0.070s
```
You can observe the difference.

Here's the timing if I don't bother with either join() or set_start_method(), but import "multiprocess" instead:
```
$ time python3 multi2.py
worker 0 done, got 5 numbers
worker 1 done, got 5 numbers
worker 2 done, got 5 numbers
worker 4 done, got 5 numbers
worker 3 do
```
[issue28965] Multiprocessing spawn/forkserver fails to pass Queues
Mouse added the comment:

On macOS Catalina 10.15.4, I still see this problem occasionally even with p.join() added. See https://bugs.python.org/msg365251 and subsequent messages.

Also, see https://bugs.python.org/issue40106.

--
nosy: +mouse07410
___
Python tracker <https://bugs.python.org/issue28965>
___
[issue33725] Python crashes on macOS after fork with no exec
Mouse added the comment:

@mark.dickinson, thank you. Following your suggestion, I've added a comment in #28965, and created a new issue https://bugs.python.org/issue40106.

--
nosy: +vstinner
[issue25495] binascii documentation incorrect
New submission from Mouse:

binascii b2a_base64() documentation says:

    The length of data should be at most 57 to adhere to the base64 standard.

This is incorrect, because there is no base64 standard that restricts the length of input data, especially to such a small value.

What RFC4648 (which superseded RFC3548, which your documentation still keeps referring to) actually says is that MIME enforces the limit of the OUTPUT LINE length at 76, but NOT of the entire output, and certainly not of the entire input.

Please correct the documentation, making it conform to what the ACTUAL base64 standard says.

See https://en.wikipedia.org/wiki/Base64 and https://tools.ietf.org/html/rfc4648

Thanks!

--
assignee: docs@python
components: Documentation
messages: 253572
nosy: docs@python, mouse07410
priority: normal
severity: normal
status: open
title: binascii documentation incorrect
versions: Python 3.5
___
Python tracker <http://bugs.python.org/issue25495>
___
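The claim in this report is easy to check. A minimal sketch, assuming nothing beyond the standard `binascii` module (the helper name `encode_in_one_call` and the 300-byte input are arbitrary choices for illustration): input far longer than 57 bytes is accepted, and the result is a single line with one trailing newline.

```python
import binascii


def encode_in_one_call(data: bytes) -> bytes:
    """Encode arbitrary-length data with a single b2a_base64() call."""
    return binascii.b2a_base64(data)


if __name__ == "__main__":
    encoded = encode_in_one_call(bytes(300))  # 300 zero bytes, well past 57
    print(len(encoded), encoded.count(b"\n"))
```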
[issue25495] binascii documentation incorrect
Mouse added the comment:

Yes, I know where this came from. :-)

Here is my proposed change. Replace the statement

    The length of data should be at most 57 to adhere to the base64 standard.

with:

    To be MIME-compliant, the Base64 output (as defined in RFC4648) should be broken into lines at most 76 characters long. This post-processing of the output is the responsibility of the caller. Note that the original PEM content-transfer encoding limited line length to 64 characters.

Would this change be agreeable to you?
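The post-processing that the proposed wording leaves to the caller could look like the following. This is only one possible sketch, not text proposed for the documentation: the helper name `wrap_base64` and its `width` and `eol` parameters are assumptions for illustration, and the `newline` keyword of `b2a_base64()` requires Python 3.6 or later.

```python
import binascii


def wrap_base64(data: bytes, width: int = 76, eol: bytes = b"\n") -> bytes:
    """Encode data in one call, then break the output into lines of at most `width`."""
    encoded = binascii.b2a_base64(data, newline=False)
    lines = [encoded[i:i + width] for i in range(0, len(encoded), width)]
    # Every line, including the last, is terminated with `eol`.
    return b"".join(line + eol for line in lines)


if __name__ == "__main__":
    print(wrap_base64(bytes(range(200))).decode("ascii"))
```

Passing `width=64` gives the line length mentioned for PEM; callers that need no line breaks at all can use the `b2a_base64()` output directly.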
[issue25495] binascii documentation incorrect
Mouse added the comment:

Thank you for the quick turn-around, and for taking care of this issue!
[issue25495] binascii documentation incorrect
Mouse added the comment:

As far as I remember, the data was not "originally processed in 57-byte chunks". I've been around the first PEM and MIME standards and discussions (and code, though not in Python, which wasn't around then) long enough to be in a position to know. :)

Whether to process data in chunks or not is up to the user. Not to mention that PEM is long gone, and MIME has also changed somewhat.

The link between this function and RFC4648 can and should be more explicit, but I think just referring to it is enough.

Do you have a recommendation for additional info to explain newline issues?

Yes, changing "Base64 output" to "function output" makes perfect sense.

Thanks!
[issue25495] binascii documentation incorrect
Mouse added the comment:

1. I concede knowing nothing about the early Python library implementation, functionality, or even purpose.

2. I don't think it makes sense now to refer to PEM either. We'd be two decades too late for that (well, 27 years, to be precise :). See https://en.wikipedia.org/wiki/Privacy-enhanced_Electronic_Mail

3. I don't think we are in a position to tell programmers how to split a string of characters into 76-long chunks. Not to mention that the example you gave is likely to suffer in performance (just count those function calls) compared to other methods, and won't reflect well on the authors.

Here's one possible doc version:

'''
Convert binary data to the base 64 encoding defined in :rfc:`4648`. The return value includes a trailing newline ``b"\n"`` if *newline* is true. If the output is used as Base64 transfer encoding for MIME (:rfc:`2045`), base 64 output should be broken into lines at most 76 characters long to be compliant. Base64 encoding standard does not limit the maximum encoded line length.
'''
[issue25495] binascii documentation incorrect
Mouse added the comment:

Let's not insinuate anything about the input. This is about what constraints on the OUTPUT MAY be there, not a tutorial from the 1980s on how one might accomplish it.
[issue25495] binascii documentation incorrect
Mouse added the comment:

And even those constraints depend on the use. E.g. X.509 does not have those.
[issue25495] binascii documentation incorrect
Mouse added the comment:

1. I am OK with the following text, modeled on the referenced Perldoc:

    b2a_base64( $bytes, $eol );

    Encode data by calling the encode_base64() function. The first argument is the byte string to encode. The second argument is optional, and provides the line-ending sequence to use. When it is given, the returned encoded string is broken into lines of no more than 76 characters each and it will end with $eol unless it is empty. Pass an empty string, or no second argument at all, if you do not want the encoded string to be broken into lines.

2. I have already had people telling me that "Python-3 doc prohibits input longer than 57 bytes, even though it doesn't currently enforce it". Please help put an end to the spread of this confusion.
[issue25495] binascii documentation incorrect
Mouse added the comment:

The harm in mentioning the 57-byte chunking is that so far it has successfully confused people.

The b2a_base64() function is not coupled to MIME. It has no constraints on either its input or its output. *IF* it is used by (in) a MIME application, then the caller may want to make its output RFC 2045-compliant, in whatever way he chooses. Giving (unwelcome) advice to a writer of one specific application is in my opinion completely out of scope here. The justification that it used to matter 25 years ago and therefore should be kept here doesn't make sense to me.

I strongly insist that this "chunking" thing does not belong, and must be removed.
[issue25495] binascii documentation incorrect
Mouse added the comment:

Unfortunately, NO. The problem (and this bug report) is for Python-3 documentation, so trying to address it in Python-2 rather than in Python-3 does not make sense.

We seem to both understand and agree that there is no length limitation on b2a_base64() input, either recommended or enforced - contrary to what the current Python-3 documentation implies.

We understand that *if* the *output* of this function is intended for use in MIME (rather than X.509 or whatever else Base64 is good for), then the caller should do other things besides calling b2a_base64(), and in all likelihood the caller is already aware of that - after all, if he figured that he needs Base64 in his stuff, he probably knows something about what MIME standards say and require.

I repeat my original complaint: Python-3 documentation is buggy because it implies a restriction on the input that is not there. This reference should be removed from there because it confuses people. I've talked to those confused personally, so this is first-hand. I refer you to the original msg253572 of this bug report.

If you want to write a MIME-in-Python tutorial, it is up to you - but b2a_base64() does not seem to be the right place for it. (And I'd rather see an X.509 tutorial if you're dead set on writing something besides strict plain b2a_base64() doc. :-)
[issue25495] binascii documentation incorrect
Mouse added the comment:

To add: I do not understand your attachment to that 57 "...(exactly 57 bytes of input data per line)", and request that this parenthesized sentence be removed from your Python-2.7 doc patch.

Please give the reader the benefit of the doubt, and allow that *if* he wants to repeatedly call b2a_base64() instead of splitting its output - the ability to compute (76 * 3 / 4) is within his skill level.
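For reference, the arithmetic mentioned above: Base64 turns every 3 input bytes into 4 output characters, so a 76-character line corresponds to 76 * 3 / 4 = 57 input bytes. The sketch below (helper names are made up for illustration) computes that number and shows that calling `b2a_base64()` on 57-byte chunks yields the same bytes as encoding once and splitting the output into 76-character lines.

```python
import binascii


def input_bytes_per_line(line_len: int = 76) -> int:
    """Number of input bytes that encode to exactly `line_len` Base64 characters."""
    return line_len // 4 * 3


def encode_by_chunks(data: bytes, line_len: int = 76) -> bytes:
    """Call b2a_base64() once per chunk; each call appends its own newline."""
    step = input_bytes_per_line(line_len)
    return b"".join(binascii.b2a_base64(data[i:i + step])
                    for i in range(0, len(data), step))


def encode_then_split(data: bytes, line_len: int = 76) -> bytes:
    """Call b2a_base64() once, then split the output into lines."""
    whole = binascii.b2a_base64(data, newline=False)
    return b"".join(whole[i:i + line_len] + b"\n"
                    for i in range(0, len(whole), line_len))


if __name__ == "__main__":
    sample = bytes(range(256)) * 2
    print(input_bytes_per_line())
    print(encode_by_chunks(sample) == encode_then_split(sample))
```

The two approaches agree because 57 is a multiple of 3, so no padding appears before the final chunk.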
[issue25495] binascii documentation incorrect
Mouse added the comment:

> my patch should be valid for 3.5 also.
> The relevant wording is identical to 2.7.

OK.

> I have resisted removing the magic number 57 for a couple
> of reasons. Reading existing code that uses this number may
> be harder.

You expect to see "existing code that uses this number" in Python-3.5+? Interesting... (Care to point me at a couple of samples of such "existing" Python-3 code?) And you expect that the main info source for understanding the reason behind that "57" (assuming this function is invoked that way, as opposed to splitting the output :) would be the doc for this function, rather than the main program, or RFC 2045, or...? Fine.

> It helps explain how the function was originally to be used,
> and why the newline is appended.

Pardon me, but why do you think anybody would care...? There are tons of functions, old and new, with more new ones popping up fast enough. I'd really envy a person who has time to enjoy the history of one minuscule function of an old (albeit still useful :) library.

OK. You think a history of this function should be documented - fine. I don't need it (and don't think anybody else wants to read it either), but it's not my doc or my decision.

Just get the darn bug fixed.
[issue25495] binascii documentation incorrect
Mouse added the comment:

Status...?