[issue33608] Add a cross-interpreter-safe mechanism to indicate that an object may be destroyed.

2019-05-18 Thread Pavel Kostyuchenko


Pavel Kostyuchenko  added the comment:

I was able to reproduce the error with version 
f13c5c8b9401a9dc19e95d8b420ee100ac022208 on FreeBSD 12.0 VM. The error seems to 
be caused not by those changes, but by lack of synchronization in the 
multiprocessing.managers.Server.
The failure happens when running the 
"test_shared_memory_SharedMemoryManager_basics" with high CPU load and frequent 
interrupts e.g. moving some window during test. Mostly I used the "python -m 
test --fail-env-changed test_multiprocessing_spawn -m 'WithProcessesTestS[hu]*' 
-F" command to reproduce the crash.
By analyzing core dumps I deduced that the crash happens during this call from 
the parent test process:

class BaseManager(object):
def _finalize_manager(process, address, authkey, state, _Client):
...
try:
conn = _Client(address, authkey=authkey)
try:
dispatch(conn, None, 'shutdown')
finally:
conn.close()
except Exception:
pass

Main thread in the multiprocessing child:

class Server(object):
def serve_forever(self):
...
try:
accepter = threading.Thread(target=self.accepter)
accepter.daemon = True
accepter.start()
try:
while not self.stop_event.is_set():
self.stop_event.wait(1)
except (KeyboardInterrupt, SystemExit):
pass
finally:
...
sys.exit(0)  << main thread have finished and destroyed the 
interpreter

Worker thread in the multiprocessing child.
Locals:
File "/usr/home/user/cpython/Lib/multiprocessing/managers.py", line 214, in 
handle_request
c.send(msg)
self = 
funcname = 'shutdown'
result = None
request = (None, 'shutdown', (), {})
ignore = None
args = ()
kwds = {}
msg = ('#RETURN', None)

Listing:
class Server(object):
def handle_request(self, c):
...
try:
result = func(c, *args, **kwds)  << calls Server.shutdown method
except Exception:
msg = ('#TRACEBACK', format_exc())
else:
msg = ('#RETURN', result)
try:
c.send(msg)  << crashes with SIGBUS in _send_bytes -> write -> 
take_gil -> SET_GIL_DROP_REQUEST(tstate->interp)
except Exception as e:
try:
c.send(('#TRACEBACK', format_exc()))
except Exception:
pass
...
def shutdown(self, c):
...
try:
util.debug('manager received shutdown message')
c.send(('#RETURN', None))
except:
import traceback
traceback.print_exc()
finally:
self.stop_event.set()

Worker thread is daemonic and is not terminated during the interpreter 
finalization, thus it might still be running and is terminated silently when 
the process exits. The connection (c) has different implementations on several 
platforms, so we cannot be sure whether the connection is closed during 
shutdown or not, whether the last "c.send(msg)" blocks until the end of the 
process, returns instantly, or fails inconsistently.
The error was there for a long time, but for two reasons it didn't cause much 
trouble:
- the race condition is hard to trigger;
- SET_GIL_DROP_REQUEST used to ignore the errorneous state of interpreter, but 
introduction of tstate->interp argument by Eric manifested SIGBUS on FreeBSD.

I haven't managed to find a nice clean test to reproduce the bug automatically. 
I suggest the changes for the multiprocessing/managers.py in the attachment.

--
nosy: +shprotx
Added file: https://bugs.python.org/file48333/managers.patch

___
Python tracker 
<https://bugs.python.org/issue33608>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue33608] Add a cross-interpreter-safe mechanism to indicate that an object may be destroyed.

2019-05-18 Thread Pavel Kostyuchenko


Pavel Kostyuchenko  added the comment:

Also it might be viable to add some assertion to verify the take_gil is not 
called with uninitialized interpreter.
I used the changes in the attachment (take_gil.assert.patch), but it produced 
errors during test_tracemalloc with f13c5c8b9401a9dc19e95d8b420ee100ac022208 . 
It happens because, during startup with invalid arguments, the interpreter is 
finalized with pymain_main->pymain_free->_PyRuntime_Finalize before the error 
is printed. However, the problem seems to be fixed for me in the last revisions 
of master branch, so I upload the diff against it.

--
Added file: https://bugs.python.org/file48334/take_gil.assert.patch

___
Python tracker 
<https://bugs.python.org/issue33608>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com