New submission from Ian Wienand:

Using 3.5.2-2ubuntu0~16.04.3 (Xenial) we see an occasional segfault during garbage collection of a generator object.

A full backtrace is attached, but the crash appears to be triggered inside gen_traverse during gc

---
(gdb) info args
gen = 0x7f22385f0150
visit = 0x50eaa0 <visit_decref>
arg = 0x0

(gdb) print *gen
$109 = {ob_base = {ob_refcnt = 1, ob_type = 0xa35760 <PyGen_Type>}, gi_frame = 0x386aed8, gi_running = 1 '\001', gi_code = <code at remote 0x7f223bb42f60>, gi_weakreflist = 0x0, gi_name = 'linesplit', gi_qualname = 'linesplit'}
---

I believe gen_traverse is doing the following

---
static int
gen_traverse(PyGenObject *gen, visitproc visit, void *arg)
{
    Py_VISIT((PyObject *)gen->gi_frame);
    Py_VISIT(gen->gi_code);
    Py_VISIT(gen->gi_name);
    Py_VISIT(gen->gi_qualname);
    return 0;
}
---

The problem here being that this generator's gen->gi_frame has managed to acquire a NULL object type but still has references

---
(gdb) print *gen->gi_frame
$112 = {ob_base = {ob_base = {ob_refcnt = 2, ob_type = 0x0}, ob_size = 0}, f_back = 0x0, f_code = 0xca3e4fd8950fef91, ...
---

Thus it gets visited and it doesn't go well.

I have attached the py-bt as well, it's very deep with ansible, multiprocessing forking, imp.load_source() importing ... basically a nightmare. I have not managed to get it down to any sort of minimal test case unfortunately. This happens fairly infrequently, so suggests a race. The generator in question has a socket involved:

---
def linesplit(socket):
    buff = socket.recv(4096).decode("utf-8")
    buffering = True
    while buffering:
        if "\n" in buff:
            (line, buff) = buff.split("\n", 1)
            yield line + "\n"
        else:
            more = socket.recv(4096).decode("utf-8")
            if not more:
                buffering = False
            else:
                buff += more
    if buff:
        yield buff
---

Wild speculation but maybe something to do with finalizing generators with file-descriptors across fork()?

At this point we are trying a work-around of not having the above socket reading routine in a generator but just a "regular" loop. As it triggers as part of a production roll-out I'm not sure we can do too much more debugging.

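A minimal sketch of what a non-generator version of the socket reader could look like is given here for illustration. It is an assumption about the shape of such a work-around, not the code from the affected deployment: the name read_lines is hypothetical, and unlike linesplit() it buffers everything until the peer closes the connection instead of handing lines back as they arrive.

```python
import socket


def read_lines(sock):
    """Read from sock until the peer closes, then return the lines as a list.

    Hypothetical non-generator counterpart of linesplit(): no generator
    object (and so no gi_frame) is created for the reader.
    """
    chunks = []
    while True:
        data = sock.recv(4096)
        if not data:  # empty bytes object: the peer closed the connection
            break
        chunks.append(data)

    # Decode once at the end, so a multi-byte UTF-8 character that was
    # split across two recv() calls is still decoded correctly.
    text = b"".join(chunks).decode("utf-8")

    # Same splitting rule as linesplit(): split on "\n" only and keep the
    # terminator on every complete line; a trailing partial line is kept
    # without a terminator.
    parts = text.split("\n")
    lines = [part + "\n" for part in parts[:-1]]
    if parts[-1]:
        lines.append(parts[-1])
    return lines


if __name__ == "__main__":
    # Small local demonstration using a connected socket pair.
    writer, reader = socket.socketpair()
    writer.sendall("one\ntwo\nthree".encode("utf-8"))
    writer.close()
    for line in read_lines(reader):
        print(repr(line))
    reader.close()
```

Whether replacing the generator actually avoids the crash is not established by anything above; the sketch only removes the generator object from the picture.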
Unless this rings any immediate bells for people, we can probably just have this for tracking at this point. [1] is the original upstream issue.

[1] https://storyboard.openstack.org/#!/story/2001186#comment-17441

----------
components: Interpreter Core
files: crash-bt.txt
messages: 301943
nosy: iwienand
priority: normal
severity: normal
status: open
title: Segfault during GC of generator object; invalid gi_frame?
type: crash
versions: Python 3.5
Added file: https://bugs.python.org/file47134/crash-bt.txt

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue31426>
_______________________________________