New submission from STINNER Victor <vstin...@python.org>:
Copy of my email sent to python-dev:
https://mail.python.org/archives/list/python-...@python.org/thread/C4ILXGPKBJQYUN5YDMTJOEOX7RHOD4S3/

Hi,

In the Python stdlib, many heap types currently don't "properly" (fully?) implement the GC protocol, which can prevent these types from being destroyed at Python exit. As a side effect, some other Python objects can also remain alive, and so are not destroyed either.

There is an ongoing effort to destroy all Python objects at exit (bpo-1635741). This problem gets worse when subinterpreters are involved: Refleaks buildbot failures prevent spotting other regressions, and so these "leaks" / "GC bugs" must be fixed as soon as possible. In my experience, many leaks spotted by tests using subinterpreters were quite old; they were simply ignored previously.

It's a hard problem and I don't see any simple/obvious solution right now, except for workarounds that I dislike. Maybe the only good solution is to fix all heap types, one by one.

== Only the Python stdlib should be affected ==

PyType_FromSpec() was added to Python 3.2 by PEP 384 to define "heap types" in C, but I'm not sure if it's popular in practice (ex: Cython doesn't use it, but defines static types). I expect most types to still be defined the old style (static types) in the vast majority of third party extension modules.

To be clear, static types are not affected by this email.

Third party extension modules using the limited C API (to use the stable ABI) and PyType_FromSpec() can be affected (if they don't fully implement the GC protocol).

== Heap type instances now store a strong reference to their type ==

In March 2019, the PyObject_Init() function was modified in bpo-35810 to keep a strong reference (INCREF) to the type if the type is a heap type. The problem it fixed was that a heap type could be destroyed before its last instance was destroyed.
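The "instance keeps its type alive" invariant can be observed from pure Python, since a class created with a class statement is a heap type too. This is only an illustrative sketch, assuming CPython: Python-level classes behaved this way before bpo-35810 as well, so it shows the invariant rather than the C-level change. The names Heap and instance_keeps_type_alive are made up for the example.

```python
import gc
import weakref


def instance_keeps_type_alive():
    """Return (type_alive_while_instance_exists, type_alive_afterwards)."""

    class Heap:  # hypothetical example class; any Python class is a heap type
        pass

    obj = Heap()
    type_ref = weakref.ref(Heap)

    # Drop the only named reference to the type; the instance still holds one.
    del Heap
    gc.collect()
    alive_with_instance = type_ref() is not None

    # Once the instance is gone, only the type's own reference cycles keep
    # it alive, and the GC is able to break those.
    del obj
    gc.collect()
    alive_afterwards = type_ref() is not None

    return alive_with_instance, alive_afterwards


print(instance_keeps_type_alive())
```

The second gc.collect() is needed because the type is part of reference cycles, so reference counting alone does not free it.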
== GC and heap types ==

The new problem is that most heap types don't collaborate well with the garbage collector. The garbage collector doesn't know anything about Python objects, types, reference counting or anything. It only uses the PyGC_Head header and the traverse functions. If an object holds a strong reference to another object but its type does not define a traverse function, the GC cannot guess/infer this reference.

A heap type must respect the following 3 conditions to collaborate with the GC:

* Have the Py_TPFLAGS_HAVE_GC flag;
* Define a traverse function (tp_traverse) which visits the type: Py_VISIT(Py_TYPE(self));
* Instances must be tracked by the GC.

If one of these conditions is not met, the GC can fail to destroy a type during a GC collection. If an instance is kept alive late while a Python interpreter is being deleted, it's possible that the type is never deleted, which can indirectly keep many objects alive, so they are not deleted either.

In practice, when a type is not deleted, a test using subinterpreters starts to fail on the Refleaks buildbot since it leaks references. Without subinterpreters, such a leak is simply ignored, whereas there is an ongoing effort to delete Python objects at exit (bpo-1635741).

== Boring traverse functions ==

Currently, there is no default traverse implementation which visits the type.

For example, I had to implement the following function for _thread.LockType:

    static int
    lock_traverse(lockobject *self, visitproc visit, void *arg)
    {
        Py_VISIT(Py_TYPE(self));
        return 0;
    }

It's a little bit annoying to have to implement the GC protocol when a lock cannot contain other Python objects: it's not a container. It's just a thin wrapper around a C lock. There is exactly one strong reference: to the type.

== Workaround: loop on gc.collect() ==

A workaround is to run gc.collect() in a loop until it returns 0 (no object was collected).

== Traverse automatically? Nope. ==

Pablo Galindo attempted to automatically visit the type in the traverse function:

https://bugs.python.org/issue40217
https://github.com/python/cpython/commit/0169d3003be3d072751dd14a5c84748ab63...

Moreover, What's New in Python 3.9 contains a long section suggesting to implement a traverse function for this problem, but it doesn't suggest to track instances:
https://docs.python.org/dev/whatsnew/3.9.html#changes-in-the-c-api

This solution caused too many troubles, and so instead, traverse functions were defined on heap types to visit the type.

Currently in the master branch, 89 types are defined as heap types out of a total of 206 types (117 types are defined statically). I don't think that these 89 heap types respect the 3 conditions to collaborate with the GC.

== How should we address this issue? ==

I'm not sure what should be done. Working around the issue by triggering multiple GC collections? Emit a warning in development mode if a heap type doesn't collaborate well with the GC?

If core developers miss these bugs and have trouble debugging them, I expect that extension module authors would suffer even more.

== GC+heap type bugs became common ==

I have been fixing such GC issues for a year as part of the work on cleaning Python objects at exit, and also indirectly related to subinterpreters. The behavior is surprising; it's really hard to dig into GC internals and understand what's going on. I wrote an article on this kind of "GC bugs":
https://vstinner.github.io/subinterpreter-leaks.html

Today, I learnt the hard way that defining a traverse function is not enough. The type constructor (tp_new) must also track instances! See my fix for _multibytecodec related to CJK codecs:

https://github.com/python/cpython/commit/11ef53aefbecfac18b63cee518a7184f771...
https://bugs.python.org/issue42866

== Reference cycles are common ==

The GC only serves to break reference cycles. But reference cycles are rare, right? Well...

First of all, most types create reference cycles involving themselves.
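The self-reference can be checked from Python. A minimal sketch, assuming CPython: the class Example and the helper type_cycle_info are hypothetical names used only for illustration, while gc.get_referents() and gc.is_tracked() are the standard gc module functions.

```python
import gc


def type_cycle_info():
    class Example:  # hypothetical class used only for illustration
        pass

    mro = Example.__mro__
    return {
        # The type is the first entry of its own __mro__ tuple...
        "type_in_mro": mro[0] is Example,
        # ...and the type's traverse function reports that tuple to the GC,
        # so the cycle type -> __mro__ -> type is visible to the collector.
        "mro_is_referent": any(r is mro for r in gc.get_referents(Example)),
        # The type itself is tracked by the GC.
        "type_is_tracked": gc.is_tracked(Example),
    }


print(type_cycle_info())
```

If any of the three values came back false for a type, the GC would not be able to see (and so break) that cycle.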
For example, a type's __mro__ tuple contains the type, which already creates a ref cycle. Type methods can also contain a reference to the type.

=> The GC must break the cycle, otherwise the type cannot be destroyed

When a function is defined in a Python module, the function's __globals__ is the module namespace (module.__dict__) which... contains the function. Defining a function in a Python module also creates a reference cycle which prevents the module namespace from being deleted.

If a function is used as a callback somewhere, the whole module remains "alive" until the reference to the callback is cleared.

Example: os.register_at_fork() and codecs.register() callbacks are cleared really late during Python finalization. Currently, they are basically the last objects cleared at Python exit. After that, there is exactly one final GC collection.

=> The GC

== Debug GC issues ==

gc.get_referents() and gc.get_referrers() can be used to check traverse functions. gc.is_tracked() can be used to check if the GC tracks an object.

Using the gdb debugger on gc_collect_main() helps to see which objects are collected. See for example the finalize_garbage() function, which calls finalizers on unreachable objects.

The cause is usually a missing traverse function or a missing Py_VISIT() in an existing traverse function.

== __del__ hack for debugging ==

If you want to play with the issue or if you have to debug a GC issue, you can use an object which logs a message when it's being deleted:

    class VerboseDel:
        def __del__(self):
            print("DELETE OBJECT")

    obj = VerboseDel()

Warning: creating such an object in a module also prevents the module namespace from being destroyed when the last reference to the module is deleted! __del__.__globals__ contains a reference to the module namespace, and obj.__class__ contains a reference to the type... Yeah, ref cycles and GC issues are fun!

== Long email ==

Yeah, I like to put titles in my long emails. Enjoy. Happy hacking!
Victor
--
Night gathers, and now my watch begins. It shall not end until my death

----------
components: C API
messages: 385297
nosy: vstinner
priority: normal
severity: normal
status: open
title: [C API] Heap types (PyType_FromSpec) must fully implement the GC protocol
versions: Python 3.10

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue42972>
_______________________________________