Pasha Stetsenko wrote at 2020-10-22 17:51 -0700:
> ...
>I'm a maintainer of a Python library "datatable" (can be installed from
>PyPI), and I've recently been trying to debug a memory leak that occurs in
>my library.
>The program that exposes the leak is quite simple:
>```
>import datatable as dt
>import gc  # just in case
>
>def leak(n=10**7):
>    for i in range(n):
>        z = dt.update()
>
>leak()
>gc.collect()
>input("Press enter")
>```
>Note that despite the name, `dt.update` is actually a class, though it
>is defined via the Python C API. Thus, this script is expected to create and
>then immediately destroy 10 million simple Python objects.
>The observed behavior, however, is that the script consumes more and more
>memory, eventually ending up at about 500M. The amount of memory the
>program ends up consuming is directly proportional to the parameter `n`.
>
>`gc.get_objects()` does not show any extra objects, however.
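The observation that `gc.get_objects()` shows nothing extra can be reproduced with ordinary objects. The sketch below uses only built-in types plus a hypothetical `Plain` class (not `datatable`), and asks the collector directly via `gc.is_tracked()` and `gc.get_objects()` which objects it knows about:

```python
import gc

class Plain:
    """Hypothetical ordinary Python class; its instances support cyclic GC."""

def appears_in_gc(obj):
    """Return True if obj is among the objects the collector currently tracks."""
    return any(o is obj for o in gc.get_objects())

big_int = 10**20      # int has no tp_traverse, so it is never tracked
text = "datatable"    # str has no tp_traverse either
items = []            # list supports GC and is tracked from creation
inst = Plain()        # instances of Python classes are tracked as well

for name, obj in [("int", big_int), ("str", text), ("list", items), ("Plain", inst)]:
    # Columns: type name, tracked by the GC?, present in gc.get_objects()?
    print(name, gc.is_tracked(obj), appears_in_gc(obj))
```

On CPython this should print `False False` for the `int` and `str` rows and `True True` for the other two. A C extension type that does not opt into garbage collection presumably behaves like `int` here, so its instances, leaked or not, would never show up in `gc.get_objects()`.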
For efficiency reasons, the garbage collector tracks only objects whose types are known to be potentially involved in reference cycles. A type implemented in C indicates this possibility by setting the `Py_TPFLAGS_HAVE_GC` flag and defining `tp_traverse` in its type structure. `tp_traverse` also tells the garbage collector how to find the objects it references. You will never find an object in the result of `get_objects` whose type does not define `tp_traverse`.

> ...
>Thus, the object didn't actually "leak" in the normal sense: its refcount
>is 0 and it was reclaimed by the Python runtime (when I print a debug
>message in tp_dealloc, I see that the destructor gets called every time).
>Still, Python keeps requesting more and more memory from the system instead
>of reusing the memory that was supposed to be freed.

I would try to debug further what happens in `tp_dealloc` and the functions it calls. You should eventually see a call that gives the memory back to Python's memory management (which is built on top of the C library's memory management): a `PyMem_Free` or, for object memory, usually a `PyObject_Free` reached via the type's `tp_free` slot.

Note that your `tp_dealloc` should not call the C library's `free`. Python builds its own memory management (the `PyMem_*` and `PyObject_*` functions) on top of the C library. It handles all "small" memory requests itself and, when necessary, requests big chunks via `malloc`, which it then splits into the smaller sizes. Should you release such small blocks directly via `free`, that memory becomes effectively unusable by Python (unless you also allocated it with `malloc` yourself); strictly speaking, passing `free` a pointer that did not come from `malloc` is undefined behavior.

--
https://mail.python.org/mailman/listinfo/python-list