Dear Python gurus, I'm a maintainer of a python library "datatable" (can be installed from PyPi), and i've been recently trying to debug a memory leak that occurs in my library. The program that exposes the leak is quite simple: ``` import datatable as dt import gc # just in case
def leak(n=10**7): for i in range(n): z = dt.update() leak() gc.collect() input("Press enter") ``` Note that despite the name, the `dt.update` is actually a class, though it is defined via Python C API. Thus, this script is expected to create and then immediately destroy 10 million simple python objects. The observed behavior, however, is that the script consumes more and more memory, eventually ending up at about 500M. The amount of memory the program ends up consuming is directly proportional to the parameter `n`. The `gc.get_objects()` does not show any extra objects however. The `tracemalloc` module shows that there are indeed `n` objects leaked in the `z=dt.update()` line, but doesn't give any extra details. In order to dig deeper, I let the process wait on the "input()" line, and wrote a script to dump the process' memory into a file. Then I scanned through the file looking at any repeated patterns of 64-bit words. Inside the memory dump, the following sequences were the most common: ``` 0x0000000000000000 - 28660404 0x00000001024be6e8 - 4999762 0x0000000101cbdea0 - 119049 0x0000000000000054 - 59537 0x00000fd000000ff0 - 59526 0x0000000000000001 - 16895 0xffffffffffffffff - 12378 ... ``` The most suspicious sequence here is 0x00000001024be6e8, which if you look at that address with lldb, is the address of the PyTypeObject "dt.update", which looks approximately like this: ``` (lldb) p *(PyTypeObject*)(0x000000010f4206e8) (PyTypeObject) $0 = { ob_base = { ob_base = { ob_refcnt = 8 ob_type = 0x000000010ec216b0 } ob_size = 0 } tp_name = 0x000000010f3a442c "datatable.update" tp_basicsize = 48 tp_itemsize = 0 tp_dealloc = 0x000000010f0a8040 (_datatable.cpython-36m-darwin.so`void py::_safe_dealloc<py::oupdate::oupdate_pyobject, &(py::oupdate::oupdate_pyobject::m__dealloc__())>(_object*) at xobject.h:270) tp_print = 0x0000000000000000 tp_getattr = 0x0000000000000000 tp_setattr = 0x0000000000000000 tp_as_async = 0x0000000000000000 tp_repr = 0x000000010eab3fa0 (Python`object_repr) tp_as_number = 0x0000000000000000 tp_as_sequence = 0x0000000000000000 tp_as_mapping = 0x0000000000000000 tp_hash = 0x000000010eb48640 (Python`_Py_HashPointer) tp_call = 0x0000000000000000 tp_str = 0x000000010eab40d0 (Python`object_str) tp_getattro = 0x000000010eaa1ae0 (Python`PyObject_GenericGetAttr) tp_setattro = 0x000000010eaa1ce0 (Python`PyObject_GenericSetAttr) tp_as_buffer = 0x0000000000000000 tp_flags = 266240 ... ``` Thus, I can be quite certain that 0x1024be6e8 is the address of the `dt.update` type structure. The way this address appears in the memory dump looks like this: ``` 0x7f97875cbb10: 0x00000000 0x00000000 0x024be6e8 0x00000001 0x7f97875cbb20: 0x00000000 0x00000000 0x00000000 0x00000000 0x7f97875cbb30: 0x00000000 0x00000000 0x00000000 0x00000000 0x7f97875cbb40: 0x00000000 0x00000000 0x024be6e8 0x00000001 0x7f97875cbb50: 0x00000000 0x00000000 0x00000000 0x00000000 0x7f97875cbb60: 0x00000000 0x00000000 0x00000000 0x00000000 0x7f97875cbb70: 0x00000000 0x00000000 0x024be6e8 0x00000001 0x7f97875cbb80: 0x00000000 0x00000000 0x00000000 0x00000000 0x7f97875cbb90: 0x00000000 0x00000000 0x00000000 0x00000000 0x7f97875cbba0: 0x00000000 0x00000000 0x024be6e8 0x00000001 0x7f97875cbbb0: 0x00000000 0x00000000 0x00000000 0x00000000 0x7f97875cbbc0: 0x00000000 0x00000000 0x00000000 0x00000000 ``` If i guess that all these represent the leaked objects, then inspecting any of them shows the following: ``` (lldb) p *(PyObject*)(0x7f97875cbb10) (PyObject) $2 = { ob_refcnt = 0 ob_type = 0x000000010f4206e8 } ``` Thus, the object didn't actually "leak" in the normal sense: its refcount is 0 and it was reclaimed by the Python runtime (when i print a debug message in tp_dealloc, i see that the destructor gets called every time). Still, Python keeps requesting more and more memory from the system instead of reusing the memory that was supposed to be freed. I'm currently not sure where to go from here. Is there something wrong with my python object that prevents it from being correctly processed by the Python runtime? Because this doesn't seem to be the usual case of incrementing the refcount too many times. This behavior was observed in Python 3.6.6 and also Python 3.8.0b2. -- https://mail.python.org/mailman/listinfo/python-list