Debugging a memory leak
Dear Python gurus,

I'm a maintainer of a Python library "datatable" (can be installed from PyPI), and I've recently been trying to debug a memory leak that occurs in my library. The program that exposes the leak is quite simple:
```
import datatable as dt
import gc  # just in case

def leak(n=10**7):
    for i in range(n):
        z = dt.update()

leak()
gc.collect()
input("Press enter")
```
Note that despite the name, `dt.update` is actually a class, though it is defined via the Python C API. Thus, this script is expected to create and then immediately destroy 10 million simple Python objects.

The observed behavior, however, is that the script consumes more and more memory, eventually ending up at about 500M. The amount of memory the program ends up consuming is directly proportional to the parameter `n`.

`gc.get_objects()` does not show any extra objects, however. The `tracemalloc` module shows that there are indeed `n` objects leaked on the `z = dt.update()` line, but doesn't give any further details.

In order to dig deeper, I let the process wait on the `input()` line, and wrote a script to dump the process' memory into a file. Then I scanned through the file looking for repeated patterns of 64-bit words. Inside the memory dump, the following sequences were the most common:
```
0x - 28660404
0x0001024be6e8 - 4999762
0x000101cbdea0 - 119049
0x0054 - 59537
0x0fd00ff0 - 59526
0x0001 - 16895
0x - 12378
...
```
The most suspicious sequence here is 0x0001024be6e8, which, if you look at that address with lldb, is the address of the PyTypeObject "dt.update", which looks approximately like this:
```
(lldb) p *(PyTypeObject*)(0x00010f4206e8)
(PyTypeObject) $0 = {
  ob_base = {
    ob_base = {
      ob_refcnt = 8
      ob_type = 0x00010ec216b0
    }
    ob_size = 0
  }
  tp_name = 0x00010f3a442c "datatable.update"
  tp_basicsize = 48
  tp_itemsize = 0
  tp_dealloc = 0x00010f0a8040 (_datatable.cpython-36m-darwin.so`void py::_safe_dealloc(_object*) at xobject.h:270)
  tp_print = 0x
  tp_getattr = 0x
  tp_setattr = 0x
  tp_as_async = 0x
  tp_repr = 0x00010eab3fa0 (Python`object_repr)
  tp_as_number = 0x
  tp_as_sequence = 0x
  tp_as_mapping = 0x
  tp_hash = 0x00010eb48640 (Python`_Py_HashPointer)
  tp_call = 0x
  tp_str = 0x00010eab40d0 (Python`object_str)
  tp_getattro = 0x00010eaa1ae0 (Python`PyObject_GenericGetAttr)
  tp_setattro = 0x00010eaa1ce0 (Python`PyObject_GenericSetAttr)
  tp_as_buffer = 0x
  tp_flags = 266240
  ...
```
Thus, I can be quite certain that 0x0001024be6e8 is the address of the `dt.update` type structure (the exact address varies between sessions). The way this address appears in the memory dump looks like this:
```
0x7f97875cbb10: 0x 0x 0x024be6e8 0x0001
0x7f97875cbb20: 0x 0x 0x 0x
0x7f97875cbb30: 0x 0x 0x 0x
0x7f97875cbb40: 0x 0x 0x024be6e8 0x0001
0x7f97875cbb50: 0x 0x 0x 0x
0x7f97875cbb60: 0x 0x 0x 0x
0x7f97875cbb70: 0x 0x 0x024be6e8 0x0001
0x7f97875cbb80: 0x 0x 0x 0x
0x7f97875cbb90: 0x 0x 0x 0x
0x7f97875cbba0: 0x 0x 0x024be6e8 0x0001
0x7f97875cbbb0: 0x 0x 0x 0x
0x7f97875cbbc0: 0x 0x 0x 0x
```
If I guess that all these represent the leaked objects, then inspecting any of them shows the following:
```
(lldb) p *(PyObject*)(0x7f97875cbb10)
(PyObject) $2 = {
  ob_refcnt = 0
  ob_type = 0x00010f4206e8
}
```
Thus, the object didn't actually "leak" in the normal sense: its refcount is 0 and it was reclaimed by the Python runtime (when I print a debug message in tp_dealloc, I see that the destructor gets called every time).
Still, Python keeps requesting more and more memory from the system instead of reusing the memory that was supposed to be freed.

I'm currently not sure where to go from here. Is there something wrong with my Python object that prevents it from being correctly processed by the Python runtime? This doesn't seem to be the usual case of incrementing the refcount too many times.

The behavior was observed in both Python 3.6.6 and Python 3.8.0b2.
-- https://mail.python.org/mailman/listinfo/python-list
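For anyone who wants to reproduce the kind of `tracemalloc` measurement mentioned above without installing datatable, here is a sketch; a plain `object()` allocation stands in for `dt.update()`:

```python
import tracemalloc

tracemalloc.start()

# Stand-in for the leaking loop; in the original script this would be
# z = dt.update() repeated n times.
objs = [object() for _ in range(100_000)]

snapshot = tracemalloc.take_snapshot()
stats = snapshot.statistics("lineno")

# The allocation site above should dominate the per-line statistics,
# with roughly one traced allocation per surviving object.
top = stats[0]
print(top.count, top.size)
```

In the real case the suspicious line showed `n` surviving allocations even though every object's refcount had dropped to zero, which is what made the pattern puzzling.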
Re: Debugging a memory leak
Thanks for all the replies!

Following Chris's advice, I tried to reduce the code to the smallest reproducible example (I guess I should have done it sooner), and here's what I came up with:
```
#include <Python.h>
#include <cstring>

static int my_init(PyObject*, PyObject*, PyObject*) { return 0; }
static void my_dealloc(PyObject*) {}

static void init_mytype(PyObject* module) {
    PyTypeObject* type = new PyTypeObject();
    std::memset(type, 0, sizeof(PyTypeObject));
    Py_INCREF(type);

    type->tp_basicsize = static_cast<Py_ssize_t>(sizeof(PyObject));
    type->tp_itemsize = 0;
    type->tp_flags = Py_TPFLAGS_DEFAULT;
    type->tp_new = &PyType_GenericNew;
    type->tp_name = "mytype";
    type->tp_doc = "[temporary]";
    type->tp_init = my_init;
    type->tp_dealloc = my_dealloc;
    PyType_Ready(type);
    PyModule_AddObject(module, "mytype", reinterpret_cast<PyObject*>(type));
}
```
(My original `update` object had some fields in it, but it turns out they don't need to be present in order for the problem to manifest. So in this case I'm creating a custom object which is the same as a basic PyObject.)

The `init_mytype()` function creates a custom type and attaches it to a module. After this, creating 100M instances of the object will cause the process memory to swell to 1.5G:
```
for i in range(10**8):
    z = dt.mytype()
```
I know this is not normal, because if I instead use a builtin type such as `list`, or a Python-defined class such as `class A: pass`, then the process remains at a steady RAM usage of about 6 MB.

I've tested this on a Linux platform as well (using the docker image quay.io/pypa/manylinux2010_x86_64), and the problem is present there too.

---
PS: The library I'm working on is open source, available at https://github.com/h2oai/datatable, but the code I posted above is completely independent from my library.

On Fri, Oct 23, 2020 at 10:44 AM Dieter Maurer wrote:
> Pasha Stetsenko wrote at 2020-10-22 17:51 -0700:
> > ...
> >I'm a maintainer of a python library "datatable" (can be installed from
> >PyPi), and i've been recently trying to debug a memory leak that occurs in
> >my library.
> >The program that exposes the leak is quite simple:
> >```
> >import datatable as dt
> >import gc  # just in case
> >
> >def leak(n=10**7):
> >    for i in range(n):
> >        z = dt.update()
> >
> >leak()
> >gc.collect()
> >input("Press enter")
> >```
> >Note that despite the name, the `dt.update` is actually a class, though it
> >is defined via Python C API. Thus, this script is expected to create and
> >then immediately destroy 10 million simple python objects.
> >The observed behavior, however, is that the script consumes more and more
> >memory, eventually ending up at about 500M. The amount of memory the
> >program ends up consuming is directly proportional to the parameter `n`.
> >
> >The `gc.get_objects()` does not show any extra objects however.
>
> For efficiency reasons, the garbage collector tracks only
> objects of types which are known to be potentially involved in cycles.
> A type implemented in C must define `tp_traverse` (in its type
> structure) to indicate this possibility.
> `tp_traverse` also tells the garbage collector how to find referenced
> objects.
> You will never find an object in the result of `get_objects` whose
> type does not define `tp_traverse`.
>
> > ...
> >Thus, the object didn't actually "leak" in the normal sense: its refcount
> >is 0 and it was reclaimed by the Python runtime (when i print a debug
> >message in tp_dealloc, i see that the destructor gets called every time).
> >Still, Python keeps requesting more and more memory from the system instead
> >of reusing the memory that was supposed to be freed.
>
> I would try to debug what happens further in `tp_dealloc` and its callers.
> You should eventually see a `PyMem_free` which gives the memory back
> to the Python memory management (built on top of the C memory management).
>
> Note that your `tp_dealloc` should not call the C library's `free`.
> Python builds its own memory management ("PyMem_*") on top
> of the C library's. It handles all "small" memory requests
> and, if necessary, requests big data chunks via `malloc` to split
> them into the smaller sizes.
> Should you free small memory blocks directly via `free`, that memory
> becomes effectively unusable by Python (unless you have a special
> allocation as well).
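Dieter's point about `tp_traverse` can be seen from pure Python: CPython's cyclic collector tracks only objects whose type opts in (for C types, `Py_TPFLAGS_HAVE_GC` plus `tp_traverse`), and only tracked objects appear in `gc.get_objects()`. A small illustration of this CPython-specific behavior:

```python
import gc

# Atomic objects can never participate in reference cycles, so CPython
# does not track them in the cyclic GC at all.
print(gc.is_tracked(42))        # ints: not tracked
print(gc.is_tracked("hello"))   # strings: not tracked

# Container types define tp_traverse, so their instances are tracked.
print(gc.is_tracked([]))        # lists: tracked

class A:
    pass

print(gc.is_tracked(A()))       # ordinary class instances: tracked

# Only tracked objects show up in gc.get_objects().
lst = []
print(any(o is lst for o in gc.get_objects()))
```

This is why the leaked `dt.update` instances were invisible to `gc.get_objects()`: the type defined no `tp_traverse`, so the collector never knew about them.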
Re: Debugging a memory leak
Thanks MRAB, this was it. I guess I was thinking about tp_dealloc as a C++ destructor, where the base class' destructor is called automatically.

On Fri, Oct 23, 2020 at 11:59 AM MRAB wrote:
> On 2020-10-23 19:32, Pasha Stetsenko wrote:
> > Thanks for all the replies!
> > Following Chris's advice, I tried to reduce the code to the smallest
> > reproducible example (I guess I should have done it sooner),
> > but here's what I came up with:
> > ```
> > #include <Python.h>
> > #include <cstring>
> >
> > static int my_init(PyObject*, PyObject*, PyObject*) { return 0; }
> > static void my_dealloc(PyObject*) {}
> >
> > static void init_mytype(PyObject* module) {
> >     PyTypeObject* type = new PyTypeObject();
> >     std::memset(type, 0, sizeof(PyTypeObject));
> >     Py_INCREF(type);
> >
> >     type->tp_basicsize = static_cast<Py_ssize_t>(sizeof(PyObject));
> >     type->tp_itemsize = 0;
> >     type->tp_flags = Py_TPFLAGS_DEFAULT;
> >     type->tp_new = &PyType_GenericNew;
> >     type->tp_name = "mytype";
> >     type->tp_doc = "[temporary]";
> >     type->tp_init = my_init;
> >     type->tp_dealloc = my_dealloc;
> >     PyType_Ready(type);
> >     PyModule_AddObject(module, "mytype", reinterpret_cast<PyObject*>(type));
> > }
> > ```
>
> You're setting the deallocation function to 'my_dealloc', but that
> function isn't deallocating the object.
>
> Try something like this:
>
>     static void my_dealloc(PyObject* obj) {
>         PyObject_DEL(obj);
>     }
>
> [snip]
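The failure mode MRAB identified is easy to model in pure Python: a pymalloc-style allocator recycles a freed block only if the deallocator actually hands it back to the free list, so a `tp_dealloc` that does nothing starves the allocator no matter how many objects die. A toy sketch of that idea (an illustrative model, not pymalloc itself):

```python
# Toy model (not pymalloc): a fixed-size block pool with a free list.
class Pool:
    def __init__(self, nblocks):
        self._blocks = [bytearray(32) for _ in range(nblocks)]
        self._free = list(self._blocks)

    def alloc(self):
        if not self._free:
            raise MemoryError("pool exhausted: nothing was ever freed")
        return self._free.pop()

    def release(self, block):
        # The analogue of PyObject_DEL in a correct tp_dealloc:
        # the block goes back on the free list for reuse.
        self._free.append(block)

# A dealloc that releases blocks keeps memory use flat: the same few
# blocks get reused no matter how many allocations we perform.
pool = Pool(4)
for _ in range(100_000):
    block = pool.alloc()
    pool.release(block)

# A no-op dealloc (like the original my_dealloc) exhausts the pool,
# even though every "object" is logically dead.
pool = Pool(4)
exhausted = False
for _ in range(10):
    try:
        pool.alloc()  # never released
    except MemoryError:
        exhausted = True
        break
print("exhausted:", exhausted)
```

In the real code, `PyObject_DEL(obj)` plays the role of `release()`: without it, each dead object's block is lost to pymalloc forever, which is exactly the linear memory growth observed in the first message.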