Debugging a memory leak

2020-10-22 Thread Pasha Stetsenko
Dear Python gurus,

I'm a maintainer of a Python library "datatable" (installable from
PyPI), and I've recently been trying to debug a memory leak that occurs in
my library.
The program that exposes the leak is quite simple:
```
import datatable as dt
import gc  # just in case

def leak(n=10**7):
    for i in range(n):
        z = dt.update()

leak()
gc.collect()
input("Press enter")
```
Note that despite the name, `dt.update` is actually a class, though it
is defined via the Python C API. Thus, this script is expected to create and
then immediately destroy 10 million simple Python objects.
The observed behavior, however, is that the script consumes more and more
memory, eventually ending up at about 500 MB. The amount of memory the
program ends up consuming is directly proportional to the parameter `n`.

However, `gc.get_objects()` does not show any extra objects. The
`tracemalloc` module shows that there are indeed `n` objects leaked at the
`z = dt.update()` line, but doesn't give any further details.
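For reference, the tracemalloc check amounted to something like the following sketch (here a list of bytearrays stands in for the `dt.update()` loop, since the idea is the same for any allocation site):

```python
import tracemalloc

tracemalloc.start(10)  # record up to 10 stack frames per allocation

snap1 = tracemalloc.take_snapshot()
data = [bytearray(100) for _ in range(1000)]  # stand-in for the dt.update() loop
snap2 = tracemalloc.take_snapshot()

# Show which source lines account for the memory that is still alive
for stat in snap2.compare_to(snap1, "lineno")[:3]:
    print(stat)
```

In the real script this points at the allocating line, but says nothing about where the leaked bytes actually sit in memory.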

In order to dig deeper, I let the process wait on the `input()` line and
wrote a script to dump the process's memory into a file. Then I scanned
through the file looking for repeated patterns of 64-bit words. Inside
the memory dump, the following values were the most common:
```
  0x0000000000000000 - 28660404
  0x00000001024be6e8 - 4999762
  0x0000000101cbdea0 - 119049
  0x0000000000000054 - 59537
  0x000000000fd00ff0 - 59526
  0x0000000000000001 - 16895
  0x - 12378
  ...
```
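The scan itself was essentially a word-frequency count over the raw dump. A minimal sketch (the dump file name is hypothetical):

```python
import struct
from collections import Counter

def count_words(path):
    """Count occurrences of each aligned little-endian 64-bit word in a raw dump."""
    counts = Counter()
    with open(path, "rb") as f:
        while True:
            chunk = f.read(1 << 20)  # read the dump in 1 MiB chunks
            if not chunk:
                break
            n = len(chunk) // 8
            counts.update(struct.unpack(f"<{n}Q", chunk[:n * 8]))
    return counts

# Example usage:
# for word, cnt in count_words("memdump.bin").most_common(10):
#     print(f"0x{word:016x} - {cnt}")
```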
The most suspicious value here is 0x0001024be6e8: inspecting that address
with lldb shows that it is the address of the PyTypeObject for `dt.update`,
which looks approximately like this:
```
(lldb) p *(PyTypeObject*)(0x00010f4206e8)
(PyTypeObject) $0 = {
  ob_base = {
    ob_base = {
      ob_refcnt = 8
      ob_type = 0x00010ec216b0
    }
    ob_size = 0
  }
  tp_name = 0x00010f3a442c "datatable.update"
  tp_basicsize = 48
  tp_itemsize = 0
  tp_dealloc = 0x00010f0a8040 (_datatable.cpython-36m-darwin.so`void py::_safe_dealloc(_object*) at xobject.h:270)
  tp_print = 0x0000000000000000
  tp_getattr = 0x0000000000000000
  tp_setattr = 0x0000000000000000
  tp_as_async = 0x0000000000000000
  tp_repr = 0x00010eab3fa0 (Python`object_repr)
  tp_as_number = 0x0000000000000000
  tp_as_sequence = 0x0000000000000000
  tp_as_mapping = 0x0000000000000000
  tp_hash = 0x00010eb48640 (Python`_Py_HashPointer)
  tp_call = 0x0000000000000000
  tp_str = 0x00010eab40d0 (Python`object_str)
  tp_getattro = 0x00010eaa1ae0 (Python`PyObject_GenericGetAttr)
  tp_setattro = 0x00010eaa1ce0 (Python`PyObject_GenericSetAttr)
  tp_as_buffer = 0x0000000000000000
  tp_flags = 266240
...
```
Thus, I can be quite certain that 0x1024be6e8 is the address of the
`dt.update` type structure.

The way this address appears in the memory dump looks like this:
```
0x7f97875cbb10: 0x00000000 0x00000000 0x024be6e8 0x00000001
0x7f97875cbb20: 0x00000000 0x00000000 0x00000000 0x00000000
0x7f97875cbb30: 0x00000000 0x00000000 0x00000000 0x00000000
0x7f97875cbb40: 0x00000000 0x00000000 0x024be6e8 0x00000001
0x7f97875cbb50: 0x00000000 0x00000000 0x00000000 0x00000000
0x7f97875cbb60: 0x00000000 0x00000000 0x00000000 0x00000000
0x7f97875cbb70: 0x00000000 0x00000000 0x024be6e8 0x00000001
0x7f97875cbb80: 0x00000000 0x00000000 0x00000000 0x00000000
0x7f97875cbb90: 0x00000000 0x00000000 0x00000000 0x00000000
0x7f97875cbba0: 0x00000000 0x00000000 0x024be6e8 0x00000001
0x7f97875cbbb0: 0x00000000 0x00000000 0x00000000 0x00000000
0x7f97875cbbc0: 0x00000000 0x00000000 0x00000000 0x00000000
```
If I assume that each of these represents a leaked object, then inspecting any
of them shows the following:
```
(lldb) p *(PyObject*)(0x7f97875cbb10)
(PyObject) $2 = {
  ob_refcnt = 0
  ob_type = 0x00010f4206e8
}
```
Thus, the object didn't actually "leak" in the normal sense: its refcount
is 0 and it was reclaimed by the Python runtime (when I print a debug
message in tp_dealloc, I see that the destructor gets called every time).
Still, Python keeps requesting more and more memory from the system instead
of reusing the memory that was supposed to be freed.
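One more stdlib-level probe worth adding to such a script (a sketch; a plain Python class stands in for `dt.update` here): `sys.getallocatedblocks()` reports how many blocks CPython's object allocator currently considers live, so a net increase after the loop would mean the freed objects never made it back to the allocator.

```python
import gc
import sys

class Probe:
    """Stand-in for dt.update: a trivial object created and dropped in a loop."""

def blocks_leaked(n=100_000):
    """Net change in live allocator blocks after creating and destroying n objects."""
    gc.collect()
    before = sys.getallocatedblocks()
    for _ in range(n):
        z = Probe()  # refcount drops to 0 each iteration, object reclaimed
    gc.collect()
    return sys.getallocatedblocks() - before

print(blocks_leaked())
```

For a well-behaved type the delta stays near zero; presumably a type whose dealloc never returns the memory would show it growing in proportion to `n`.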

I'm currently not sure where to go from here. Is there something wrong with
my Python object that prevents it from being correctly processed by the
Python runtime? This doesn't seem to be the usual case of incrementing the
refcount too many times.

This behavior was observed in Python 3.6.6 and also Python 3.8.0b2.
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Debugging a memory leak

2020-10-23 Thread Pasha Stetsenko
Thanks for all the replies!
Following Chris's advice, I tried to reduce the code to the smallest
reproducible example (I guess I should have done it sooner),
but here's what I came up with:
```
  #include <Python.h>
  #include <cstring>

  static int my_init(PyObject*, PyObject*, PyObject*) { return 0; }
  static void my_dealloc(PyObject*) {}

  static void init_mytype(PyObject* module) {
    PyTypeObject* type = new PyTypeObject();
    std::memset(type, 0, sizeof(PyTypeObject));
    Py_INCREF(type);

    type->tp_basicsize = static_cast<Py_ssize_t>(sizeof(PyObject));
    type->tp_itemsize = 0;
    type->tp_flags = Py_TPFLAGS_DEFAULT;
    type->tp_new   = &PyType_GenericNew;
    type->tp_name  = "mytype";
    type->tp_doc   = "[temporary]";
    type->tp_init  = my_init;
    type->tp_dealloc = my_dealloc;
    PyType_Ready(type);
    PyModule_AddObject(module, "mytype", reinterpret_cast<PyObject*>(type));
  }
```
(My original `update` object had some fields in it, but it turns out they
don't need to be present for the problem to manifest, so in this case I'm
creating a custom object which is the same as a basic PyObject.)
The `init_mytype()` function creates a custom type and attaches it to a
module. After this, creating 100M instances of the object will cause the
process memory to swell to about 1.5 GB:
```
for i in range(10**8):
    z = dt.mytype()
```
I know this is not normal, because if I instead use a builtin type such as
`list`, or a Python-defined class such as `class A: pass`, the process
remains at a steady RAM usage of about 6 MB.
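One stdlib-only way to watch this number from inside the script (a sketch, not necessarily how the figures above were measured; Unix-specific, and note that `ru_maxrss` is reported in bytes on macOS but kilobytes on Linux; `object()` stands in for `dt.mytype()`):

```python
import resource
import sys

def peak_rss_mb():
    """Peak resident set size of this process, in MiB (Unix only)."""
    ru_maxrss = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    divisor = 1 << 20 if sys.platform == "darwin" else 1 << 10
    return ru_maxrss / divisor

for i in range(10**6):
    z = object()  # swap in dt.mytype() to reproduce the growth

print(f"peak RSS: {peak_rss_mb():.1f} MiB")
```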

I've tested this on a Linux platform as well (using the docker image
quay.io/pypa/manylinux2010_x86_64), and the problem is present there
too.

---
PS: The library I'm working on is open source, available at
https://github.com/h2oai/datatable, but the code I posted above is
completely independent of my library.

On Fri, Oct 23, 2020 at 10:44 AM Dieter Maurer  wrote:

> Pasha Stetsenko wrote at 2020-10-22 17:51 -0700:
> > ...
> >I'm a maintainer of a python library "datatable" (can be installed from
> >PyPi), and i've been recently trying to debug a memory leak that occurs in
> >my library.
> >The program that exposes the leak is quite simple:
> >```
> >import datatable as dt
> >import gc  # just in case
> >
> >def leak(n=10**7):
> >for i in range(n):
> >z = dt.update()
> >
> >leak()
> >gc.collect()
> >input("Press enter")
> >```
> >Note that despite the name, the `dt.update` is actually a class, though it
> >is defined via Python C API. Thus, this script is expected to create and
> >then immediately destroy 10 million simple python objects.
> >The observed behavior, however,  is  that the script consumes more and
> more
> >memory, eventually ending up at about 500M. The amount of memory the
> >program ends up consuming is directly proportional to the parameter `n`.
> >
> >The `gc.get_objects()` does not show any extra objects however.
>
> For efficiency reasons, the garbage collector tracks only
> objects of types which are known to be potentially involved in cycles.
> A type implemented in "C" must define `tp_traverse` (in its type
> structure) to indicate this possibility.
> `tp_traverse` also tells the garbage collector how to find referenced
> objects.
> You will never find an object in the result of `get_objects` if its
> type does not define `tp_traverse`.
>
> > ...
> >Thus, the object didn't actually "leak" in the normal sense: its refcount
> >is 0 and it was reclaimed by the Python runtime (when i print a debug
> >message in tp_dealloc, i see that the destructor gets called every time).
> >Still, Python keeps requesting more and more memory from the system
> instead
> >of reusing the memory  that was supposed to be freed.
>
> I would try to debug what happens further in `tp_dealloc` and its callers.
> You should eventually see a `PyMem_free` which gives the memory back
> to the Python memory management (built on top of the C memory management).
>
> Note that your `tp_dealloc` should not call the "C" library's "free".
> Python builds its own memory management (--> "PyMem_*") on top
> of the "C" library. It handles all "small" memory requests
> and, if necessary, requests big data chunks via `malloc` to split
> them into the smaller sizes.
> Should you "free" small memory blocks directly via "free", that memory
> becomes effectively unusable by Python (unless you have a special
> allocation as well).
>


Re: Debugging a memory leak

2020-10-23 Thread Pasha Stetsenko
Thanks MRAB, this was it.
I guess I was thinking about tp_dealloc as a C++ destructor, where the base
class' destructor is called automatically.

On Fri, Oct 23, 2020 at 11:59 AM MRAB  wrote:

> On 2020-10-23 19:32, Pasha Stetsenko wrote:
> > Thanks for all the replies!
> > Following Chris's advice, I tried to reduce the code to the smallest
> > reproducible example (I guess I should have done it sooner),
> > but here's what I came up with:
> > ```
> >    #include <Python.h>
> >    #include <cstring>
> >
> >static int my_init(PyObject*, PyObject*, PyObject*) { return 0; }
> >static void my_dealloc(PyObject*) {}
> >
> >static void init_mytype(PyObject* module) {
> >  PyTypeObject* type = new PyTypeObject();
> >  std::memset(type, 0, sizeof(PyTypeObject));
> >  Py_INCREF(type);
> >
> >  type->tp_basicsize = static_cast<Py_ssize_t>(sizeof(PyObject));
> >  type->tp_itemsize = 0;
> >  type->tp_flags = Py_TPFLAGS_DEFAULT;
> >  type->tp_new   = &PyType_GenericNew;
> >  type->tp_name  = "mytype";
> >  type->tp_doc   = "[temporary]";
> >  type->tp_init  = my_init;
> >  type->tp_dealloc = my_dealloc;
> >  PyType_Ready(type);
> >  PyModule_AddObject(module, "mytype",
> > reinterpret_cast<PyObject*>(type));
> >}
> > ```
>
> You're setting the deallocation function to 'my_dealloc', but that
> function isn't deallocating the object.
>
> Try something like this:
>
> static void my_dealloc(PyObject* obj) {
>  PyObject_DEL(obj);
> }
>
> [snip]