On 2021-11-24 07:59, Arnaud Loonstra wrote:

On 24-11-2021 01:46, MRAB wrote:
On 2021-11-23 20:25, Arnaud Loonstra wrote:
On 23-11-2021 18:31, MRAB wrote:
On 2021-11-23 16:04, Arnaud Loonstra wrote:
On 23-11-2021 16:37, MRAB wrote:
On 2021-11-23 15:17, MRAB wrote:
On 2021-11-23 14:44, Arnaud Loonstra wrote:
On 23-11-2021 15:34, MRAB wrote:
On 2021-11-23 12:07, Arnaud Loonstra wrote:
Hi,

I've had Python embedded successfully in a program up until now, but I'm
now running into weird GC-related segfaults. I'm currently trying to
debug this, but my limited understanding of CPython is holding me back.

I'm creating a tuple in C, but after a while it crashes on creation.
That doesn't make sense to me, which makes me wonder whether something
else is going on. Could it be that it only crashes here because the GC
is cleaning up objects completely unrelated to the allocation of the new
tuple? How can I troubleshoot this?

I've got CPython compiled with  --with-valgrind --without-pymalloc
--with-pydebug

In C I'm creating a tuple with the following method:

static PyObject *
s_py_zosc_tuple(pythonactor_t *self, zosc_t *oscmsg)
{
      assert(self);
      assert(oscmsg);
      char *format = zosc_format(oscmsg);

      PyObject *rettuple = PyTuple_New((Py_ssize_t) strlen(format) );

It consistently crashes here (frame 16) after 320 calls:


1   __GI_raise             raise.c          49   0x7ffff72c4e71
2   __GI_abort             abort.c          79   0x7ffff72ae536
3   fatal_error            pylifecycle.c    2183 0x7ffff7d84b4f
4   Py_FatalError          pylifecycle.c    2193 0x7ffff7d878b2
5   _PyObject_AssertFailed object.c         2200 0x7ffff7c93cf2
6   visit_decref           gcmodule.c       378  0x7ffff7dadfd5
7   tupletraverse          tupleobject.c    623  0x7ffff7ca3e81
8   subtract_refs          gcmodule.c       406  0x7ffff7dad340
9   collect                gcmodule.c       1054 0x7ffff7dae838
10  collect_with_callback  gcmodule.c       1240 0x7ffff7daf17b
11  collect_generations    gcmodule.c       1262 0x7ffff7daf3f6
12  _PyObject_GC_Alloc     gcmodule.c       1977 0x7ffff7daf4f2
13  _PyObject_GC_Malloc    gcmodule.c       1987 0x7ffff7dafebc
14  _PyObject_GC_NewVar    gcmodule.c       2016 0x7ffff7daffa5
15  PyTuple_New            tupleobject.c    118  0x7ffff7ca4da7
16  s_py_zosc_tuple        pythonactor.c    366  0x55555568cc82
17  pythonactor_socket     pythonactor.c    664  0x55555568dac7
18  pythonactor_handle_msg pythonactor.c    862  0x55555568e472
19  pythonactor_handler    pythonactor.c    828  0x55555568e2e2
20  sphactor_actor_run     sphactor_actor.c 855  0x5555558cb268
... <More>

Any pointer really appreciated.

[snip]


Basically, yes, but I wouldn't be surprised if it was due to too few INCREFs or too many DECREFs somewhere.

https://github.com/hku-ect/gazebosc/blob/505b30c46bf3f78d188c3f575c80e294d3db7e5d/Actors/pythonactor.c#L286

Incidentally, in s_py_zosc_tuple, you're not doing "assert(rc == 0);" after "zosc_pop_float" or "zosc_pop_double".

Thanks for those pointers! I think your intuition is right. I might have
found the bugger. In s_py_zosc I call Py_DECREF on pAddress and pData.
However, they are acquired with PyTuple_GetItem, which returns a borrowed
reference. I think pAddress and pData are then DECREFed a second time
when the pReturn tuple that contains them is DECREFed?

Yes, members of a container are DECREFed when the container is destroyed.

It's bad practice for a function to DECREF its arguments unless the function's sole purpose is cleanup because the function won't know where the arguments came from.
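
To make the borrowed-reference problem concrete, here is a minimal sketch in plain C. It is a toy model, not the CPython API: toy_obj, toy_pair and the toy_* functions are made-up stand-ins for PyObject, a two-item tuple, Py_INCREF/Py_DECREF and PyTuple_GetItem, and nothing is ever actually freed, so the counts can be inspected safely after they reach zero.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for PyObject: just a reference count. */
typedef struct { int refcnt; } toy_obj;

/* Hypothetical stand-in for a 2-tuple: owns one reference per item. */
typedef struct { toy_obj *items[2]; } toy_pair;

static void toy_incref(toy_obj *o) { o->refcnt++; }
static void toy_decref(toy_obj *o) { o->refcnt--; }

/* Like PyTuple_GetItem: hands out a BORROWED reference (no incref). */
static toy_obj *toy_pair_get(const toy_pair *p, int i) { return p->items[i]; }

/* Destroying the container releases the one reference it owns per item. */
static void toy_pair_destroy(toy_pair *p)
{
    for (int i = 0; i < 2; i++) {
        toy_decref(p->items[i]);
        p->items[i] = NULL;
    }
}

/* The pattern described in the thread: DECREF on a borrowed reference. */
static int buggy_final_refcnt(void)
{
    toy_obj address = { 1 }, data = { 1 };   /* each owned once, by the pair */
    toy_pair ret = { { &address, &data } };

    toy_obj *p_address = toy_pair_get(&ret, 0);  /* borrowed */
    toy_decref(p_address);    /* BUG: releases a reference we never owned */
    toy_pair_destroy(&ret);   /* the pair releases its own reference too */
    return address.refcnt;    /* one release too many */
}

/* One fix: take ownership before releasing (or simply never DECREF). */
static int fixed_final_refcnt(void)
{
    toy_obj address = { 1 }, data = { 1 };
    toy_pair ret = { { &address, &data } };

    toy_obj *p_address = toy_pair_get(&ret, 0);  /* borrowed */
    toy_incref(p_address);    /* now we own a reference of our own */
    toy_decref(p_address);    /* and release exactly that one */
    toy_pair_destroy(&ret);
    return address.refcnt;    /* balanced */
}
```

In the buggy variant the count ends up at -1, in the fixed one at 0. In real CPython the object would be deallocated as soon as its count reached zero, leaving the tuple with a dangling pointer. A later GC pass that traverses the tuple could then trip over it, which would be consistent with the tupletraverse and visit_decref frames in the trace.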


I'm finding that out now. What strikes me is how hard this was to debug.
I think that's because I had INCREFed the return object. I guess I did
that to work around the wrong DECREFs on the data in the return object.
However, that made it hell to debug. I'm really curious what the best
practices are for debugging embedded CPython.

Thanks big time for your feedback!

What I do when writing the code is add comments showing what variables refer to an object at that point in the code, each suffixed with "+" if it owns a reference and/or "?" if it could be NULL.

Example 1:

    //>
    PyObject *my_tuple = PyTuple_New(count);
    //> my_tuple+?
    if (!my_tuple)
         goto error;
    //> my_tuple+

"//>" means that there are no variables that point to an object.

"//> my_tuple+?" means that "my_tuple" points to an object and it owns a reference, but it might be NULL.

"//> my_tuple+" means that "my_tuple" points to an object and it owns a reference.

Example 2:

    //>
    PyObject *my_item = PyList_GetItem(my_list, index);
    //> my_item?
    if (!my_item)
         goto error;
    //> my_item

"//>" means that there are no variables that point to an object.

"//> my_item?" means that "my_item" points to an object, but it might be NULL. There is no "+" because PyList_GetItem returns a borrowed reference, so "my_item" does not own it.

"//> my_item" means that "my_item" points to an object.
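
As a rough sketch of how the two cases combine in one function with a shared error path, the annotations might look like the following. This again uses a toy model in plain C rather than the CPython API; toy_obj, toy_box, toy_new, toy_box_get and sum_with_new are all hypothetical names invented for the illustration.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical toy object: a reference count and a payload. */
typedef struct { int refcnt; int value; } toy_obj;

/* Returns a NEW (owned) reference, or NULL if allocation fails. */
static toy_obj *toy_new(int value)
{
    toy_obj *o = malloc(sizeof *o);
    if (o) {
        o->refcnt = 1;
        o->value = value;
    }
    return o;
}

/* Releases one reference; frees the object when none remain. */
static void toy_decref(toy_obj *o)
{
    if (--o->refcnt == 0)
        free(o);
}

/* A one-slot container that owns a reference to its item. */
typedef struct { toy_obj *item; } toy_box;

/* Returns a BORROWED reference, or NULL if the box is empty. */
static toy_obj *toy_box_get(const toy_box *box) { return box->item; }

/* Adds a freshly created value to the boxed one; returns -1 on error. */
static int sum_with_new(const toy_box *box, int extra)
{
    int result = -1;
    toy_obj *fresh = NULL;
    toy_obj *boxed = NULL;

    //>
    fresh = toy_new(extra);
    //> fresh+?
    if (!fresh)
        goto error;
    //> fresh+
    boxed = toy_box_get(box);
    //> fresh+ boxed?
    if (!boxed)
        goto error;
    //> fresh+ boxed
    result = fresh->value + boxed->value;

error:
    //> fresh+? boxed?
    if (fresh)
        toy_decref(fresh);   /* release only what is marked "+" */
    //>
    return result;
}
```

At the cleanup label, only the variables carrying a "+" need releasing; "boxed" has none, so it is left alone and the box keeps its single reference.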

--
https://mail.python.org/mailman/listinfo/python-list
