Control: retitle -1 libportal: FTBFS with segfault during TestInputCapture, reliably on AWS but rarely elsewhere
On Sat, 08 Feb 2025 at 18:13:59 +0100, Santiago Vila wrote:
> On 8/2/25 at 16:10, Simon McVittie wrote:
> > > Now this is failing 100% of the time at least
> > > on AWS instances of type m7a.large and r7a.large
> > > (having 2 CPUs)
> >
> > Please share a log for the particular test failure you are experiencing
> > here? It might be a recurrence of #1082570 (a segmentation fault in the
> > Python process while running pytest) or it might be something different.
>
> What I experience is still a python segfault so I believe it belongs here.
> The difference is that now it happens always instead of randomly.

Looking at the logs, that does indeed seem to be the case. It seems to
be consistently happening in your build environment, and consistently
*not* happening on the official buildds, reproducible-builds.org, or my
usual local builds.

It also seems to be consistently happening during the same test class,
TestInputCapture, and usually (12 out of 13 times) during
test_session_create_no_zones_on_getzones. On one occasion that particular
test passed and the segfault happened later, in test_zones_changed.

I tried building libportal in a single-vCPU qemu VM and running the tests
repeatedly in a loop (meson test --repeat=20). The first 36 repetitions
all succeeded, but I did get a segfault eventually, which seems similar to
<https://github.com/flatpak/libportal/issues/169> (backtrace below),
which as far as I can tell is a rare use-after-free. Unfortunately I was
unable to reproduce that one after adding more debug logging (the extra
logging apparently perturbs the timing enough to avoid it). Like most of
your test failures, the Python-level traceback says this was during
TestInputCapture::test_session_create_no_zones_on_getzones.

However, I'm surprised that this happens so often for you when it took
me so many repetitions to see one crash - the fact that it's so reliable
for you, and so rare for me, makes me wonder whether the crash I saw is
even the same thing.
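
For anyone who wants to repeat the loop experiment described above, it can
be sketched roughly as follows. This is only a minimal sketch, not
necessarily the exact commands that were used: the helper name
run_until_failure and the build directory name _build are assumptions made
for illustration.

```python
import subprocess


def run_until_failure(cmd, max_rounds):
    """Run cmd up to max_rounds times.

    Returns the 1-based number of the first round that exited non-zero,
    or None if every round succeeded.
    """
    for round_number in range(1, max_rounds + 1):
        result = subprocess.run(cmd)
        if result.returncode != 0:
            return round_number
    return None


# Hypothetical usage, assuming a configured build directory named "_build":
#
#   failed = run_until_failure(
#       ["meson", "test", "-C", "_build", "--repeat=20"], max_rounds=10)
#   print("no failure seen" if failed is None else f"failed in round {failed}")
```

Stopping at the first failing round keeps the test log for the crashing
run intact, so it is not overwritten by a later successful run.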

Is there anything else unusual about these VMs, other than the relatively
low CPU count, that might be resulting in failure modes that aren't
usually seen?

Did you mention that it was possible to set up remote access to one
of these machines, and would I be able to install systemd-coredump
there? That would at least help to clarify whether my rare crash is the
same thing as your more frequent crash.

For the moment I'll try to disable the relevant test without completely
losing test coverage, so that we still have at least some evidence that
the package is working as intended before shipping it.

    smcv

#0  __pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=11, no_tid=no_tid@entry=0) at ./nptl/pthread_kill.c:44
#1  0x00007efdba665e2f in __pthread_kill_internal (threadid=<optimized out>, signo=11) at ./nptl/pthread_kill.c:78
#2  0x00007efdba611d02 in __GI_raise (sig=11) at ../sysdeps/posix/raise.c:26
#3  0x00007efdba611da0 in <signal handler called> () at /lib/x86_64-linux-gnu/libc.so.6
#4  0x00007efdb83720a3 in call_free (call=call@entry=0x1d2b3780) at ../libportal/inputcapture.c:254
#5  0x00007efdb8372123 in call_returned (object=<optimized out>, result=<optimized out>, data=0x1d2b3780) at ../libportal/inputcapture.c:288
#6  0x00007efdb8551603 in g_task_return_now (task=task@entry=0x1d28ea60 [GTask]) at ../../../gio/gtask.c:1363
#7  0x00007efdb85522a3 in g_task_return (type=<optimized out>, task=0x1d28ea60 [GTask]) at ../../../gio/gtask.c:1432
#8  g_task_return (task=0x1d28ea60 [GTask], type=<optimized out>) at ../../../gio/gtask.c:1389
#9  0x00007efdb85b01f0 in g_dbus_connection_call_done (source=0x1d376b10 [GDBusConnection], result=<optimized out>, user_data=0x1d28ea60) at ../../../gio/gdbusconnection.c:6355
#10 0x00007efdb8551603 in g_task_return_now (task=task@entry=0x1d302b20 [GTask]) at ../../../gio/gtask.c:1363
#11 0x00007efdb855163d in complete_in_idle_cb (task=0x1d302b20) at ../../../gio/gtask.c:1377
#12 0x00007efdb8b88d5f in g_main_dispatch (context=context@entry=0x1cef60d0) at ../../../glib/gmain.c:3361
#13 0x00007efdb8b8afd7 in g_main_context_dispatch_unlocked (context=0x1cef60d0) at ../../../glib/gmain.c:4212
#14 g_main_context_iterate_unlocked (context=0x1cef60d0, block=block@entry=1, dispatch=dispatch@entry=1, self=<optimized out>) at ../../../glib/gmain.c:4277
#15 0x00007efdb8b8ba3f in g_main_loop_run (loop=0x1d2f29c0) at ../../../glib/gmain.c:4479
#16 0x00007efdb8a8b3fe in ??? () at /lib/x86_64-linux-gnu/libffi.so.8
#17 0x00007efdb8a8a70d in ??? () at /lib/x86_64-linux-gnu/libffi.so.8
#18 0x00007efdb8a8aee3 in ffi_call () at /lib/x86_64-linux-gnu/libffi.so.8
#19 0x00007efdb8cb74dc in ??? () at /usr/lib/python3/dist-packages/gi/_gi.cpython-313-x86_64-linux-gnu.so
#20 0x00007efdb8cb9989 in ??? () at /usr/lib/python3/dist-packages/gi/_gi.cpython-313-x86_64-linux-gnu.so
#21 0x00007efdb8ca5769 in ??? () at /usr/lib/python3/dist-packages/gi/_gi.cpython-313-x86_64-linux-gnu.so
#22 0x0000000000540a43 in _PyObject_MakeTpCall ()
... Python below this point ...