Control: forwarded -1 https://github.com/flatpak/libportal/issues/169
On Sat, 08 Feb 2025 at 23:12:35 +0000, Simon McVittie wrote:
> Did you mention that it was possible to set up remote access to one of
> these machines, and would I be able to install systemd-coredump there?
> That would at least help to clarify whether my rare crash is the same
> thing as your more frequent crash.

On the VM that Santiago provided, I successfully built libportal from
source twice in a row (by entering the sid schroot and building with
dpkg-buildpackage), which seems inconsistent with Santiago getting a
less than 10% success rate.

Running the tests manually with
`meson test -C obj-x86_64-linux-gnu --repeat=20`, it failed on the first
or second loop iteration the first few times, but another attempt has
now passed its first 6 iterations and is still going. So there still
seems to be quite a significant difference between the reproducibility
of failure that Santiago saw, and what I'm now seeing. Perhaps the
difference of whether test output is being written to an interactive
terminal or to a log file perturbs the timing enough that a race
condition is consistently won or lost differently?

The backtrace for failures looks a lot like the use-after-free that I
previously reported to https://github.com/flatpak/libportal/issues/169,
which I was previously able to reproduce very rarely (by running the
tests in a loop) but unable to debug (adding debug logging changed the
timing enough that the bug stopped happening). Hopefully the timing of
this test on the AWS instance is sufficiently different that it might be
more feasible to debug it.

My analysis so far is that this is most likely to be a bug in the
libportal library successfully being detected by the regression tests,
rather than a flaw in the tests (it appears to be a use-after-free,
which is something that shouldn't be possible when using the library
correctly).

I don't know why this happens somewhat frequently on this specific type
of VM instance, but so rarely elsewhere (and in particular not on any of
the official buildds or when I try to debug it locally).

Unfortunately, the fact that this is randomly succeeding or failing is
going to make it difficult to validate attempts to fix it, because if it
succeeds after a code change, that won't tell us whether it is genuinely
fixed or whether we just got a different roll of the dice.

    smcv

#0  __pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=11, no_tid=no_tid@entry=0) at ./nptl/pthread_kill.c:44
#1  0x00007fc292949e2f in __pthread_kill_internal (threadid=<optimized out>, signo=11) at ./nptl/pthread_kill.c:78
#2  0x00007fc2928f5d02 in __GI_raise (sig=11) at ../sysdeps/posix/raise.c:26
#3  <signal handler called>
#4  0x00007fc2929583ea in __GI___libc_free (mem=0xffffffffffffffff) at ./malloc/malloc.c:3375
#5  0x00007fc290e5a289 in g_free (mem=<optimized out>) at ../../../glib/gmem.c:208
#6  0x00007fc290653041 in call_free (call=call@entry=0x2954c010) at ../libportal/inputcapture.c:259
#7  0x00007fc290653123 in call_returned (object=<optimized out>, result=<optimized out>, data=0x2954c010) at ../libportal/inputcapture.c:288
#8  0x00007fc290832603 in g_task_return_now (task=task@entry=0x294fc9b0) at ../../../gio/gtask.c:1363
#9  0x00007fc2908332a3 in g_task_return (type=<optimized out>, task=0x294fc9b0) at ../../../gio/gtask.c:1432
#10 g_task_return (task=0x294fc9b0, type=<optimized out>) at ../../../gio/gtask.c:1389
#11 0x00007fc2908911f0 in g_dbus_connection_call_done (source=0x29525f80, result=<optimized out>, user_data=0x294fc9b0) at ../../../gio/gdbusconnection.c:6355
#12 0x00007fc290832603 in g_task_return_now (task=task@entry=0x29516490) at ../../../gio/gtask.c:1363
#13 0x00007fc29083263d in complete_in_idle_cb (task=0x29516490) at ../../../gio/gtask.c:1377
#14 0x00007fc290e50d5f in g_main_dispatch (context=context@entry=0x290bc530) at ../../../glib/gmain.c:3361
#15 0x00007fc290e52fd7 in g_main_context_dispatch_unlocked (context=0x290bc530) at ../../../glib/gmain.c:4212
#16 g_main_context_iterate_unlocked (context=0x290bc530, block=block@entry=1, dispatch=dispatch@entry=1, self=<optimized out>) at ../../../glib/gmain.c:4277
#17 0x00007fc290e53a3f in g_main_loop_run (loop=0x294a7050) at ../../../glib/gmain.c:4479
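
Because a single pass or fail says little about an intermittent crash like this one, a candidate fix could be judged by counting failures over a fixed number of runs before and after the change. The following is only a rough sketch: `count_failures` is a hypothetical helper, not part of libportal or Meson, and the run count and log path are arbitrary choices.

```shell
# Hypothetical helper: run a command RUNS times, append its output to LOG,
# and print how many of those runs exited with a non-zero status.
# Usage: count_failures RUNS LOG COMMAND [ARGS...]
count_failures() {
    runs=$1
    log=$2
    shift 2
    failures=0
    i=0
    while [ "$i" -lt "$runs" ]; do
        # Redirect stdout and stderr to the log; count any non-zero exit.
        "$@" >>"$log" 2>&1 || failures=$((failures + 1))
        i=$((i + 1))
    done
    echo "$failures"
}
```

For example, `count_failures 20 test.log meson test -C obj-x86_64-linux-gnu` would report the number of failing runs out of 20 with output going to a log file. Even a drop to zero failures is only statistical evidence, so the sample would need to be large relative to the failure rate seen before the change.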