On Fri, 6 Jan 2023 at 15:16, Stefan Berger <stef...@linux.ibm.com> wrote:
>
> On 1/6/23 07:10, Peter Maydell wrote:
> > I'm seeing an intermittent hang on the s390 CI runner in the
> > bios-tables-test test. It looks like we've deadlocked because:
> >
> >  * the TPM device is waiting for data on its socket that never arrives,
> >    and it's holding the iothread lock
> >  * QEMU is therefore not making forward progress;
> >    in particular it is unable to handle qtest queries/responses
> >  * the test binary thread 1 is waiting to get a response to its
> >    qtest command, which is not going to arrive
> >  * test binary thread 3 (tpm_emu_ctrl_thread) has hit an
> >    assertion and is trying to kill QEMU via qtest_kill_qemu()
> >  * qtest_kill_qemu() is only a "SIGTERM and wait", so will wait
> >    forever, because QEMU won't respond to the SIGTERM while it's
> >    blocked waiting for the TPM device to release the iothread lock
> >  * because the ctrl-thread is waiting for QEMU to exit, it's never
> >    going to send the data that would unblock the TPM device emulation
> >
> [...]
> >
> > Thread 3 (Thread 0x3ff8dafe900 (LWP 2661316)):
> > #0  0x000003ff8e9c6002 in __GI___wait4 (pid=<optimized out>,
> >     stat_loc=stat_loc@entry=0x2aa0b42c9bc, options=<optimized out>,
> >     usage=usage@entry=0x0) at ../sysdeps/unix/sysv/linux/wait4.c:27
> > #1  0x000003ff8e9c5f72 in __GI___waitpid (pid=<optimized out>,
> >     stat_loc=stat_loc@entry=0x2aa0b42c9bc, options=options@entry=0) at
> >     waitpid.c:38
> > #2  0x000002aa0952a516 in qtest_wait_qemu (s=0x2aa0b42c9b0) at
> >     ../tests/qtest/libqtest.c:206
> > #3  0x000002aa0952a58a in qtest_kill_qemu (s=0x2aa0b42c9b0) at
> >     ../tests/qtest/libqtest.c:229
> > #4  0x000003ff8f0c288e in g_hook_list_invoke () from
> >     /lib/s390x-linux-gnu/libglib-2.0.so.0
> > #5  <signal handler called>
> > #6  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
> > #7  0x000003ff8e9240a2 in __GI_abort () at abort.c:79
> > #8  0x000003ff8f0feda8 in g_assertion_message () from
> >     /lib/s390x-linux-gnu/libglib-2.0.so.0
> > #9  0x000003ff8f0fedfe in g_assertion_message_expr () from
> >     /lib/s390x-linux-gnu/libglib-2.0.so.0
> > #10 0x000002aa09522904 in tpm_emu_ctrl_thread (data=0x3fff5ffa160) at
> >     ../tests/qtest/tpm-emu.c:189
>
> This here seems to be the root cause. An unknown control channel command
> was received from the TPM emulator backend by the control channel thread
> and we end up in g_assert_not_reached().

Yeah. It would be good if we didn't deadlock without printing
the assertion, though...

I guess we could improve qtest_kill_qemu() so it doesn't wait
indefinitely for QEMU to exit but instead sends a SIGKILL 20
seconds after the SIGTERM. (Annoyingly, there is no convenient
"waitpid but with a timeout" function...)

thanks
-- PMM
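
For illustration, here is a minimal sketch of the "SIGTERM, then SIGKILL
after a grace period" idea, using a polling loop around waitpid(WNOHANG)
to stand in for the missing "waitpid with a timeout". This is not the
libqtest code and not a proposed patch: wait_child_timeout(),
kill_child_with_escalation() and spawn_stubborn_child() are hypothetical
names, and the 10ms poll interval is an arbitrary assumption.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stdbool.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical helper: poll waitpid(WNOHANG) until the child has been
 * reaped or timeout_secs have elapsed. Returns true only if reaped.
 */
static bool wait_child_timeout(pid_t pid, int *wstatus, int timeout_secs)
{
    /* Assumed poll interval of 10ms; purely illustrative. */
    const struct timespec poll_interval = { 0, 10 * 1000 * 1000 };
    struct timespec now, deadline;

    clock_gettime(CLOCK_MONOTONIC, &deadline);
    deadline.tv_sec += timeout_secs;

    for (;;) {
        pid_t ret = waitpid(pid, wstatus, WNOHANG);

        if (ret == pid) {
            return true;            /* child exited and was reaped */
        }
        if (ret < 0 && errno != EINTR) {
            return false;           /* e.g. ECHILD: nothing to wait for */
        }
        clock_gettime(CLOCK_MONOTONIC, &now);
        if (now.tv_sec > deadline.tv_sec ||
            (now.tv_sec == deadline.tv_sec &&
             now.tv_nsec >= deadline.tv_nsec)) {
            return false;           /* timed out, child still running */
        }
        nanosleep(&poll_interval, NULL);
    }
}

/*
 * Hypothetical replacement for a "SIGTERM and wait" kill: escalate to
 * SIGKILL if the child has not exited within grace_secs. Returns the
 * wait status, or -1 if the child could not be reaped.
 */
static int kill_child_with_escalation(pid_t pid, int grace_secs)
{
    int wstatus = 0;
    pid_t ret;

    assert(pid > 0);
    kill(pid, SIGTERM);
    if (wait_child_timeout(pid, &wstatus, grace_secs)) {
        return wstatus;
    }

    /* SIGKILL cannot be blocked or ignored, so this wait terminates. */
    kill(pid, SIGKILL);
    do {
        ret = waitpid(pid, &wstatus, 0);
    } while (ret < 0 && errno == EINTR);
    return ret == pid ? wstatus : -1;
}

/*
 * Demo only: fork a child that never reacts to SIGTERM, standing in for
 * a QEMU that is stuck and cannot process the signal.
 */
static pid_t spawn_stubborn_child(void)
{
    /* Ignore SIGTERM before fork() so the child inherits it race-free. */
    void (*old_handler)(int) = signal(SIGTERM, SIG_IGN);
    pid_t pid = fork();

    if (pid == 0) {
        for (;;) {
            pause();
        }
    }
    signal(SIGTERM, old_handler);   /* parent: restore old disposition */
    return pid;
}
```

With something along these lines, kill_child_with_escalation(pid, 20)
would correspond to the 20 second grace period mentioned above. Polling
is only one option; blocking SIGCHLD and using sigtimedwait() would be
another.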