On 25/02/2025 18.44, Thomas Huth wrote:
On 25/02/2025 11.12, Kevin Wolf wrote:
On 25.02.2025 at 08:20, Thomas Huth wrote:
Hi!
I'm facing a weird hang in iotest 233 on my Fedora 41 laptop. When running
./check -raw 233
the test simply hangs. Looking at the log, the last message is "== check
plain client to TLS server fails ==". I added some debug messages, and it
seems like the previous NBD server is not correctly terminated here.
The test works fine again if I apply this patch:
diff --git a/tests/qemu-iotests/common.nbd b/tests/qemu-iotests/common.nbd
--- a/tests/qemu-iotests/common.nbd
+++ b/tests/qemu-iotests/common.nbd
@@ -35,7 +35,7 @@ nbd_server_stop()
read NBD_PID < "$nbd_pid_file"
rm -f "$nbd_pid_file"
if [ -n "$NBD_PID" ]; then
- kill "$NBD_PID"
+ kill -9 "$NBD_PID"
fi
fi
rm -f "$nbd_unix_socket" "$nbd_stderr_fifo"
... but that does not look like the right solution to me. What could prevent
qemu-nbd from shutting down correctly when it receives a normal SIGTERM
signal?
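As a debugging aid, and not as a replacement for the plain kill in nbd_server_stop(), something like the following could show whether the server exits at all after SIGTERM. This is only a minimal sketch: stop_with_timeout is a hypothetical helper that does not exist in common.nbd, and the default timeout of 5 seconds is an arbitrary assumption.

```shell
# Hypothetical helper, not part of tests/qemu-iotests/common.nbd.
# Usage: stop_with_timeout PID [TIMEOUT_SECONDS]
# Sends SIGTERM, then polls once per second until the process is gone.
# Returns 0 if the process exited, 1 if it was still there at the deadline.
stop_with_timeout()
{
    pid=$1
    tries=${2:-5}    # assumed default: 5 seconds

    # If the signal cannot be delivered, the process is already gone.
    kill "$pid" 2>/dev/null || return 0

    while [ "$tries" -gt 0 ]; do
        # kill -0 only checks whether the PID can still be signalled.
        if ! kill -0 "$pid" 2>/dev/null; then
            return 0
        fi
        sleep 1
        tries=$((tries - 1))
    done

    echo "process $pid did not exit after SIGTERM" >&2
    return 1
}
```

A return value of 1 would mean the process received SIGTERM but did not finish exiting within the timeout, which is the point at which attaching a debugger is likely to be more useful than kill -9.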
Not sure. In theory, qemu_system_killed() should set state = TERMINATE
and make main_loop_wait() return through the notification, which should
then make it shut down. Maybe you can attach gdb and check what 'state'
is when it hangs and if it's still in the main loop?
I attached gdb and ran "bt", and it looks like it is hanging in an exit()
handler:
(gdb) bt
#0 0x00007f127f8fff1d in syscall () from /lib64/libc.so.6
#1 0x00007f127fd32e1d in g_cond_wait () from /lib64/libglib-2.0.so.0
#2 0x00005583df3048b2 in flush_trace_file (wait=true) at ../../devel/qemu/trace/simple.c:140
#3 st_flush_trace_buffer () at ../../devel/qemu/trace/simple.c:383
#4 0x00007f127f8296c1 in __run_exit_handlers () from /lib64/libc.so.6
#5 0x00007f127f82978e in exit () from /lib64/libc.so.6
#6 0x00005583df1ae9e1 in main (argc=<optimized out>, argv=<optimized out>) at ../../devel/qemu/qemu-nbd.c:1242
Ah, now that I wrote that: I recently ran "configure" with
--enable-trace-backends=simple ... when I remove that from "config.status"
again, then the test works fine again 8-)
Still, I think it should not hang with the simple trace backend here, should it?
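For anyone who wants to check whether their own build tree is affected, a small check along these lines might help. It is a sketch under assumptions: uses_simple_trace is a hypothetical helper, and it assumes the configure arguments are recorded as plain text in the file passed to it (for example "config.status" in the build directory).

```shell
# Hypothetical helper: succeeds if FILE records an
# --enable-trace-backends=... option whose value includes "simple".
# Usage: uses_simple_trace FILE
uses_simple_trace()
{
    # The value ends at whitespace or a quote, so "simple" in some other
    # option on the same line does not match.
    grep -Eq -e "--enable-trace-backends=[^[:space:]'\"]*simple" "$1"
}
```

It could be used as: uses_simple_trace config.status && echo "simple trace backend enabled"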
Thomas