From: Andrey Drobyshev <andrey.drobys...@virtuozzo.com> There's a race in monitor cleanup code which might result into SEGFAULT. When monitor_cleanup() is launched, qmp_dispatcher_co coroutine pointer is set to NULL (see Paolo's commit 3e6bed61 ("monitor: cleanup detection of qmp_dispatcher_co shutting down")). If after that we manage to send another QMP command while monitor is shutting down, we might end up getting SIGSEGV on aio_co_wake(NULL).
Backtrace: > #0 0x00005565359b7e19 in aio_co_wake (co=0x0) at ../util/async.c:693 > #1 0x0000556535814903 in qmp_dispatcher_co_wake () at ../monitor/qmp.c:361 > #2 0x0000556535814c17 in handle_qmp_command (opaque=0x55653767bb10, > req=0x7fd574007090, err=0x0) > at ../monitor/qmp.c:426 > #3 0x000055653598d8f1 in json_message_process_token > (lexer=0x55653767bbd8, input=0x7fd5740055a0, type=JSON_RCURLY, x=61, y=0) > at ../qobject/json-streamer.c:99 > #4 0x00005565359dd68b in json_lexer_feed_char (lexer=0x55653767bbd8, ch=125 > '}', flush=false) > at ../qobject/json-lexer.c:313 > #5 0x00005565359dd7f7 in json_lexer_feed (lexer=0x55653767bbd8, > buffer=0x7fd57ecf4e50 "}", size=1) > at ../qobject/json-lexer.c:350 > #6 0x000055653598d9e3 in json_message_parser_feed (parser=0x55653767bbc0, > buffer=0x7fd57ecf4e50 "}", size=1) > at ../qobject/json-streamer.c:121 > #7 0x0000556535814c72 in monitor_qmp_read (opaque=0x55653767bb10, > buf=0x7fd57ecf4e50 "}", size=1) > at ../monitor/qmp.c:433 > #8 0x000055653580b4ca in qemu_chr_be_write_impl (s=0x5565374466a0, > buf=0x7fd57ecf4e50 "}", len=1) > at ../chardev/char.c:214 > #9 0x000055653580b53b in qemu_chr_be_write (s=0x5565374466a0, > buf=0x7fd57ecf4e50 "}", len=1) at ../chardev/char.c:226 > #10 0x0000556535806aea in tcp_chr_read (chan=0x7fd574001600, cond=G_IO_IN, > opaque=0x5565374466a0) > at ../chardev/char-socket.c:520 > #11 0x00005565356d7f6e in qio_channel_fd_source_dispatch > (source=0x7fd574005780, callback=0x55653580696f <tcp_chr_read>, > user_data=0x5565374466a0) > at ../io/channel-watch.c:84 > #12 0x00007fd590ae6f4f in g_main_dispatch (context=0x556537671f50) at > ../glib/gmain.c:3364 > #13 g_main_context_dispatch (context=0x556537671f50) at ../glib/gmain.c:4079 > #14 0x00007fd590b3c268 in g_main_context_iterate.constprop.0 > (context=0x556537671f50, block=block@entry=1, dispatch=dispatch@entry=1, > self=<optimized out>) > at ../glib/gmain.c:4155 > #15 0x00007fd590ae65a3 in g_main_loop_run (loop=0x556537672070) at > ../glib/gmain.c:4353 > #16 0x000055653570dbb0 in iothread_run (opaque=0x556537345660) at > ../iothread.c:70 > #17 0x000055653599baf6 in qemu_thread_start (args=0x556537672090) at > ../util/qemu-thread-posix.c:541 > #18 0x00007fd58f49f7f2 in start_thread (arg=<optimized out>) at > pthread_create.c:443 > #19 0x00007fd58f43f450 in clone3 () at > ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81 Unfortunately I couldn't reproduce the issue cleanly as the race window is quite small. If I add sleep as follows, extending the race window and imitating the thread being preempted by the kernel scheduler: > diff --git a/monitor/qmp.c b/monitor/qmp.c > index cb99a12d94..dd608ee717 100644 > --- a/monitor/qmp.c > +++ b/monitor/qmp.c > @@ -349,6 +349,7 @@ void coroutine_fn monitor_qmp_dispatcher_co(void *data) > qmp_request_free(req_obj); > } > qatomic_set(&qmp_dispatcher_co, NULL); > + sleep(1); > } > > void qmp_dispatcher_co_wake(void) Then the SEGFAULT can be reproduced like that: > SRCDIR=/path/to/srcdir > QMPSHELL=$SRCDIR/scripts/qmp/qmp-shell > QMPSOCK=/path/to/qmpsock > > $QMPSHELL -p $QMPSOCK <<EOF > system_powerdown > EOF > > while /bin/true ; do > $QMPSHELL -p $QMPSOCK <<EOF > query-status > EOF > done Andrey Drobyshev (1): monitor: don't wake up qmp_dispatcher_co coroutine upon cleanup monitor/qmp.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) -- 2.43.5