From: Andrey Drobyshev <andrey.drobys...@virtuozzo.com>

There's a race in monitor cleanup code which might result into SEGFAULT.
When monitor_cleanup() is launched, qmp_dispatcher_co coroutine pointer
is set to NULL (see Paolo's commit 3e6bed61 ("monitor: cleanup detection of
qmp_dispatcher_co shutting down")).  If after that we manage to send
another QMP command while monitor is shutting down, we might end up
getting SIGSEGV on aio_co_wake(NULL).

Backtrace:

> #0  0x00005565359b7e19 in aio_co_wake (co=0x0) at ../util/async.c:693
> #1  0x0000556535814903 in qmp_dispatcher_co_wake () at ../monitor/qmp.c:361
> #2  0x0000556535814c17 in handle_qmp_command (opaque=0x55653767bb10, 
> req=0x7fd574007090, err=0x0)
>     at ../monitor/qmp.c:426
> #3  0x000055653598d8f1 in json_message_process_token
>     (lexer=0x55653767bbd8, input=0x7fd5740055a0, type=JSON_RCURLY, x=61, y=0) 
> at ../qobject/json-streamer.c:99
> #4  0x00005565359dd68b in json_lexer_feed_char (lexer=0x55653767bbd8, ch=125 
> '}', flush=false)
>     at ../qobject/json-lexer.c:313
> #5  0x00005565359dd7f7 in json_lexer_feed (lexer=0x55653767bbd8, 
> buffer=0x7fd57ecf4e50 "}", size=1)
>     at ../qobject/json-lexer.c:350
> #6  0x000055653598d9e3 in json_message_parser_feed (parser=0x55653767bbc0, 
> buffer=0x7fd57ecf4e50 "}", size=1)
>     at ../qobject/json-streamer.c:121
> #7  0x0000556535814c72 in monitor_qmp_read (opaque=0x55653767bb10, 
> buf=0x7fd57ecf4e50 "}", size=1)
>     at ../monitor/qmp.c:433
> #8  0x000055653580b4ca in qemu_chr_be_write_impl (s=0x5565374466a0, 
> buf=0x7fd57ecf4e50 "}", len=1)
>     at ../chardev/char.c:214
> #9  0x000055653580b53b in qemu_chr_be_write (s=0x5565374466a0, 
> buf=0x7fd57ecf4e50 "}", len=1) at ../chardev/char.c:226
> #10 0x0000556535806aea in tcp_chr_read (chan=0x7fd574001600, cond=G_IO_IN, 
> opaque=0x5565374466a0)
>     at ../chardev/char-socket.c:520
> #11 0x00005565356d7f6e in qio_channel_fd_source_dispatch
>     (source=0x7fd574005780, callback=0x55653580696f <tcp_chr_read>, 
> user_data=0x5565374466a0)
>     at ../io/channel-watch.c:84
> #12 0x00007fd590ae6f4f in g_main_dispatch (context=0x556537671f50) at 
> ../glib/gmain.c:3364
> #13 g_main_context_dispatch (context=0x556537671f50) at ../glib/gmain.c:4079
> #14 0x00007fd590b3c268 in g_main_context_iterate.constprop.0
>     (context=0x556537671f50, block=block@entry=1, dispatch=dispatch@entry=1, 
> self=<optimized out>)
>     at ../glib/gmain.c:4155
> #15 0x00007fd590ae65a3 in g_main_loop_run (loop=0x556537672070) at 
> ../glib/gmain.c:4353
> #16 0x000055653570dbb0 in iothread_run (opaque=0x556537345660) at 
> ../iothread.c:70
> #17 0x000055653599baf6 in qemu_thread_start (args=0x556537672090) at 
> ../util/qemu-thread-posix.c:541
> #18 0x00007fd58f49f7f2 in start_thread (arg=<optimized out>) at 
> pthread_create.c:443
> #19 0x00007fd58f43f450 in clone3 () at 
> ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

Unfortunately I couldn't reproduce the issue cleanly as the race window
is quite small.  If I add sleep as follows, extending the race window
and imitating the thread being preempted by the kernel scheduler:

> diff --git a/monitor/qmp.c b/monitor/qmp.c
> index cb99a12d94..dd608ee717 100644
> --- a/monitor/qmp.c
> +++ b/monitor/qmp.c
> @@ -349,6 +349,7 @@ void coroutine_fn monitor_qmp_dispatcher_co(void *data)
>          qmp_request_free(req_obj);
>      }
>      qatomic_set(&qmp_dispatcher_co, NULL);
> +    sleep(1);
>  }
>  
>  void qmp_dispatcher_co_wake(void)

Then the SEGFAULT can be reproduced like that:

> SRCDIR=/path/to/srcdir
> QMPSHELL=$SRCDIR/scripts/qmp/qmp-shell
> QMPSOCK=/path/to/qmpsock
> 
> $QMPSHELL -p $QMPSOCK <<EOF
>     system_powerdown
> EOF
> 
> while /bin/true ; do
>     $QMPSHELL -p $QMPSOCK <<EOF
>         query-status
> EOF
> done

Andrey Drobyshev (1):
  monitor: don't wake up qmp_dispatcher_co coroutine upon cleanup

 monitor/qmp.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

-- 
2.43.5


Reply via email to