-----Original Message-----
From: Daniel P. Berrangé <berra...@redhat.com>
Sent: Wednesday, December 1, 2021 5:11 PM
To: Rao, Lei <lei....@intel.com>
Cc: Zhang, Chen <chen.zh...@intel.com>; ebl...@redhat.com; vsement...@virtuozzo.com; kw...@redhat.com; hre...@redhat.com; qemu-bl...@nongnu.org; qemu-devel@nongnu.org
Subject: Re: [PATCH] QIO: Add force_quit to the QIOChannel to ensure QIO exits cleanly in some corner case
On Wed, Dec 01, 2021 at 03:54:27PM +0800, Rao, Lei wrote:
> We found that the QIO channel coroutine could not be awakened in some
> corner cases during our stress test for COLO.
> The hang that this patch fixes looks as follows:
> #0  0x00007fad72e24bf6 in __ppoll (fds=0x5563d75861f0, nfds=1,
>     timeout=<optimized out>, sigmask=0x0) at ../sysdeps/unix/sysv/linux/ppoll.c:44
> #1  0x00005563d6977c68 in qemu_poll_ns (fds=0x5563d75861f0, nfds=1,
>     timeout=599999697630) at util/qemu-timer.c:348
> #2  0x00005563d697ac44 in aio_poll (ctx=0x5563d755dfa0, blocking=true)
>     at util/aio-posix.c:669
> #3  0x00005563d68ba24f in bdrv_do_drained_begin (bs=0x5563d75ea0c0, recursive=false,
>     parent=0x0, ignore_bds_parents=false, poll=true) at block/io.c:432
> #4  0x00005563d68ba338 in bdrv_drained_begin (bs=0x5563d75ea0c0) at block/io.c:438
> #5  0x00005563d689c6d2 in quorum_del_child (bs=0x5563d75ea0c0, child=0x5563d7908600,
>     errp=0x7fff3cf5b960) at block/quorum.c:1063
> #6  0x00005563d684328f in bdrv_del_child (parent_bs=0x5563d75ea0c0, child=0x5563d7908600,
>     errp=0x7fff3cf5b960) at block.c:6568
> #7  0x00005563d658499e in qmp_x_blockdev_change (parent=0x5563d80ec4c0 "colo-disk0",
>     has_child=true, child=0x5563d7577d30 "children.1", has_node=false, node=0x0,
>     errp=0x7fff3cf5b960) at blockdev.c:4494
> #8  0x00005563d67e8b4e in qmp_marshal_x_blockdev_change (args=0x7fad6400ada0,
>     ret=0x7fff3cf5b9f8, errp=0x7fff3cf5b9f0) at qapi/qapi-commands-block-core.c:1538
> #9  0x00005563d691cd9e in do_qmp_dispatch (cmds=0x5563d719a4f0 <qmp_commands>,
>     request=0x7fad64009d80, allow_oob=true, errp=0x7fff3cf5ba98) at qapi/qmp-dispatch.c:132
> #10 0x00005563d691cfab in qmp_dispatch (cmds=0x5563d719a4f0 <qmp_commands>,
>     request=0x7fad64009d80, allow_oob=true) at qapi/qmp-dispatch.c:175
> #11 0x00005563d67b7787 in monitor_qmp_dispatch (mon=0x5563d7605d40, req=0x7fad64009d80)
>     at monitor/qmp.c:145
> #12 0x00005563d67b7b24 in monitor_qmp_bh_dispatcher (data=0x0) at monitor/qmp.c:234
> #13 0x00005563d69754c2 in aio_bh_call (bh=0x5563d7473230) at util/async.c:89
> #14 0x00005563d697555e in aio_bh_poll (ctx=0x5563d7471da0) at util/async.c:117
> #15 0x00005563d697a423 in aio_dispatch (ctx=0x5563d7471da0) at util/aio-posix.c:459
> #16 0x00005563d6975917 in aio_ctx_dispatch (source=0x5563d7471da0, callback=0x0,
>     user_data=0x0) at util/async.c:260
> #17 0x00007fad730e1fbd in g_main_context_dispatch () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
> #18 0x00005563d6978cda in glib_pollfds_poll () at util/main-loop.c:219
> #19 0x00005563d6978d58 in os_host_main_loop_wait (timeout=977650) at util/main-loop.c:242
> #20 0x00005563d6978e69 in main_loop_wait (nonblocking=0) at util/main-loop.c:518
> #21 0x00005563d658f5ed in main_loop () at vl.c:1814
> #22 0x00005563d6596bb7 in main (argc=61, argv=0x7fff3cf5c0c8, envp=0x7fff3cf5c2b8) at vl.c:450
>
> From the call trace, we can see that the QEMU main thread is waiting for
> in_flight to drop to zero, but in_flight never reaches 0. After analysis,
> the root cause is that the NBD coroutine was never awakened. Although this
> bug was found in the COLO test, I think it is a general problem in the QIO
> module, and it also affects other modules that depend on QIO, such as NBD.
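
To spell out why the main thread hangs: frames #2/#3 above are essentially
doing the following, so if even one NBD request coroutine never terminates,
the drain polls forever. This is only a simplified sketch of the drain wait
(the real code goes through bdrv_drain_poll()/BDRV_POLL_WHILE()), not the
exact QEMU implementation:

    /* Simplified sketch, not the real bdrv_do_drained_begin(): drain blocks
     * until every in-flight request on this node has completed. */
    static void drain_wait_sketch(BlockDriverState *bs)
    {
        AioContext *ctx = bdrv_get_aio_context(bs);

        /* If a stuck NBD request coroutine never does in_flight--, this
         * loop never exits and the main thread hangs inside aio_poll(). */
        while (qatomic_read(&bs->in_flight) > 0) {
            aio_poll(ctx, true /* blocking */);
        }
    }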
> We dump the following data:
> $2 = {
>   in_flight = 2,
>   state = NBD_CLIENT_QUIT,
>   connect_status = 0,
>   connect_err = 0x0,
>   wait_in_flight = false,
>   requests = {{
>       coroutine = 0x0,
>       offset = 273077071872,
>       receiving = false,
>     }, {
>       coroutine = 0x7f1164571df0,
>       offset = 498792529920,
>       receiving = false,
>     }, {
>       coroutine = 0x0,
>       offset = 500663590912,
>       receiving = false,
>     }, {
>       ...
>     } },
>   reply = {
>     simple = {
>       magic = 1732535960,
>       error = 0,
>       handle = 0
>     },
>     structured = {
>       magic = 1732535960,
>       flags = 0,
>       type = 0,
>       handle = 0,
>       length = 0
>     },
>     {
>       magic = 1732535960,
>       _skip = 0,
>       handle = 0
>     }
>   },
>   bs = 0x7f11640e2140,
>   reconnect_delay = 0,
>   saddr = 0x7f11640e1f80,
>   export = 0x7f11640dc560 "parent0",
> }
>
> From the data, we can see that the NBD coroutine does not exit normally
> when the NBD server is killed (we kill the Secondary VM in the COLO stress
> test). It therefore never executes in_flight--, so the QEMU main thread
> hangs here. On further analysis, I found that the NBD coroutine yields in
> nbd_send_request() or qio_channel_write_all() inside nbd_co_send_request();
> the yield happens because the kernel returns EAGAIN (under the stress test).
> However, NBD does not know that it yielded there, so nbd_recv_coroutines_wake
> will not wake it up, because req->receiving is false. If the NBD server is
> terminated abnormally at the same time, the G_IO_OUT event becomes invalid
> and the coroutine will never be awakened. In addition, s->ioc is released
> immediately, so forcibly waking up the NBD coroutine makes it access
> ioc->xxx and causes a segmentation fault. Therefore I add a flag named
> force_quit to the QIOChannel to ensure that QIO exits immediately, and I
> add the function qio_channel_coroutines_wake to wake the coroutines up.
>
> Signed-off-by: Lei Rao <lei....@intel.com>
> Signed-off-by: Zhang Chen <chen.zh...@intel.com>
> ---
>  block/nbd.c          |  5 +++++
>  include/io/channel.h | 19 +++++++++++++++++++
>  io/channel.c         | 22 ++++++++++++++++++++++
>  3 files changed, 46 insertions(+)
>
> diff --git a/block/nbd.c b/block/nbd.c
> index 5ef462db1b..5ee4eaaf57 100644
> --- a/block/nbd.c
> +++ b/block/nbd.c
> @@ -208,6 +208,8 @@ static void nbd_teardown_connection(BlockDriverState *bs)
>      assert(!s->in_flight);
>  
>      if (s->ioc) {
> +        qio_channel_set_force_quit(s->ioc, true);
> +        qio_channel_coroutines_wake(s->ioc);
>          qio_channel_shutdown(s->ioc, QIO_CHANNEL_SHUTDOWN_BOTH, NULL);

Calling shutdown here should already be causing qio_channel_readv_all to
wake up and break out of its poll() loop. We shouldn't need to add a second
way to break out of the poll().

Calling shutdown can wake up the coroutine that is polling, but I think it is
not enough. When I tried to forcibly wake up the NBD coroutine, it could cause
a segmentation fault. The root cause is that the coroutine continues to access
ioc->xxx in qio_channel_yield() after being woken, but the ioc may already
have been released and set to NULL, for example in
nbd_co_do_establish_connection(). I think calling shutdown will have the same
result. So I add force_quit: once it is set to true, the coroutine exits
immediately without touching the ioc.
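
To make the req->receiving problem concrete: the NBD wake helper is roughly
the following (paraphrased from memory from block/nbd.c, not an exact copy),
so a request that yielded on the send side in qio_channel_write_all() still
has receiving == false and is simply skipped:

    /* Rough paraphrase of the existing wake logic in block/nbd.c. Only
     * requests parked waiting for a reply (receiving == true) are woken;
     * a coroutine that yielded inside the send path is ignored. */
    static void nbd_recv_coroutines_wake_sketch(BDRVNBDState *s)
    {
        for (int i = 0; i < MAX_NBD_REQUESTS; i++) {
            NBDClientRequest *req = &s->requests[i];

            if (req->coroutine && req->receiving) {
                aio_co_wake(req->coroutine);
            }
        }
    }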
Thanks,
Lei

>          yank_unregister_function(BLOCKDEV_YANK_INSTANCE(s->bs->node_name),
>                                   nbd_yank, s->bs);
> @@ -315,6 +317,7 @@ int coroutine_fn nbd_co_do_establish_connection(BlockDriverState *bs,
>  
>      /* successfully connected */
>      s->state = NBD_CLIENT_CONNECTED;
> +    qio_channel_set_force_quit(s->ioc, false);
>      qemu_co_queue_restart_all(&s->free_sema);
>  
>      return 0;
> @@ -345,6 +348,8 @@ static coroutine_fn void nbd_reconnect_attempt(BDRVNBDState *s)
>  
>      /* Finalize previous connection if any */
>      if (s->ioc) {
> +        qio_channel_set_force_quit(s->ioc, true);
> +        qio_channel_coroutines_wake(s->ioc);

Isn't this code path just missing a qio_channel_shutdown call or a
qio_channel_close call to make the socket trigger wakeup from poll?

I don't think adding that would solve the bug either; the reason is the same
as above.

Thanks,
Lei

>          qio_channel_detach_aio_context(QIO_CHANNEL(s->ioc));
>          yank_unregister_function(BLOCKDEV_YANK_INSTANCE(s->bs->node_name),
>                                   nbd_yank, s->bs);
> diff --git a/include/io/channel.h b/include/io/channel.h
> index 88988979f8..bc5af8cdd6 100644
> --- a/include/io/channel.h
> +++ b/include/io/channel.h
> @@ -78,6 +78,7 @@ struct QIOChannel {
>      AioContext *ctx;
>      Coroutine *read_coroutine;
>      Coroutine *write_coroutine;
> +    bool force_quit;
>  #ifdef _WIN32
>      HANDLE event; /* For use with GSource on Win32 */
>  #endif
> @@ -484,6 +485,24 @@ int qio_channel_set_blocking(QIOChannel *ioc,
>                               bool enabled,
>                               Error **errp);
>  
> +/**
> + * qio_channel_set_force_quit:
> + * @ioc: the channel object
> + * @quit: the new flag state
> + *
> + * Set the new flag state
> + */
> +void qio_channel_set_force_quit(QIOChannel *ioc, bool quit);
> +
> +/**
> + * qio_channel_coroutines_wake:
> + * @ioc: the channel object
> + *
> + * Wake up the coroutines to ensure that they will exit normally
> + * when the server terminates abnormally
> + */
> +void qio_channel_coroutines_wake(QIOChannel *ioc);
> +
>  /**
>   * qio_channel_close:
>   * @ioc: the channel object
> diff --git a/io/channel.c b/io/channel.c
> index e8b019dc36..bc1a9e055c 100644
> --- a/io/channel.c
> +++ b/io/channel.c
> @@ -136,6 +136,9 @@ int qio_channel_readv_full_all_eof(QIOChannel *ioc,
>          if (len == QIO_CHANNEL_ERR_BLOCK) {
>              if (qemu_in_coroutine()) {
>                  qio_channel_yield(ioc, G_IO_IN);
> +                if (ioc->force_quit) {
> +                    goto cleanup;
> +                }
>              } else {
>                  qio_channel_wait(ioc, G_IO_IN);
>              }
> @@ -242,6 +245,9 @@ int qio_channel_writev_full_all(QIOChannel *ioc,
>          if (len == QIO_CHANNEL_ERR_BLOCK) {
>              if (qemu_in_coroutine()) {
>                  qio_channel_yield(ioc, G_IO_OUT);
> +                if (ioc->force_quit) {
> +                    goto cleanup;
> +                }
>              } else {
>                  qio_channel_wait(ioc, G_IO_OUT);
>              }
> @@ -543,6 +549,9 @@ void coroutine_fn qio_channel_yield(QIOChannel *ioc,
>      }
>      qio_channel_set_aio_fd_handlers(ioc);
>      qemu_coroutine_yield();
> +    if (ioc->force_quit) {
> +        return;
> +    }
>  
>      /* Allow interrupting the operation by reentering the coroutine other than
>       * through the aio_fd_handlers. */
> @@ -555,6 +564,19 @@ void coroutine_fn qio_channel_yield(QIOChannel *ioc,
>      }
>  }
>  
> +void qio_channel_set_force_quit(QIOChannel *ioc, bool quit)
> +{
> +    ioc->force_quit = quit;
> +}
> +
> +void qio_channel_coroutines_wake(QIOChannel *ioc)
> +{
> +    if (ioc->read_coroutine) {
> +        qio_channel_restart_read(ioc);
> +    } else if (ioc->write_coroutine) {
> +        qio_channel_restart_write(ioc);
> +    }
> +}

None of this should be needed AFAICT, as the poll/coroutine code should
already break out of poll if the socket is closed, or is marked shutdown.
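
For reference, here is the behaviour I'm relying on. A quick standalone test
program (my own sketch, nothing QEMU-specific) shows that once shutdown() has
been called on a socket, poll() reports it ready immediately instead of
blocking, which is what should kick anything waiting for G_IO_IN/G_IO_OUT out
of its wait:

    /* Standalone demo: poll() on a shut-down socket returns immediately. */
    #include <poll.h>
    #include <stdio.h>
    #include <sys/socket.h>
    #include <unistd.h>

    int main(void)
    {
        int sv[2];

        if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0) {
            perror("socketpair");
            return 1;
        }

        /* Shut down one end, as qio_channel_shutdown() does for a socket. */
        shutdown(sv[0], SHUT_RDWR);

        struct pollfd pfd = { .fd = sv[0], .events = POLLIN | POLLOUT };
        int n = poll(&pfd, 1, 5000);   /* returns at once, not after 5s */

        printf("poll() = %d, revents = 0x%x\n", n, pfd.revents);

        close(sv[0]);
        close(sv[1]);
        return 0;
    }

On Linux I'd expect this to return 1 immediately, with POLLHUP (and typically
POLLIN/POLLOUT) set in revents rather than waiting for the timeout.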
Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|