Am 12.07.2019 um 13:44 hat Max Reitz geschrieben: > On 12.07.19 13:23, Kevin Wolf wrote: > > Am 12.07.2019 um 13:09 hat Max Reitz geschrieben: > >> On 12.07.19 13:01, Kevin Wolf wrote: > >>> Am 12.07.2019 um 12:47 hat Max Reitz geschrieben: > >>>> On 12.07.19 11:24, Kevin Wolf wrote: > >>>>> Am 11.07.2019 um 21:58 hat Max Reitz geschrieben: > >>>>>> When nbd_close() is called from a coroutine, the connection_co never > >>>>>> gets to run, and thus nbd_teardown_connection() hangs. > >>>>>> > >>>>>> This is because aio_co_enter() only puts the connection_co into the > >>>>>> main > >>>>>> coroutine's wake-up queue, so this main coroutine needs to yield and > >>>>>> reschedule itself to let the connection_co run. > >>>>>> > >>>>>> Signed-off-by: Max Reitz <mre...@redhat.com> > >>>>>> --- > >>>>>> block/nbd.c | 12 +++++++++++- > >>>>>> 1 file changed, 11 insertions(+), 1 deletion(-) > >>>>>> > >>>>>> diff --git a/block/nbd.c b/block/nbd.c > >>>>>> index 81edabbf35..b83b6cd43e 100644 > >>>>>> --- a/block/nbd.c > >>>>>> +++ b/block/nbd.c > >>>>>> @@ -135,7 +135,17 @@ static void > >>>>>> nbd_teardown_connection(BlockDriverState *bs) > >>>>>> qio_channel_shutdown(s->ioc, > >>>>>> QIO_CHANNEL_SHUTDOWN_BOTH, > >>>>>> NULL); > >>>>>> - BDRV_POLL_WHILE(bs, s->connection_co); > >>>>>> + > >>>>>> + if (qemu_in_coroutine()) { > >>>>>> + /* Let our caller poll and just yield until connection_co is > >>>>>> done */ > >>>>>> + while (s->connection_co) { > >>>>>> + aio_co_schedule(qemu_get_current_aio_context(), > >>>>>> + qemu_coroutine_self()); > >>>>>> + qemu_coroutine_yield(); > >>>>>> + } > >>>>> > >>>>> Isn't this busy waiting? Why not let s->connection_co wake us up when > >>>>> it's about to terminate instead of immediately rescheduling ourselves? > >>>> > >>>> Yes, it is busy waiting, but I didn’t find that bad. The connection_co > >>>> will be invoked in basically every iteration, and once there is no > >>>> pending data, it will quit. > >>>> > >>>> The answer to “why not...” of course is because it’d be more complicated. > >>>> > >>>> But anyway. > >>>> > >>>> Adding a new function qemu_coroutine_run_after(target) that adds > >>>> qemu_coroutine_self() to the given @target coroutine’s wake-up queue and > >>>> then using that instead of scheduling works, too, yes. > >>>> > >>>> I don’t really like being responsible for coroutine code, though... > >>>> > >>>> (And maybe it’d be better to make it qemu_coroutine_yield_for(target), > >>>> which does the above and then yields?) > >>> > >>> Or just do something like this, which is arguably not only a fix for the > >>> busy wait, but also a code simplification: > >> > >> 1. Is that guaranteed to work? What if data sneaks in, the > >> connection_co handles that, and doesn’t wake up the teardown_co? Will > >> it be re-scheduled? > > > > Then connection_co is buggy because we clearly requested that it > > terminate. > > Did we? This would be done by setting s->quit to true, which isn’t > explicitly done here.
*we clearly requested implicitly ;-) > I thought it worked by just waking up the coroutine until it doesn’t > receive anything anymore, because the connection is closed. Now I don’t > know whether QIO_CHANNEL_SHUTDOWN_BOTH discards data that has been > received before the channel is closed. I don’t expect it to. It doesn't really matter, but I expect that we'll still read everything that was buffered and receive EOF after everything is read. > > It is possible that it does so only after handling another > > request, but this wouldn't be a problem. teardown_co would then just > > sleep for a few cycles more until connection_co is done and reaches the > > aio_co_wake() call. > > I don’t quite understand, because the fact how connection_co would > proceed after handling another request was exactly my question. If it > were to yield and not to wake up, it would never be done. But why would it not wake up? This would be a bug, every yield needs a corresponding place from which the coroutine is reentered later. If this were missing, it would already today mean that we hang during shutdown because s->connection_co would never become NULL. > But I’ve followed nbd_receive_reply() now, and I suppose nbd_read_eof() > will simply never yield after we have invoked qio_channel_shutdown(). If my expectation above is right, this would probably be the case at least for the "main" yield. Not sure if there aren't other yield points, though. But as I said, it doesn't matter anyway how many times the coroutine yields and is reentered before finally reaching the aio_co_wake() and terminating. Kevin
signature.asc
Description: PGP signature