Am 08.12.2020 um 16:33 hat Vladimir Sementsov-Ogievskiy geschrieben: > 03.12.2020 20:23, Kevin Wolf wrote: > > If bdrv_co_yield_to_drain() is called for draining a block node that > > runs in a different AioContext, it keeps that AioContext locked while it > > yields and schedules a BH in the AioContext to do the actual drain. > > > > As long as executing the BH is the very next thing the event loop of the > > s/thing the event/thing in the event/ > > (I've reread several times to understand :)
Oops, thanks. "...the next thing that the event loop _does_" is actually what I had in mind. > > node's AioContext, this actually happens to work, but when it tries to > > execute something else that wants to take the AioContext lock, it will > > deadlock. (In the bug report, this other thing is a virtio-scsi device > > running virtio_scsi_data_plane_handle_cmd().) > > > > Instead, always drop the AioContext lock across the yield and reacquire > > it only when the coroutine is reentered. The BH needs to unconditionally > > take the lock for itself now. > > > > This fixes the 'block_resize' QMP command on a block node that runs in > > an iothread. > > > > Cc: qemu-sta...@nongnu.org > > Fixes: eb94b81a94bce112e6b206df846c1551aaf6cab6 > > Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1903511 > > Signed-off-by: Kevin Wolf <kw...@redhat.com> > > I don't feel myself good enough in aio contexts acquiring and > switching, to see any side effects. At least I don't see any obvious > mistakes, so my weak: > > Reviewed-by: Vladimir Sementsov-Ogievskiy <vsement...@virtuozzo.com> > Note, I looked through the callers: > > bdrv_do_drained_begin/end should be ok, as their normal usage is to > start/end drained section under acquired aio context, so it seems > correct to temporary release the context. Still I didn't check all > drained sections in the code. > > bdrv_drain_all_begin seems OK too (we just wait until everything is > drained, not bad to temporary release the lock). Still I don't see any > call of it from coroutine context. The good thing there is that BDRV_POLL_WHILE() drops the lock anyway, so at least for all callers of bdrv_do_drained_begin() that pass poll=true, we know that they are fine with releasing the lock temporarily. There are two callers that pass false: The recursive call inside the function itself, and bdrv_drain_all_begin(). We know that both will poll later, so they always release the lock at least once. For ending the drain section, there is bdrv_drained_end_no_poll(), which is only called in bdrv_child_cb_drained_end(), i.e. an implementation of BdrvChildClass.drained_end. This is only called recursively in the context of a polling drain_end, which already drops the lock. So I think we don't introduce any cases of dropping the lock where this wouldn't have happened before. Kevin