On Tue, Feb 08, 2022 at 10:36:54AM -0500, Emanuele Giuseppe Esposito wrote: > This test uses a callback of an I/O function (blk_aio_preadv) > to modify the graph, using bdrv_attach_child. > This is simply not allowed anymore. I/O cannot change the graph. > > Before "block/io.c: make bdrv_do_drained_begin_quiesce static > and introduce bdrv_drained_begin_no_poll", the test would simply > be at risk of failure, because if bdrv_replace_child_noperm() > (called to modify the graph) would call a drain, > then one callback of .drained_begin() is bdrv_do_drained_begin_quiesce, > that specifically asserts that we are not in a coroutine. > > Now that we fixed the behavior, the drain will invoke a bh in the > main loop, so we don't have such problem. However, this test is still > illegal and fails because we forbid graph changes from I/O paths. > > Once we add the required subtree_drains to protect > bdrv_replace_child_noperm(), the key problem in this test is in: > > acb = blk_aio_preadv(blk, 0, &qiov, 0, detach_by_parent_aio_cb, NULL); > /* Drain and check the expected result */ > bdrv_subtree_drained_begin(parent_b); > > because the detach_by_parent_aio_cb calls detach_indirect_bh(), that > modifies the graph and is invoked during bdrv_subtree_drained_begin(). > The call stack is the following: > 1. blk_aio_preadv() creates a coroutine, increments in_flight counter > and enters the coroutine running blk_aio_read_entry() > 2. blk_aio_read_entry() performs the read and then schedules a bh to > complete (blk_aio_complete) > 3. at this point, subtree_drained_begin() kicks in and waits for all > in_flight requests, polling > 4. polling allows the bh to be scheduled, so blk_aio_complete runs > 5. blk_aio_complete *first* invokes the callback > (detach_by_parent_aio_cb) and then decrements the in_flight counter > 6. Here we have the problem: detach_by_parent_aio_cb modifies the graph, > so both bdrv_unref_child() and bdrv_attach_child() will have > subtree_drains inside. And this causes a deadlock, because the > nested drain will wait for in_flight counter to go to zero, which > is only happening once the drain itself finishes. > > Different story is test_detach_by_driver_cb(): in this case, > detach_by_parent_aio_cb() does not call detach_indirect_bh(), > but it is instead called as a bh running in the main loop by > detach_by_driver_cb_drained_begin(), the callback for > .drained_begin(). > > This test was added in 231281ab42 and part of the series > "Drain fixes and cleanups, part 3" > https://lists.nongnu.org/archive/html/qemu-block/2018-05/msg01132.html > but as explained above I believe that it is not valid anymore, and > can be discarded. > > Signed-off-by: Emanuele Giuseppe Esposito <eespo...@redhat.com> > --- > tests/unit/test-bdrv-drain.c | 46 +++++++++--------------------------- > 1 file changed, 11 insertions(+), 35 deletions(-)
Reviewed-by: Stefan Hajnoczi <stefa...@redhat.com>
signature.asc
Description: PGP signature