On Thu, Aug 17, 2023 at 02:50:04PM +0200, Kevin Wolf wrote:
> bdrv_unref() is called by a lot of places that need to hold the graph
> lock (it naturally happens in the context of operations that change the
> graph). However, bdrv_unref() takes the graph writer lock internally, so
> it can't actually be called while already holding a graph lock without
> causing a deadlock.
>
> bdrv_unref() also can't just become GRAPH_WRLOCK because it drains the
> node before closing it, and draining requires that the graph is
> unlocked.
>
> The solution is to defer deleting the node until we don't hold the lock
> any more and draining is possible again.
>
> Note that keeping images open for longer than necessary can create
> problems, too: You can't open an image again before it is really closed
> (if image locking didn't prevent it, it would cause corruption).
> Reopening an image immediately happens at least during bdrv_open() and
> bdrv_co_create().
>
> In order to solve this problem, make sure to run the deferred unref in
> bdrv_graph_wrunlock(), i.e. the first possible place where we can drain
> again. This is also why bdrv_schedule_unref() is marked GRAPH_WRLOCK.
>
> The output of iotest 051 is updated because the additional polling
> changes the order of HMP output, resulting in a new "(qemu)" prompt in
> the test output that was previously on a separate line and filtered out.
>
> Signed-off-by: Kevin Wolf <kw...@redhat.com>
> ---
>  include/block/block-global-state.h |  1 +
>  block.c                            |  9 +++++++++
>  block/graph-lock.c                 | 23 ++++++++++++++++-------
>  tests/qemu-iotests/051.pc.out      |  6 +++---
>  4 files changed, 29 insertions(+), 10 deletions(-)
>
> diff --git a/include/block/block-global-state.h b/include/block/block-global-state.h
> index f347199bff..e570799f85 100644
> --- a/include/block/block-global-state.h
> +++ b/include/block/block-global-state.h
> @@ -224,6 +224,7 @@ void bdrv_img_create(const char *filename, const char *fmt,
>  void bdrv_ref(BlockDriverState *bs);
>  void no_coroutine_fn bdrv_unref(BlockDriverState *bs);
>  void coroutine_fn no_co_wrapper bdrv_co_unref(BlockDriverState *bs);
> +void GRAPH_WRLOCK bdrv_schedule_unref(BlockDriverState *bs);
>  void bdrv_unref_child(BlockDriverState *parent, BdrvChild *child);
>  BdrvChild *bdrv_attach_child(BlockDriverState *parent_bs,
>                               BlockDriverState *child_bs,
> diff --git a/block.c b/block.c
> index 6376452768..9c4f24f4b9 100644
> --- a/block.c
> +++ b/block.c
> @@ -7033,6 +7033,15 @@ void bdrv_unref(BlockDriverState *bs)
>      }
>  }
>
> +void bdrv_schedule_unref(BlockDriverState *bs)

Please add a doc comment explaining when and why this should be used.
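Something along these lines perhaps, just paraphrasing the commit
message (rough sketch, wording is up to you):

/*
 * Release a BlockDriverState reference while holding the graph writer
 * lock.
 *
 * Calling bdrv_unref() directly here would deadlock, because it takes
 * the graph writer lock internally, and dropping the last reference
 * also drains the node, which requires the graph to be unlocked. This
 * function instead defers the actual bdrv_unref() call until
 * bdrv_graph_wrunlock(), the first point where draining is possible
 * again.
 */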
> +{
> +    if (!bs) {
> +        return;
> +    }
> +    aio_bh_schedule_oneshot(qemu_get_aio_context(),
> +                            (QEMUBHFunc *) bdrv_unref, bs);
> +}
> +
>  struct BdrvOpBlocker {
>      Error *reason;
>      QLIST_ENTRY(BdrvOpBlocker) list;
> diff --git a/block/graph-lock.c b/block/graph-lock.c
> index 5e66f01ae8..395d387651 100644
> --- a/block/graph-lock.c
> +++ b/block/graph-lock.c
> @@ -163,17 +163,26 @@ void bdrv_graph_wrlock(BlockDriverState *bs)
>  void bdrv_graph_wrunlock(void)
>  {
>      GLOBAL_STATE_CODE();
> -    QEMU_LOCK_GUARD(&aio_context_list_lock);
>      assert(qatomic_read(&has_writer));
>
> +    WITH_QEMU_LOCK_GUARD(&aio_context_list_lock) {
> +        /*
> +         * No need for memory barriers, this works in pair with
> +         * the slow path of rdlock() and both take the lock.
> +         */
> +        qatomic_store_release(&has_writer, 0);
> +
> +        /* Wake up all coroutine that are waiting to read the graph */

s/coroutine/coroutines/

> +        qemu_co_enter_all(&reader_queue, &aio_context_list_lock);
> +    }
> +
>      /*
> -     * No need for memory barriers, this works in pair with
> -     * the slow path of rdlock() and both take the lock.
> +     * Run any BHs that were scheduled during the wrlock section and that
> +     * callers might expect to have finished (e.g. bdrv_unref() calls). Do this

Referring directly to bdrv_schedule_unref() would help make it clearer
what you mean.

> +     * only after restarting coroutines so that nested event loops in BHs don't
> +     * deadlock if their condition relies on the coroutine making progress.
>      */
> -    qatomic_store_release(&has_writer, 0);
> -
> -    /* Wake up all coroutine that are waiting to read the graph */
> -    qemu_co_enter_all(&reader_queue, &aio_context_list_lock);
> +    aio_bh_poll(qemu_get_aio_context());

Keeping a dedicated list of BDSes pending unref seems cleaner than using
aio_bh_poll(). There's a lot of stuff happening in the event loop and I
wonder if those other BHs might cause issues if they run at the end of
bdrv_graph_wrunlock(). The change to qemu-iotests 051's output is an
example of this, but other things could happen too (e.g. monitor command
processing).
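Roughly what I have in mind, as an untested sketch (the names
deferred_unrefs, deferred_unref_entry and bdrv_run_deferred_unrefs() are
made up; it assumes a new QTAILQ_ENTRY field in BlockDriverState, and
QTAILQ_* comes from "qemu/queue.h"):

/* Nodes with a pending deferred bdrv_unref(); protected by the writer lock */
static QTAILQ_HEAD(, BlockDriverState) deferred_unrefs =
    QTAILQ_HEAD_INITIALIZER(deferred_unrefs);

/* GRAPH_WRLOCK: queue bs for a bdrv_unref() after the lock is released */
void bdrv_schedule_unref(BlockDriverState *bs)
{
    if (!bs) {
        return;
    }
    /* deferred_unref_entry would be a new QTAILQ_ENTRY in BlockDriverState */
    QTAILQ_INSERT_TAIL(&deferred_unrefs, bs, deferred_unref_entry);
}

/* Called from bdrv_graph_wrunlock() once draining is possible again */
static void bdrv_run_deferred_unrefs(void)
{
    BlockDriverState *bs;

    while ((bs = QTAILQ_FIRST(&deferred_unrefs)) != NULL) {
        QTAILQ_REMOVE(&deferred_unrefs, bs, deferred_unref_entry);
        bdrv_unref(bs);
    }
}

bdrv_graph_wrunlock() would then call bdrv_run_deferred_unrefs() instead
of aio_bh_poll(), so only the unrefs scheduled under the lock run at that
point, not whatever else happens to be pending in the main AioContext.
(Scheduling the same node twice would need some thought, e.g. a counter
of pending unrefs.)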
>  }
>
>  void coroutine_fn bdrv_graph_co_rdlock(void)
> diff --git a/tests/qemu-iotests/051.pc.out b/tests/qemu-iotests/051.pc.out
> index 4d4af5a486..7e10c5fa1b 100644
> --- a/tests/qemu-iotests/051.pc.out
> +++ b/tests/qemu-iotests/051.pc.out
> @@ -169,11 +169,11 @@ QEMU_PROG: -device scsi-hd,drive=disk: Device needs media, but drive is empty
>
>  Testing: -drive file=TEST_DIR/t.qcow2,if=none,node-name=disk -object iothread,id=thread0 -device virtio-scsi,iothread=thread0,id=virtio-scsi0 -device scsi-hd,bus=virtio-scsi0.0,drive=disk,share-rw=on -device ide-hd,drive=disk,share-rw=on
>  QEMU X.Y.Z monitor - type 'help' for more information
> -QEMU_PROG: -device ide-hd,drive=disk,share-rw=on: Cannot change iothread of active block backend
> +(qemu) QEMU_PROG: -device ide-hd,drive=disk,share-rw=on: Cannot change iothread of active block backend
>
>  Testing: -drive file=TEST_DIR/t.qcow2,if=none,node-name=disk -object iothread,id=thread0 -device virtio-scsi,iothread=thread0,id=virtio-scsi0 -device scsi-hd,bus=virtio-scsi0.0,drive=disk,share-rw=on -device virtio-blk-pci,drive=disk,share-rw=on
>  QEMU X.Y.Z monitor - type 'help' for more information
> -QEMU_PROG: -device virtio-blk-pci,drive=disk,share-rw=on: Cannot change iothread of active block backend
> +(qemu) QEMU_PROG: -device virtio-blk-pci,drive=disk,share-rw=on: Cannot change iothread of active block backend
>
>  Testing: -drive file=TEST_DIR/t.qcow2,if=none,node-name=disk -object iothread,id=thread0 -device virtio-scsi,iothread=thread0,id=virtio-scsi0 -device scsi-hd,bus=virtio-scsi0.0,drive=disk,share-rw=on -device lsi53c895a,id=lsi0 -device scsi-hd,bus=lsi0.0,drive=disk,share-rw=on
>  QEMU X.Y.Z monitor - type 'help' for more information
> @@ -185,7 +185,7 @@ QEMU X.Y.Z monitor - type 'help' for more information
>
>  Testing: -drive file=TEST_DIR/t.qcow2,if=none,node-name=disk -object iothread,id=thread0 -device virtio-scsi,iothread=thread0,id=virtio-scsi0 -device scsi-hd,bus=virtio-scsi0.0,drive=disk,share-rw=on -device virtio-blk-pci,drive=disk,iothread=thread0,share-rw=on
>  QEMU X.Y.Z monitor - type 'help' for more information
> -QEMU_PROG: -device virtio-blk-pci,drive=disk,iothread=thread0,share-rw=on: Cannot change iothread of active block backend
> +(qemu) QEMU_PROG: -device virtio-blk-pci,drive=disk,iothread=thread0,share-rw=on: Cannot change iothread of active block backend
>
>  Testing: -drive file=TEST_DIR/t.qcow2,if=none,node-name=disk -object iothread,id=thread0 -device virtio-scsi,iothread=thread0,id=virtio-scsi0 -device scsi-hd,bus=virtio-scsi0.0,drive=disk,share-rw=on -device virtio-scsi,id=virtio-scsi1,iothread=thread0 -device scsi-hd,bus=virtio-scsi1.0,drive=disk,share-rw=on
>  QEMU X.Y.Z monitor - type 'help' for more information
> --
> 2.41.0