On Thu, Aug 17, 2023 at 02:50:04PM +0200, Kevin Wolf wrote:
> bdrv_unref() is called by a lot of places that need to hold the graph
> lock (it naturally happens in the context of operations that change the
> graph). However, bdrv_unref() takes the graph writer lock internally, so
> it can't actually be called while already holding a graph lock without
> causing a deadlock.
> 
> bdrv_unref() also can't just become GRAPH_WRLOCK because it drains the
> node before closing it, and draining requires that the graph is
> unlocked.
> 
> The solution is to defer deleting the node until we don't hold the lock
> any more and draining is possible again.
> 
> Note that keeping images open for longer than necessary can create
> problems, too: You can't open an image again before it is really closed
> (if image locking didn't prevent it, it would cause corruption).
> Reopening an image immediately happens at least during bdrv_open() and
> bdrv_co_create().
> 
> In order to solve this problem, make sure to run the deferred unref in
> bdrv_graph_wrunlock(), i.e. the first possible place where we can drain
> again. This is also why bdrv_schedule_unref() is marked GRAPH_WRLOCK.
> 
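
For what it's worth, my mental model of the deferred unref is roughly the
sketch below; bdrv_unref_bh() and the use of aio_bh_schedule_oneshot() are my
guesses at the mechanism, not a quote of the patch:

    /*
     * Sketch only: queue a one-shot BH on the main context that performs
     * the real bdrv_unref() later, once bdrv_graph_wrunlock() has dropped
     * the graph lock and runs aio_bh_poll(), where draining is allowed
     * again.
     */
    static void bdrv_unref_bh(void *opaque)
    {
        BlockDriverState *bs = opaque;

        bdrv_unref(bs);
    }

    void bdrv_schedule_unref(BlockDriverState *bs)
    {
        if (!bs) {
            return;
        }
        aio_bh_schedule_oneshot(qemu_get_aio_context(), bdrv_unref_bh, bs);
    }
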
> The output of iotest 051 is updated because the additional polling
> changes the order of HMP output, resulting in a new "(qemu)" prompt in
> the test output that was previously on a separate line and filtered out.
> 
> Signed-off-by: Kevin Wolf <kw...@redhat.com>
> ---
> +++ b/block/graph-lock.c
> @@ -163,17 +163,26 @@ void bdrv_graph_wrlock(BlockDriverState *bs)
>  void bdrv_graph_wrunlock(void)
>  {
>      GLOBAL_STATE_CODE();
> -    QEMU_LOCK_GUARD(&aio_context_list_lock);
>      assert(qatomic_read(&has_writer));
>  
> +    WITH_QEMU_LOCK_GUARD(&aio_context_list_lock) {
> +        /*
> +         * No need for memory barriers, this works in pair with
> +         * the slow path of rdlock() and both take the lock.
> +         */
> +        qatomic_store_release(&has_writer, 0);
> +
> +        /* Wake up all coroutine that are waiting to read the graph */
> +        qemu_co_enter_all(&reader_queue, &aio_context_list_lock);

So if I understand coroutines correctly, this means all pending reader
coroutines are now scheduled to run (or maybe they do try to run here, but
immediately return control back to this coroutine to wait for the right lock
conditions, since we are still inside the block guarded by
aio_context_list_lock)...
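
For my own notes, I assume the reader side is roughly the mirror image of this
hunk, something like the simplified paraphrase below (my sketch of the rdlock
slow path in coroutine context, not the real code):

    WITH_QEMU_LOCK_GUARD(&aio_context_list_lock) {
        /*
         * A reader that sees a writer active parks itself on reader_queue;
         * qemu_co_queue_wait() drops aio_context_list_lock while the
         * coroutine is suspended and reacquires it on wakeup, so the
         * reader only gets past this loop once the writer has cleared
         * has_writer and called qemu_co_enter_all().
         */
        while (qatomic_read(&has_writer)) {
            qemu_co_queue_wait(&reader_queue, &aio_context_list_lock);
        }
    }

If that is about right, a reader only proceeds once has_writer has been
cleared, and the store above happens before anyone is woken.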

> +    }
> +
>      /*
> -     * No need for memory barriers, this works in pair with
> -     * the slow path of rdlock() and both take the lock.
> +     * Run any BHs that were scheduled during the wrlock section and that
> +     * callers might expect to have finished (e.g. bdrv_unref() calls). Do this
> +     * only after restarting coroutines so that nested event loops in BHs don't
> +     * deadlock if their condition relies on the coroutine making progress.
>       */
> -    qatomic_store_release(&has_writer, 0);
> -
> -    /* Wake up all coroutine that are waiting to read the graph */
> -    qemu_co_enter_all(&reader_queue, &aio_context_list_lock);
> +    aio_bh_poll(qemu_get_aio_context());

...and as long as the other coroutines sharing this thread don't actually get
to make progress until the next point at which the current coroutine yields,
and as long as our aio_bh_poll() doesn't itself include a yield point, then we
are assured that the BH has completed before the next yield point in our
caller.
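
Concretely, I read the caller pattern this enables as something like the
following (my sketch; everything except the lock functions and
bdrv_schedule_unref() is a placeholder):

    bdrv_graph_wrlock(NULL);
    /* ... graph manipulation that drops the last reference to bs ... */
    bdrv_schedule_unref(bs);   /* a direct bdrv_unref() would deadlock here */
    bdrv_graph_wrunlock();     /* wakes readers, then aio_bh_poll() runs the
                                * deferred unref with the lock released */

so by the time bdrv_graph_wrunlock() returns, the node is really closed and
can be opened again, which is what bdrv_open() and bdrv_co_create() rely on.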

There are times (like today) when I'm still trying to wrap my mind around the
subtle differences between true multi-threading and cooperative coroutines
sharing a single thread via yield points.  Coroutines are cool when they can
get rid of some of the races that you have to worry about in true
multi-threading.

Reviewed-by: Eric Blake <ebl...@redhat.com>

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.
Virtualization:  qemu.org | libguestfs.org

