On 06.05.2019 at 14:47, Anton Kuchin wrote:
> On 29/04/2019 13:38, Kevin Wolf wrote:
> > On 26.04.2019 at 14:24, Anton Kuchin wrote:
> > > I can't figure out ownership of the aio context during bdrv_close().
> > >
> > > As far as I understand, bdrv_unref() should be called with the aio
> > > context acquired to prevent concurrent operations (at least most
> > > usages in blockdev.c explicitly acquire and release the context, but
> > > not all).
> >
> > I think the theory is like this:
> >
> > 1. bdrv_unref() can only be called from the main thread
> >
> > 2. A block node for which bdrv_close() is called has no references. If
> >    there are no more parents that keep it in a non-default iothread,
> >    it should be in the main AioContext. So no locks need to be taken
> >    during bdrv_close().
> >
> > In practice, 2. is not fully true today, even though block devices do
> > stop their dataplane and move the block nodes back to the main
> > AioContext on shutdown. I am currently working on some fixes related
> > to this; afterwards the situation should be better.
> >
> > > But if the refcount reaches zero and the bs is going to be deleted
> > > in bdrv_close(), we need to ensure that the drain is finished, data
> > > is flushed and there are no more pending coroutines and bottom
> > > halves, so the drain and flush functions can enter a coroutine and
> > > yield in several places. As a result, control returns to the
> > > coroutine caller, which will release the aio context, and when the
> > > completion BH continues the cleanup process, it will be executed
> > > without ownership of the context. Is this a valid situation?
> >
> > Do you have an example where this happens?
> >
> > Normally, leaving the coroutine means that the AioContext lock is
> > released, but it is later reentered from the same AioContext, so the
> > lock will be taken again.
>
> I was wrong. Coroutines do acquire the aio context when reentered.
>
> > > Moreover, if a yield happens, the bs that is being deleted has zero
> > > refcount but is still present in the graph_bdrv_states and
> > > all_bdrv_states lists and can be accidentally accessed. Shouldn't we
> > > remove it from these lists as soon as the deletion process starts,
> > > as we do for monitor_bdrv_states?
> >
> > Hm, I think it should only disappear when the image file is actually
> > closed. But in practice, it probably makes little difference either
> > way.
>
> I'm still worried about the period of time from when the coroutine
> yields until it is reentered; it looks like the aio context can be
> grabbed by other code and the bs can be treated as valid.
>
> I don't have an example of such a situation either, but I see the
> bdrv_aio_flush and bdrv_co_flush_to_disk callbacks, which are called
> during flush and can take an unpredictable time to complete (e.g.
> raw_aio_flush in file-win32 uses a thread pool internally to process
> the request, and RBD can also be affected, but I didn't dig deep
> enough to be sure).
>
> If the main loop starts processing the next QMP command before the
> completion is called, the lists will be in an inconsistent state, and
> hard-to-debug use-after-free bugs and crashes can happen.
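To make that window concrete, the scenario would look roughly like this
(only a sketch: qmp_example_command() is made up, but bdrv_find_node()
and bdrv_ref() are the real helpers a monitor command would go through):

    /* Hypothetical QMP handler running in the main thread while
     * bdrv_delete() is parked in a coroutine yield, waiting for a slow
     * flush inside bdrv_close(): */
    void qmp_example_command(const char *node_name, Error **errp)
    {
        /* Still finds the node: bs->refcnt is 0, but the node stays on
         * graph_bdrv_states until bdrv_delete() resumes and removes it. */
        BlockDriverState *bs = bdrv_find_node(node_name);

        if (!bs) {
            error_setg(errp, "Node '%s' not found", node_name);
            return;
        }

        /* Takes a new reference to a node that will be g_free()d as
         * soon as the flush completes and bdrv_delete() finishes, so
         * any later use of bs is a use-after-free. */
        bdrv_ref(bs);
    }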
Hm, at the point of flush, bs is actually still valid, so e.g.
query-block would just work. But I think we would indeed have a problem
if a new reference to the node is created.

> Fix seems trivial and shouldn't break any existing code.
>
> ---
>
> diff --git a/block.c b/block.c
> index 16615bc876..25c3b72520 100644
> --- a/block.c
> +++ b/block.c
> @@ -4083,14 +4083,14 @@ static void bdrv_delete(BlockDriverState *bs)
>  {
>      assert(bdrv_op_blocker_is_empty(bs));
>      assert(!bs->refcnt);
>
> -    bdrv_close(bs);
> -
>      /* remove from list, if necessary */
>      if (bs->node_name[0] != '\0') {
>          QTAILQ_REMOVE(&graph_bdrv_states, bs, node_list);
>      }
>      QTAILQ_REMOVE(&all_bdrv_states, bs, bs_list);
>
> +    bdrv_close(bs);
> +
>      g_free(bs);
>  }

This looks reasonable enough to me.

Kevin
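For reference, with the hunk applied bdrv_delete() would read roughly as
follows (reconstructed from the context lines of the diff above, so take
it as a sketch rather than the exact resulting file):

    static void bdrv_delete(BlockDriverState *bs)
    {
        assert(bdrv_op_blocker_is_empty(bs));
        assert(!bs->refcnt);

        /* remove from list, if necessary */
        if (bs->node_name[0] != '\0') {
            QTAILQ_REMOVE(&graph_bdrv_states, bs, node_list);
        }
        QTAILQ_REMOVE(&all_bdrv_states, bs, bs_list);

        /* Close (and potentially yield in a flush) only after the node
         * can no longer be found through the global lists. */
        bdrv_close(bs);

        g_free(bs);
    }

The point of the reordering is that by the time a flush inside
bdrv_close() yields back to the main loop, the dying node can no longer
be handed out via bdrv_find_node().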