On 06.05.2019 at 14:47, Anton Kuchin wrote:
> On 29/04/2019 13:38, Kevin Wolf wrote:
> > On 26.04.2019 at 14:24, Anton Kuchin wrote:
> > > I can't figure out ownership of the aio context during bdrv_close().
> > >
> > > As far as I understand, bdrv_unref() should be called with the aio
> > > context acquired to prevent concurrent operations (at least most
> > > usages in blockdev.c explicitly acquire and release the context, but
> > > not all).
> >
> > I think the theory is like this:
> >
> > 1. bdrv_unref() can only be called from the main thread
> >
> > 2. A block node for which bdrv_close() is called has no references. If
> >    there are no more parents that keep it in a non-default iothread,
> >    it should be in the main AioContext. So no locks need to be taken
> >    during bdrv_close().
> >
> > In practice, 2. is not fully true today, even though block devices do
> > stop their dataplane and move the block nodes back to the main
> > AioContext on shutdown. I am currently working on some fixes related
> > to this; afterwards the situation should be better.
> >
> > > But if the refcount reaches zero and the bs is going to be deleted
> > > in bdrv_close(), we need to ensure that the drain is finished, data
> > > is flushed and there are no more pending coroutines and bottom
> > > halves, so the drain and flush functions can enter a coroutine and
> > > yield in several places. As a result, control returns to the
> > > coroutine caller, which will release the aio context, and when the
> > > completion BH continues the cleanup process, it will be executed
> > > without ownership of the context. Is this a valid situation?
> >
> > Do you have an example where this happens?
> >
> > Normally, leaving the coroutine means that the AioContext lock is
> > released, but it is later reentered from the same AioContext, so the
> > lock will be taken again.
>
> I was wrong. Coroutines do acquire the aio context when reentered.
>
> > > Moreover, if a yield happens, the bs that is being deleted has zero
> > > refcount but is still present in the graph_bdrv_states and
> > > all_bdrv_states lists and can be accidentally accessed. Shouldn't we
> > > remove it from these lists as soon as the deletion process starts,
> > > as we do for monitor_bdrv_states?
> >
> > Hm, I think it should only disappear when the image file is actually
> > closed. But in practice, it probably makes little difference either
> > way.
>
> I'm still worried about the period of time from when the coroutine
> yields until it is reentered; it looks like the aio context can be
> grabbed by other code and the bs can be treated as valid.
>
> I don't have an example of such a situation either, but I see the
> bdrv_aio_flush and bdrv_co_flush_to_disk callbacks, which are called
> during flush and can take an unpredictable time to complete (e.g.
> raw_aio_flush in file-win32 uses a thread pool internally to process
> the request, and RBD can also be affected, but I didn't dig deep
> enough to be sure).
>
> If the main loop starts processing the next QMP command before the
> completion is called, the lists will be in an inconsistent state, and
> hard-to-debug use-after-free bugs and crashes can happen.
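To make that window concrete, the scenario would look roughly like this
(only a sketch: qmp_example_command() is made up, but bdrv_find_node()
and bdrv_ref() are the real helpers a monitor command would go through):

    /* Hypothetical QMP handler running in the main thread while
     * bdrv_delete() is parked in a coroutine yield, waiting for a slow
     * flush inside bdrv_close(): */
    void qmp_example_command(const char *node_name, Error **errp)
    {
        /* Still finds the node: bs->refcnt is 0, but the node stays on
         * graph_bdrv_states until bdrv_delete() resumes and removes it. */
        BlockDriverState *bs = bdrv_find_node(node_name);

        if (!bs) {
            error_setg(errp, "Node '%s' not found", node_name);
            return;
        }

        /* Takes a new reference to a node that will be g_free()d as
         * soon as the flush completes and bdrv_delete() finishes, so
         * any later use of bs is a use-after-free. */
        bdrv_ref(bs);
    }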
Hm, at the point of flush, bs is actually still valid, so e.g.
query-block would just work. But I think we would indeed have a problem
if a new reference to the node is created.

> Fix seems trivial and shouldn't break any existing code.
>
> ---
>
> diff --git a/block.c b/block.c
> index 16615bc876..25c3b72520 100644
> --- a/block.c
> +++ b/block.c
> @@ -4083,14 +4083,14 @@ static void bdrv_delete(BlockDriverState *bs)
>  {
>      assert(bdrv_op_blocker_is_empty(bs));
>      assert(!bs->refcnt);
>
> -    bdrv_close(bs);
> -
>      /* remove from list, if necessary */
>      if (bs->node_name[0] != '\0') {
>          QTAILQ_REMOVE(&graph_bdrv_states, bs, node_list);
>      }
>      QTAILQ_REMOVE(&all_bdrv_states, bs, bs_list);
>
> +    bdrv_close(bs);
> +
>      g_free(bs);
>  }

This looks reasonable enough to me.

Kevin
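For reference, with the hunk applied bdrv_delete() would read roughly as
follows (reconstructed from the context lines of the diff above, so take
it as a sketch rather than the exact resulting file):

    static void bdrv_delete(BlockDriverState *bs)
    {
        assert(bdrv_op_blocker_is_empty(bs));
        assert(!bs->refcnt);

        /* remove from list, if necessary */
        if (bs->node_name[0] != '\0') {
            QTAILQ_REMOVE(&graph_bdrv_states, bs, node_list);
        }
        QTAILQ_REMOVE(&all_bdrv_states, bs, bs_list);

        /* Close (and potentially yield in a flush) only after the node
         * can no longer be found through the global lists. */
        bdrv_close(bs);

        g_free(bs);
    }

The point of the reordering is that by the time a flush inside
bdrv_close() yields back to the main loop, the dying node can no longer
be handed out via bdrv_find_node().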