On Mon, 11/16 17:15, Max Reitz wrote: > >> @@ -1971,13 +1969,44 @@ static void bdrv_close(BlockDriverState *bs) > >> void bdrv_close_all(void) > >> { > >> BlockDriverState *bs; > >> + AioContext *aio_context; > >> + int original_refcount = 0; > >> > >> - QTAILQ_FOREACH(bs, &bdrv_states, device_list) { > >> - AioContext *aio_context = bdrv_get_aio_context(bs); > >> + /* Drop references from requests still in flight, such as canceled > >> block > >> + * jobs whose AIO context has not been polled yet */ > >> + bdrv_drain_all(); > >> > >> - aio_context_acquire(aio_context); > >> - bdrv_close(bs); > >> - aio_context_release(aio_context); > >> + blockdev_close_all_bdrv_states(); > >> + blk_remove_all_bs(); > >> + > >> + /* Cancel all block jobs */ > >> + while (!QTAILQ_EMPTY(&all_bdrv_states)) { > >> + QTAILQ_FOREACH(bs, &all_bdrv_states, bs_list) { > >> + aio_context = bdrv_get_aio_context(bs); > >> + > >> + aio_context_acquire(aio_context); > >> + if (bs->job) { > >> + /* So we can safely query the current refcount */ > >> + bdrv_ref(bs); > >> + original_refcount = bs->refcnt; > >> + > >> + block_job_cancel_sync(bs->job); > >> + aio_context_release(aio_context); > >> + break; > >> + } > >> + aio_context_release(aio_context); > >> + } > >> + > >> + /* All the remaining BlockDriverStates are referenced directly or > >> + * indirectly from block jobs, so there needs to be at least one > >> BDS > >> + * directly used by a block job */ > >> + assert(bs); > >> + > >> + /* Wait for the block job to release its reference */ > >> + while (bs->refcnt >= original_refcount) { > >> + aio_poll(aio_context, true); > >> + } > >> + bdrv_unref(bs); > > > > If at this point bs->refcnt is greater than 1, why don't we care where are > > the > > remaining references from? > > We do care. A BDS will not be removed from all_bdrv_states until it is > deleted (i.e. its refcount becomes 0). Therefore, this loop will > continue until all BDSs have been deleted. > > So where might additional references come from? Since this loop only > cares about direct or indirect references from block jobs, that's > exactly it: > > (1) You might have multiple block jobs running on a BDS in the future. > Then, you'll cancel them one by one, and after having canceled the > first one, the refcount will still be greater than one before the > bdrv_unref(). > > (2) Imagine a BDS A with a parent BDS B. There are block jobs running on > both of them. Now, B is referenced by both its block job and by A > (indirectly by the block job referencing A). If we cancel the job on > B before the one on A, then the refcount on B will still be greater > than 1 before bdrv_unref() because it is still referenced by its > parent (until the block job on A is canceled, too). > > The first cannot happen right now, but the second one may, I'm not sure > (depending on whether op blockers allow it).
OK, that makes sense. The all_bdrv_states is the central place to make sure all refcnts reaching 0. > > > And we do make sure that there are no additional references besides: > - directly from the monitor (monitor-owned BDSs) > - from a BB > - from block jobs > - (+ everything transitively through the respective BDS tree) > > If there were additional references, the inner loop would at some point > no longer find a BDS with a block job while all_bdrv_states is still not > empty. That's what the "assert(bs)" after the inner loop is for. > > If you can imagine another way where a reference to a BDS may come from, > that would be a bug in this patch and we have to make sure to respect > that case, too. I think the only one reference I'm not sure is in xen_disk. blk_unref is called in the .free and .disconnect call backs but I have no idea if they are called before bdrv_close_all. Otherwise this patch looks good for me. Thanks, Fam