10.04.2019 23:20, Max Reitz wrote: > What bs->file and bs->backing mean depends on the node. For filter > nodes, both signify a node that will eventually receive all R/W > accesses. For format nodes, bs->file contains metadata and data, and > bs->backing will not receive writes -- instead, writes are COWed to > bs->file. Usually. > > In any case, it is not trivial to guess what a child means exactly with > our currently limited form of expression. It is better to introduce > some functions that actually guarantee a meaning: > > - bdrv_filtered_cow_child() will return the child that receives requests > filtered through COW. That is, reads may or may not be forwarded > (depending on the overlay's allocation status), but writes never go to > this child. > > - bdrv_filtered_rw_child() will return the child that receives requests > filtered through some very plain process. Reads and writes issued to > the parent will go to the child as well (although timing, etc. may be > modified). > > - All drivers but quorum (but quorum is pretty opaque to the general > block layer anyway) always only have one of these children: All read > requests must be served from the filtered_rw_child (if it exists), so > if there was a filtered_cow_child in addition, it would not receive > any requests at all. > (The closest here is mirror, where all requests are passed on to the > source, but with write-blocking, write requests are "COWed" to the > target. But that just means that the target is a special child that > cannot be introspected by the generic block layer functions, and that > source is a filtered_rw_child.) > Therefore, we can also add bdrv_filtered_child() which returns that > one child (or NULL, if there is no filtered child). > > Also, many places in the current block layer should be skipping filters > (all filters or just the ones added implicitly, it depends) when going > through a block node chain. They do not do that currently, but this > patch makes them. > > One example for this is qemu-img map, which should skip filters and only > look at the COW elements in the graph. The change to iotest 204's > reference output shows how using blkdebug on top of a COW node used to > make qemu-img map disregard the rest of the backing chain, but with this > patch, the allocation in the base image is reported correctly. > > Furthermore, a note should be made that sometimes we do want to access > bs->backing directly. This is whenever the operation in question is not > about accessing the COW child, but the "backing" child, be it COW or > not. This is the case in functions such as bdrv_open_backing_file() or > whenever we have to deal with the special behavior of @backing as a > blockdev option, which is that it does not default to null like all > other child references do. > > Finally, the query functions (query-block and query-named-block-nodes) > are modified to return any filtered child under "backing", not just > bs->backing or COW children. This is so that filters do not interrupt > the reported backing chain. This changes the output of iotest 184, as > the throttled node now appears as a backing child. > > Signed-off-by: Max Reitz <mre...@redhat.com> > --- > qapi/block-core.json | 4 + > include/block/block.h | 1 + > include/block/block_int.h | 40 +++++-- > block.c | 210 +++++++++++++++++++++++++++------ > block/backup.c | 8 +- > block/block-backend.c | 16 ++- > block/commit.c | 33 +++--- > block/io.c | 45 ++++--- > block/mirror.c | 21 ++-- > block/qapi.c | 30 +++-- > block/stream.c | 13 +- > blockdev.c | 88 +++++++++++--- > migration/block-dirty-bitmap.c | 4 +- > nbd/server.c | 6 +- > qemu-img.c | 29 ++--- > tests/qemu-iotests/184.out | 7 +- > tests/qemu-iotests/204.out | 1 + > 17 files changed, 411 insertions(+), 145 deletions(-)
really huge... didn't you consider conversion file-by-file? [..] > diff --git a/block.c b/block.c > index 16615bc876..e8f6febda0 100644 > --- a/block.c > +++ b/block.c [..] > > @@ -3467,14 +3469,17 @@ static int bdrv_reopen_parse_backing(BDRVReopenState > *reopen_state, > /* > * Find the "actual" backing file by skipping all links that point > * to an implicit node, if any (e.g. a commit filter node). > + * We cannot use any of the bdrv_skip_*() functions here because > + * those return the first explicit node, while we are looking for > + * its overlay here. > */ > overlay_bs = bs; > - while (backing_bs(overlay_bs) && backing_bs(overlay_bs)->implicit) { > - overlay_bs = backing_bs(overlay_bs); > + while (overlay_bs->backing && bdrv_filtered_bs(overlay_bs)->implicit) { So, you don't want to skip implicit filters with 'file' child? Then, why not to use child_bs(overlay_bs->backing), like in following if condition? Could we instead make backing-based filters equal to file-based, to make it possible to use file-based filters in backing-chain related scenarios (like upcoming copy-on-read filter for stream)? So, to expand backing-chain concept to include filters with file child? > + overlay_bs = bdrv_filtered_bs(overlay_bs); > } > > /* If we want to replace the backing file we need some extra checks */ > - if (new_backing_bs != backing_bs(overlay_bs)) { > + if (new_backing_bs != child_bs(overlay_bs->backing)) { > /* > Check for implicit nodes between bs and its backing file */ > if (bs != overlay_bs) { > error_setg(errp, "Cannot change backing link if '%s' has " [..] > @@ -4203,8 +4208,8 @@ int bdrv_change_backing_file(BlockDriverState *bs, > BlockDriverState *bdrv_find_overlay(BlockDriverState *active, > BlockDriverState *bs) > { > - while (active && bs != backing_bs(active)) { > - active = backing_bs(active); > + while (active && bs != bdrv_filtered_bs(active)) { hmm and here you actually support backing-chain with file-child-based filters in it.. > + active = bdrv_filtered_bs(active); > } > > return active; > @@ -4226,11 +4231,11 @@ bool bdrv_is_backing_chain_frozen(BlockDriverState > *bs, BlockDriverState *base, > { > BlockDriverState *i; > > - for (i = bs; i != base; i = backing_bs(i)) { > + for (i = bs; i != base; i = child_bs(i->backing)) { and here don't.. > if (i->backing && i->backing->frozen) { > error_setg(errp, "Cannot change '%s' link from '%s' to '%s'", > i->backing->name, i->node_name, > - backing_bs(i)->node_name); > + i->backing->bs->node_name); > return true; > } > } [..] > +static BlockDriverState *bdrv_skip_filters(BlockDriverState *bs, > + bool stop_on_explicit_filter) > +{ > + BdrvChild *filtered; > + > + if (!bs) { > + return NULL; > + } > + > + while (!(stop_on_explicit_filter && !bs->implicit)) { you may save some characters and extra operators by bool skip_explicit ... while (skip_explicit || bs->implicit) { > + filtered = bdrv_filtered_rw_child(bs); > + if (!filtered) { > + break; > + } > + bs = filtered->bs; > + } > + /* > + * Note that this treats nodes with bs->drv == NULL as not being > + * R/W filters (bs->drv == NULL should be replaced by something > + * else anyway). > + * The advantage of this behavior is that this function will thus > + * always return a non-NULL value (given a non-NULL @bs). > + */ > + > + return bs; > +} > + > +/* > + * Return the first BDS that has not been added implicitly or that > + * does not have an RW-filtered child down the chain starting from @bs > + * (including @bs itself). > + */ > +BlockDriverState *bdrv_skip_implicit_filters(BlockDriverState *bs) > +{ > + return bdrv_skip_filters(bs, true); > +} > + > +/* > + * Return the first BDS that does not have an RW-filtered child down > + * the chain starting from @bs (including @bs itself). > + */ > +BlockDriverState *bdrv_skip_rw_filters(BlockDriverState *bs) > +{ > + return bdrv_skip_filters(bs, false); > +} > + > +/* > + * For a backing chain, return the first non-filter backing image. or second, if we start from filter > + */ > +BlockDriverState *bdrv_backing_chain_next(BlockDriverState *bs) > +{ > + return > bdrv_skip_rw_filters(bdrv_filtered_cow_bs(bdrv_skip_rw_filters(bs))); > +} -- Best regards, Vladimir