On Wed, Feb 19, 2014 at 09:42:25PM +0800, Fam Zheng wrote:
> Dropping intermediate could be useful both for commit and stream, and
> BDS refcnt plus bdrv_swap could do most of the job nicely. It also needs
> to work with op blockers.
>
> Signed-off-by: Fam Zheng <f...@redhat.com>
> ---
>  block.c        | 146 +++++++++++++++++++++++++--------------------------------
>  block/commit.c |   1 +
>  2 files changed, 66 insertions(+), 81 deletions(-)
>
> diff --git a/block.c b/block.c
> index a2bf24c..cf41f3d 100644
> --- a/block.c
> +++ b/block.c
> @@ -2485,115 +2485,99 @@ BlockDriverState *bdrv_find_overlay(BlockDriverState *active,
>      return overlay;
>  }
>
> -typedef struct BlkIntermediateStates {
> -    BlockDriverState *bs;
> -    QSIMPLEQ_ENTRY(BlkIntermediateStates) entry;
> -} BlkIntermediateStates;
> -
> -
>  /*
> - * Drops images above 'base' up to and including 'top', and sets the image
> - * above 'top' to have base as its backing file.
> + * Drops images above 'base' up to and including 'top', and sets new 'base'
> + * as backing_hd of top_overlay (the image orignally has 'top' as backing

What is 'top_overlay'? Do you mean "top's overlay" by this?

> + * file). top_overlay may be NULL if 'top' is active, no such update needed.
> + * Requires that the top_overlay to 'top' is opened r/w.
>   *
> - * Requires that the overlay to 'top' is opened r/w, so that the backing file
> - * information in 'bs' can be properly updated.
> + * 1) This will convert the following chain:
>   *
> - * E.g., this will convert the following chain:
> - * bottom <- base <- intermediate <- top <- active
> + * ... <- base <- ... <- top <- overlay <-... <- active
>   *
>   * to
>   *
> - * bottom <- base <- active
> + * ... <- base <- overlay <- active
>   *
> - * It is allowed for bottom==base, in which case it converts:
> + * 2) It is allowed for bottom==base, in which case it converts:
>   *
> - * base <- intermediate <- top <- active
> + * base <- ... <- top <- overlay <- ... <- active
>   *
>   * to
>   *
> - * base <- active
> + * base <- overlay <- active
> + *
> + * 2) It also allows active==top, in which case it converts:
> + *
> + * ... <- base <- ... <- top (active)
> + *
> + * to
> + *
> + * ... <- base == active == top
> + *
> + * i.e. only base and lower remains: *top == *base when return.
> + *
> + * 3) If base==NULL, it will drop all the BDS below overlay and set its
> + * backing_hd to NULL. I.e.:
> + *
> + * base(NULL) <- ... <- overlay <- ... <- active
> + *
> + * to
>   *
> - * Error conditions:
> - * if active == top, that is considered an error
> + * overlay <- ... <- active
>   *
>   */
>  int bdrv_drop_intermediate(BlockDriverState *active, BlockDriverState *top,
>                             BlockDriverState *base)

With the active case, we aren't necessarily really just dropping
intermediate images anymore. Maybe we should rename this function now
to 'bdrv_rebase_chain()'?

>  {
> -    BlockDriverState *intermediate;
> -    BlockDriverState *base_bs = NULL;
> -    BlockDriverState *new_top_bs = NULL;
> -    BlkIntermediateStates *intermediate_state, *next;
> -    int ret = -EIO;
> -
> -    QSIMPLEQ_HEAD(states_to_delete, BlkIntermediateStates) states_to_delete;
> -    QSIMPLEQ_INIT(&states_to_delete);
> -
> -    if (!top->drv || !base->drv) {
> -        goto exit;
> -    }
> -
> -    new_top_bs = bdrv_find_overlay(active, top);
> +    BlockDriverState *drop_start, *overlay;
> +    int ret = -EINVAL;
>
> -    if (new_top_bs == NULL) {
> -        /* we could not find the image above 'top', this is an error */
> +    if (!top->drv || (base && !base->drv)) {
>          goto exit;
>      }
> -
> -    /* special case of new_top_bs->backing_hd already pointing to base - nothing
> -     * to do, no intermediate images */
> -    if (new_top_bs->backing_hd == base) {
> +    if (top == base) {
>          ret = 0;
> -        goto exit;
> -    }
> -
> -    intermediate = top;
> -
> -    /* now we will go down through the list, and add each BDS we find
> -     * into our deletion queue, until we hit the 'base'
> -     */
> -    while (intermediate) {
> -        intermediate_state = g_malloc0(sizeof(BlkIntermediateStates));
> -        intermediate_state->bs = intermediate;
> -        QSIMPLEQ_INSERT_TAIL(&states_to_delete, intermediate_state, entry);
> -
> -        if (intermediate->backing_hd == base) {
> -            base_bs = intermediate->backing_hd;
> -            break;
> +    } else if (top == active) {
> +        assert(base);
> +        drop_start = active->backing_hd;
> +        bdrv_swap(active, base);
> +        base->backing_hd = NULL;
> +        bdrv_unref(drop_start);
> +        ret = 0;

This now orphans everything between active->backing_hd and the original
base, without performing a bdrv_unref/delete on them.

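To make the cleanup I have in mind concrete, here is a rough, untested sketch of walking the chain and unreferencing each intermediate explicitly. It uses toy types rather than the real block layer: Node, node_new(), node_unref(), and drop_range() are hypothetical names for illustration only, and the toy node_unref() deliberately does not recurse into the backing link.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy stand-in for a BlockDriverState: a refcount and a backing link only.
 * This is NOT the QEMU structure; all names here are hypothetical. */
typedef struct Node {
    int refcnt;
    struct Node *backing;
} Node;

static int live_nodes; /* number of nodes allocated and not yet freed */

static Node *node_new(Node *backing)
{
    Node *n = calloc(1, sizeof(*n));
    if (!n) {
        abort();
    }
    n->refcnt = 1;
    n->backing = backing;
    live_nodes++;
    return n;
}

/* Drop one reference and free the node when the count reaches zero.
 * Unlike a real close path, this toy version does not touch n->backing. */
static void node_unref(Node *n)
{
    assert(n->refcnt > 0);
    if (--n->refcnt == 0) {
        live_nodes--;
        free(n);
    }
}

/* Unreference every node from 'start' down to, but not including, 'stop'.
 * Each node is unlinked from its backing node first, so that nothing below
 * it is released as a side effect. Returns the number of nodes dropped. */
static int drop_range(Node *start, Node *stop)
{
    int dropped = 0;

    while (start && start != stop) {
        Node *next = start->backing;
        start->backing = NULL;
        node_unref(start);
        start = next;
        dropped++;
    }
    return dropped;
}
```

For a chain base <- mid <- top, drop_range(top, base) releases top and mid and leaves base alone, which is roughly what the removed QSIMPLEQ loop did for the intermediates.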
> +    } else {
> +        /* If there's an overlay, its backing_hd points to top's BDS now,
> +         * the top image is dropped but this BDS structure is kept and swapped
> +         * with base, this way we keep the pointers valid after dropping top */
> +        overlay = bdrv_find_overlay(active, top);
> +        if (!overlay) {
> +            goto exit;
> +        }
> +        if (base) {
> +            ret = bdrv_change_backing_file(overlay, base->filename,
> +                                           base->drv->format_name);
> +        } else {
> +            ret = bdrv_change_backing_file(overlay, NULL, NULL);
> +        }
> +        if (ret) {
> +            goto exit;
> +        }
> +        if (base) {
> +            drop_start = top->backing_hd;
> +            bdrv_swap(top, base);
> +            /* Break the loop formed by bdrv_swap */
> +            bdrv_set_backing_hd(base, NULL);

And in the non-active case here, everything between top->backing_hd and
the original base is orphaned as well. These should all be explicitly
unreferenced.

Also, side effect: Caller needs to beware now that base and top are now
swapped [1].

> +        } else {
> +            bdrv_set_backing_hd(overlay, NULL);
> +            drop_start = top;

Again, everything between top and the original base is orphaned, but
should be cleaned up.

Caller does not have to worry about base and top being swapped [1].

>          }
> -        intermediate = intermediate->backing_hd;
> -    }
> -    if (base_bs == NULL) {
> -        /* something went wrong, we did not end at the base. safely
> -         * unravel everything, and exit with error */
> -        goto exit;
> -    }
> -
> -    /* success - we can delete the intermediate states, and link top->base */
> -    ret = bdrv_change_backing_file(new_top_bs, base_bs->filename,
> -                                   base_bs->drv ? base_bs->drv->format_name : "");
> -    if (ret) {
> -        goto exit;
> -    }
> -    new_top_bs->backing_hd = base_bs;
> -
> -    bdrv_refresh_limits(new_top_bs);
>
> -    QSIMPLEQ_FOREACH_SAFE(intermediate_state, &states_to_delete, entry, next) {
> -        /* so that bdrv_close() does not recursively close the chain */
> -        intermediate_state->bs->backing_hd = NULL;
> -        bdrv_unref(intermediate_state->bs);
> +        bdrv_unref(drop_start);

We will get an assertion here. In the non-active case, the backing_hd
is explicitly set to NULL via bdrv_set_backing_hd(). That function will
call bdrv_unref() on the same BDS that drop_start was assigned, so we
have a double call to bdrv_unref().

>      }
> -    ret = 0;
> -
>  exit:
> -    QSIMPLEQ_FOREACH_SAFE(intermediate_state, &states_to_delete, entry, next) {
> -        g_free(intermediate_state);
> -    }
>      return ret;
>  }
>
> -
>  static int bdrv_check_byte_request(BlockDriverState *bs, int64_t offset,
>                                     size_t size)
>  {
> diff --git a/block/commit.c b/block/commit.c
> index acec4ac..b10eb79 100644
> --- a/block/commit.c
> +++ b/block/commit.c
> @@ -142,6 +142,7 @@ wait:
>      if (!block_job_is_cancelled(&s->common) && sector_num == end) {
>          /* success */
>          ret = bdrv_drop_intermediate(active, top, base);
> +        base = top;

This is where it is highlighted to me how odd it is to use the side
effects of bdrv_swap() in bdrv_drop_intermediate() for the non-active
layer case.

The function bdrv_drop_intermediate() is now actually pretty complex and
tricky to use, with side effects that the caller needs to beware of,
that change depending on the nature of the arguments passed.

[1] Side effects, depending on active, top, and base:

       active = top | base = NULL | side effect
      -----------------------------------------------
  (A)  false        | false       | top and base are swapped
  (B)  false        | true        | none
  (C)  true         | false       | top and base are swapped
  (D)  true         | true        | assert()

Case (C) is reasonable, because active and base need to be swapped, and
top == active. It is expected almost by definition.
Case (A) is a bit odd, especially in light of case (B).

>      }
>
>  exit_free_buf:

Further down, out of the context of this patch, we have:

exit_restore_reopen:
    /* restore base open flags here if appropriate (e.g., change the base back
     * to r/o). These reopens do not need to be atomic, since we won't abort
     * even on failure here */
    if (s->base_flags != bdrv_get_flags(base)) {
        bdrv_reopen(base, s->base_flags, NULL);
    }

OK, 'base' is the one we want to operate on now, that was set to 'top',
which has the contents of the old 'base'.

    overlay_bs = bdrv_find_overlay(active, top);

Will we find the right overlay here? I think now overlay_bs will always
be NULL, so we won't restore the r/o flags (if set) for the overlay of
the original 'top'.

    if (overlay_bs && s->orig_overlay_flags != bdrv_get_flags(overlay_bs)) {
        bdrv_reopen(overlay_bs, s->orig_overlay_flags, NULL);
    }

> --
> 1.8.5.4
>
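
Coming back to the double bdrv_unref() I mentioned earlier: one way to keep the ownership straight might be to take an extra reference before detaching, so that the later unref is balanced. The sketch below is untested and uses toy types only; Img, img_ref(), img_unref(), img_set_backing(), and img_detach_backing() are hypothetical names. It assumes, as described in my comment, that setting a new backing pointer drops the parent's reference on the old one.

```c
#include <assert.h>
#include <stddef.h>

/* Toy refcounted image; not the QEMU BlockDriverState. */
typedef struct Img {
    int refcnt;
    struct Img *backing;
} Img;

static void img_ref(Img *img)
{
    img->refcnt++;
}

static void img_unref(Img *img)
{
    assert(img->refcnt > 0); /* an unbalanced second unref trips here */
    img->refcnt--;
}

/* Replace parent's backing link. The parent's reference on the old
 * backing image is dropped, and a reference on the new one is taken. */
static void img_set_backing(Img *parent, Img *backing)
{
    if (parent->backing) {
        img_unref(parent->backing);
    }
    parent->backing = backing;
    if (backing) {
        img_ref(backing);
    }
}

/* Detach parent's backing image and hand it to the caller together with
 * one reference that the caller now owns and must release exactly once. */
static Img *img_detach_backing(Img *parent)
{
    Img *old = parent->backing;

    if (old) {
        img_ref(old);                  /* caller's own reference */
        img_set_backing(parent, NULL); /* drops the parent's reference */
    }
    return old;
}
```

With this shape, a stashed pointer like drop_start always comes with its own reference, so a single unref afterwards cannot underflow the count.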