On Thu, Feb 20, 2014 at 12:37:17PM +0800, Fam Zheng wrote:
> On Wed, 02/19 18:24, Jeff Cody wrote:
> > On Wed, Feb 19, 2014 at 04:22:30PM -0500, Jeff Cody wrote:
> > > On Wed, Feb 19, 2014 at 09:42:25PM +0800, Fam Zheng wrote:
> > > > Dropping intermediate could be useful both for commit and stream, and
> > > > BDS refcnt plus bdrv_swap could do most of the job nicely. It also needs
> > > > to work with op blockers.
> > > >
> > > > Signed-off-by: Fam Zheng <f...@redhat.com>
> > > > ---
> > > >  block.c        | 146 +++++++++++++++++++++++++--------------------------------
> > > >  block/commit.c |   1 +
> > > >  2 files changed, 66 insertions(+), 81 deletions(-)
> > > >
> > > > diff --git a/block.c b/block.c
> > > > index a2bf24c..cf41f3d 100644
> > > > --- a/block.c
> > > > +++ b/block.c
> > > > @@ -2485,115 +2485,99 @@ BlockDriverState *bdrv_find_overlay(BlockDriverState *active,
> > > >      return overlay;
> > > >  }
> > > >
> > > > -typedef struct BlkIntermediateStates {
> > > > -    BlockDriverState *bs;
> > > > -    QSIMPLEQ_ENTRY(BlkIntermediateStates) entry;
> > > > -} BlkIntermediateStates;
> > > > -
> > > > -
> > > >  /*
> > > > - * Drops images above 'base' up to and including 'top', and sets the image
> > > > - * above 'top' to have base as its backing file.
> > > > + * Drops images above 'base' up to and including 'top', and sets new 'base'
> > > > + * as backing_hd of top_overlay (the image orignally has 'top' as backing
> > >
> > > What is 'top_overlay'? Do you mean "top's overlay" by this?
>
> Yes, as noted in the parenthesis.
>
I would just say "top's overlay". What I found confusing about that is
that when you reference something like 'top_overlay', it looks like an
actual variable name. So I was searching for that variable name, and
wondered if it was just vestigial from an earlier revision. Maybe that
is just me, though :)

> > >
> > > > + * file). top_overlay may be NULL if 'top' is active, no such update needed.
> > > > + * Requires that the top_overlay to 'top' is opened r/w.
> > > >   *
> > > > - * Requires that the overlay to 'top' is opened r/w, so that the backing file
> > > > - * information in 'bs' can be properly updated.
> > > > + * 1) This will convert the following chain:
> > > >   *
> > > > - * E.g., this will convert the following chain:
> > > > - * bottom <- base <- intermediate <- top <- active
> > > > + *     ... <- base <- ... <- top <- overlay <-... <- active
> > > >   *
> > > >   * to
> > > >   *
> > > > - * bottom <- base <- active
> > > > + *     ... <- base <- overlay <- active
> > > >   *
> > > > - * It is allowed for bottom==base, in which case it converts:
> > > > + * 2) It is allowed for bottom==base, in which case it converts:
> > > >   *
> > > > - * base <- intermediate <- top <- active
> > > > + *     base <- ... <- top <- overlay <- ... <- active
> > > >   *
> > > >   * to
> > > >   *
> > > > - * base <- active
> > > > + *     base <- overlay <- active
> > > > + *
> > > > + * 2) It also allows active==top, in which case it converts:
> > > > + *
> > > > + *     ... <- base <- ... <- top (active)
> > > > + *
> > > > + * to
> > > > + *
> > > > + *     ... <- base == active == top
> > > > + *
> > > > + * i.e. only base and lower remains: *top == *base when return.
> > > > + *
> > > > + * 3) If base==NULL, it will drop all the BDS below overlay and set its
> > > > + * backing_hd to NULL. I.e.:
> > > > + *
> > > > + *     base(NULL) <- ... <- overlay <- ... <- active
> > > > + *
> > > > + * to
> > > >   *
> > > > - * Error conditions:
> > > > - *  if active == top, that is considered an error
> > > > + *     overlay <- ... <- active
> > > >   *
> > > >   */
> > > >  int bdrv_drop_intermediate(BlockDriverState *active, BlockDriverState *top,
> > > >                             BlockDriverState *base)
> > >
> > > With the active case, we aren't necessarily really just dropping
> > > intermediate images anymore. Maybe we should rename this function now to
> > > 'bdrv_rebase_chain()'?
> > >
> > > >  {
> > > > -    BlockDriverState *intermediate;
> > > > -    BlockDriverState *base_bs = NULL;
> > > > -    BlockDriverState *new_top_bs = NULL;
> > > > -    BlkIntermediateStates *intermediate_state, *next;
> > > > -    int ret = -EIO;
> > > > -
> > > > -    QSIMPLEQ_HEAD(states_to_delete, BlkIntermediateStates) states_to_delete;
> > > > -    QSIMPLEQ_INIT(&states_to_delete);
> > > > -
> > > > -    if (!top->drv || !base->drv) {
> > > > -        goto exit;
> > > > -    }
> > > > -
> > > > -    new_top_bs = bdrv_find_overlay(active, top);
> > > > +    BlockDriverState *drop_start, *overlay;
> > > > +    int ret = -EINVAL;
> > > >
> > > > -    if (new_top_bs == NULL) {
> > > > -        /* we could not find the image above 'top', this is an error */
> > > > +    if (!top->drv || (base && !base->drv)) {
> > > >          goto exit;
> > > >      }
> > > > -
> > > > -    /* special case of new_top_bs->backing_hd already pointing to base - nothing
> > > > -     * to do, no intermediate images */
> > > > -    if (new_top_bs->backing_hd == base) {
> > > > +    if (top == base) {
> > > >          ret = 0;
> > > > -        goto exit;
> > > > -    }
> > > > -
> > > > -    intermediate = top;
> > > > -
> > > > -    /* now we will go down through the list, and add each BDS we find
> > > > -     * into our deletion queue, until we hit the 'base'
> > > > -     */
> > > > -    while (intermediate) {
> > > > -        intermediate_state = g_malloc0(sizeof(BlkIntermediateStates));
> > > > -        intermediate_state->bs = intermediate;
> > > > -        QSIMPLEQ_INSERT_TAIL(&states_to_delete, intermediate_state, entry);
> > > > -
> > > > -        if (intermediate->backing_hd == base) {
> > > > -            base_bs = intermediate->backing_hd;
> > > > -            break;
> > > > +    } else if (top == active) {
> > > > +        assert(base);
> > > > +        drop_start = active->backing_hd;
> > > > +        bdrv_swap(active, base);
> > > > +        base->backing_hd = NULL;
> > > > +        bdrv_unref(drop_start);
> > > > +        ret = 0;
> > > > +    } else {
> > > > +        /* If there's an overlay, its backing_hd points to top's BDS now,
> > > > +         * the top image is dropped but this BDS structure is kept and swapped
> > > > +         * with base, this way we keep the pointers valid after dropping top */
> > > > +        overlay = bdrv_find_overlay(active, top);
> > > > +        if (!overlay) {
> > > > +            goto exit;
> > > > +        }
> > > > +        if (base) {
> > > > +            ret = bdrv_change_backing_file(overlay, base->filename,
> > > > +                                           base->drv->format_name);
> > > > +        } else {
> > > > +            ret = bdrv_change_backing_file(overlay, NULL, NULL);
> > > > +        }
> > > > +        if (ret) {
> > > > +            goto exit;
> > > > +        }
> > > > +        if (base) {
> > > > +            drop_start = top->backing_hd;
> > > > +            bdrv_swap(top, base);
> > > > +            /* Break the loop formed by bdrv_swap */
> > > > +            bdrv_set_backing_hd(base, NULL);
> > >
> > > And in the non-active case here, everything between top->backing_hd
> > > and the original base is orphaned as well. These should all be
> > > explicitly unreferenced.
> >
> > Same here, bdrv_unref() will eventually go through the chain, starting
> > from top->backing_hd. But this is a problem; won't we end up in a
> > loop then?
>
> Although the content is swapped, the pointer is not:
>
> (I presume your "[base]" and "[top]" are denoting content, not pointer)
>

Correct. But the backing_hd pointers are part of the content that is
swapped.
> >
> > Take this chain:
> >
> > drop_start = [A]
> >
> > |||-- ([base]) <-- [B] <--- [A] <--- ([top]) <--- [active]
>            ^                              ^
>            |                              |
>           base                           top
> >
> >
> > bdrv_swap(top, base):
> >
> > -- [B] <-- [A] <-- ([top]) |||--- ([base]) <-- [active]
>                         ^              ^
>                         |              |
>                        base           top
> > |                   ^
> > |                   |
> > ---------------------
> >

Correct, those are the pointers.

> > Then we call bdrv_unref(drop_start (or bdrv_set_backing_hd() does),
> > and we end up with:
> >

dropping an anchor here: [1]

> > bdrv_unref(A)
> > bdrv_unref(B)
> > bdrv_unref(top)
> > bdrv_unref(A)  <--- assert
> > .....
> >
> >
> > So I think we want this line:
> >
> > > > +            bdrv_set_backing_hd(base, NULL);
>
> so, this breaks the chain,

Yes, you are right, we want base->backing_hd to be NULL. But the chain
has not been broken yet. The loop [1] still exists, because once we
enter bdrv_set_backing_hd() we begin to call bdrv_unref(A). And
base_ptr->backing_hd still points to A, and B will point to base_ptr.

Here is the first part of bdrv_set_backing_hd():

    if (bs->backing_hd) {
        bdrv_op_unblock_all(bs->backing_hd, bs->backing_blocker);
        bdrv_unref(bs->backing_hd);

>
> > To be:
> >
> > > > +            bdrv_set_backing_hd(top, NULL);
>
> This will lose track of original base's backing_hd.

Right, we don't want that, sorry... I shouldn't have written that, my
brain failed me. I mentally conflated top and [top].

>
> So I think we are OK here.
>

I don't think we are, we still need to address the backing_hd loop, and
I think it needs to be done here, where we have the information.

> But I find that a fix is needed in bdrv_set_backing_hd to handle the rebase
> correctly.
>

What we really want, prior to starting to unref anything, is to set
drop_start = base_ptr->backing_hd, and then set
base_ptr->backing_hd = NULL. Then the bdrv_unref(drop_start) will
perform as expected (see [2], below).
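As a rough illustration of that ordering, here is a minimal standalone
sketch. It is not QEMU code: Node, node_new(), node_unref(), node_swap()
and drop_sketch() are hypothetical stand-ins for BlockDriverState,
bdrv_unref(), bdrv_swap() and the non-active branch of
bdrv_drop_intermediate(), and the model assumes every node is held by
exactly one reference. It only shows that clearing the link out of
'base' before the first unref lets the walk terminate.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy stand-in for a BlockDriverState: a refcount plus a backing link. */
typedef struct Node {
    int refcnt;
    struct Node *backing;   /* plays the role of backing_hd */
    char content;           /* which image this struct currently holds */
} Node;

static int released;        /* number of nodes freed so far */

Node *node_new(char content, Node *backing)
{
    Node *n = malloc(sizeof(*n));
    if (!n) {
        abort();
    }
    n->refcnt = 1;          /* assumption: one holder per node */
    n->backing = backing;
    n->content = content;
    return n;
}

/* Drop one reference; on the last one, free the node and continue
 * down its backing chain. */
void node_unref(Node *n)
{
    while (n) {
        assert(n->refcnt > 0);      /* a double unref would trip here */
        if (--n->refcnt > 0) {
            return;
        }
        Node *next = n->backing;
        free(n);
        released++;
        n = next;
    }
}

/* Swap what two structs hold (including the backing link); the
 * addresses, and the references held on them, stay where they are. */
void node_swap(Node *a, Node *b)
{
    Node tmp = *a;
    int a_ref = a->refcnt, b_ref = b->refcnt;
    *a = *b;
    *b = tmp;
    a->refcnt = a_ref;
    b->refcnt = b_ref;
}

/* Non-active case: swap, cut the link out of 'base' BEFORE any unref,
 * then release the orphaned nodes. Returns how many were freed.
 * After this call the 'base' pointer itself is dangling. */
int drop_sketch(Node *top, Node *base)
{
    int before = released;
    node_swap(top, base);
    Node *drop_start = base->backing;
    base->backing = NULL;           /* break the loop first */
    node_unref(drop_start);
    return released - before;
}
```

With the chain base <- B <- A <- top <- active built through node_new(),
calling drop_sketch(top, base) frees A, B and the struct that 'base'
pointed to, and leaves the struct at 'top' holding the old base content
directly under active.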
And, at least in the usage here, we probably don't want
bdrv_set_backing_hd() to unref anything for us, but I'm sure there is
some way to make that work if it is cleaner that way.

That will get you what I was originally trying to get at in my previous
email, when I unfortunately conflated top contents with top pointer:

> >
> >
> > Right? Or, just set top->backing_hd = NULL, so we get:
                        ^^^ please read this as 'base' (as in base_ptr)

anchor [2]:

> >
> > -- [B] <-- [A] |||-- ([top]) |||--- ([base]) <-- [active]
> > |                         ^
> > |                         |
> > ---------------------------

base_ptr->backing_hd is set to NULL first ^^
then the bdrv_unref(drop_start):

> >
> > bdrv_unref(A)
> > bdrv_unref(B)
> > bdrv_unref(top)
> >
> >
> > Which leaves:
> >
> > |||--- ([base]) <-- [active]
> >
> >

So this part above still needs addressing, I think.

> >
> > >
> > > Also, side effect:
> > > Caller needs to beware now that base and top are now swapped [1].
> > >
> > > > +        } else {
> > > > +            bdrv_set_backing_hd(overlay, NULL);
> > > > +            drop_start = top;
> > >
> > > Again, everything between top and the original base is orphaned, but
> > > should be cleaned up.
> > >
> > > Caller does not have to worry about base and top being swapped [1].
> > >
> >
> > This should be fine, I think.
> >
> >
> > I think everything else I mentioned below this point is still
> > relevant, however.
> >
> > >
> > > >          }
> > > > -        intermediate = intermediate->backing_hd;
> > > > -    }
> > > > -    if (base_bs == NULL) {
> > > > -        /* something went wrong, we did not end at the base. safely
> > > > -         * unravel everything, and exit with error */
> > > > -        goto exit;
> > > > -    }
> > > > -
> > > > -    /* success - we can delete the intermediate states, and link top->base */
> > > > -    ret = bdrv_change_backing_file(new_top_bs, base_bs->filename,
> > > > -                                   base_bs->drv ? base_bs->drv->format_name : "");
> > > > -    if (ret) {
> > > > -        goto exit;
> > > > -    }
> > > > -    new_top_bs->backing_hd = base_bs;
> > > > -
> > > > -    bdrv_refresh_limits(new_top_bs);
> > > >
> > > > -    QSIMPLEQ_FOREACH_SAFE(intermediate_state, &states_to_delete, entry, next) {
> > > > -        /* so that bdrv_close() does not recursively close the chain */
> > > > -        intermediate_state->bs->backing_hd = NULL;
> > > > -        bdrv_unref(intermediate_state->bs);
> > > > +        bdrv_unref(drop_start);
> > >
> > > We will get an assertion here. In the non-active case, the backing_hd
> > > is explicitly set to NULL via bdrv_set_backing_hd(). That function
> > > will call bdrv_unref() on the same BDS that drop_start was assigned,
> > > so we have a double call to bdrv_unref().
> > >
> > > >      }
> > > > -    ret = 0;
> > > > -
> > > >  exit:
> > > > -    QSIMPLEQ_FOREACH_SAFE(intermediate_state, &states_to_delete, entry, next) {
> > > > -        g_free(intermediate_state);
> > > > -    }
> > > >      return ret;
> > > >  }
> > > >
> > > > -
> > > >  static int bdrv_check_byte_request(BlockDriverState *bs, int64_t offset,
> > > >                                     size_t size)
> > > >  {
> > > > diff --git a/block/commit.c b/block/commit.c
> > > > index acec4ac..b10eb79 100644
> > > > --- a/block/commit.c
> > > > +++ b/block/commit.c
> > > > @@ -142,6 +142,7 @@ wait:
> > > >      if (!block_job_is_cancelled(&s->common) && sector_num == end) {
> > > >          /* success */
> > > >          ret = bdrv_drop_intermediate(active, top, base);
> > > > +        base = top;
> > >
> > > This is where it is highlighted to me how odd it is to use the side
> > > effects of bdrv_swap() in bdrv_drop_intermediate() for the non-active
> > > layer case.
> > >
> > > The function bdrv_drop_intermediate() is now actually pretty complex
> > > and tricky to use, with side effects that the caller needs to beware
> > > of, that change depending on the nature of the arguments passed.
> > >
> > > [1] Side effects, depending on active, top, and base:
> > >
> > >      active = top | base = NULL | side effect
> > > -----------------------------------------------
> > > (A)  false        | false       | top and base are swapped
> > > (B)  false        | true        | none
> > > (C)  true         | false       | top and base are swapped
> > > (D)  true         | true        | assert()
> > >
> > >
> > > Case (C) is reasonable, because active and base need to be swapped,
> > > and top == active. It is expected almost by definition.
> > >
> > > Case (A) is a bit odd, especially in light of case (B).
>
> Makes sense, I will remove this side effect.
>
> > >
> > > >  }
> > > >
> > > >  exit_free_buf:
> > >
> > >
> > > Further down, out of the context of this patch, we have:
> > >
> > >
> > > exit_restore_reopen:
> > >     /* restore base open flags here if appropriate (e.g., change the base back
> > >      * to r/o). These reopens do not need to be atomic, since we won't abort
> > >      * even on failure here */
> > >     if (s->base_flags != bdrv_get_flags(base)) {
> > >         bdrv_reopen(base, s->base_flags, NULL);
> > >     }
> > >
> > > OK, 'base' is the one we want to operate on now, that was set to
> > > 'top', which has the contents of the old 'base'.
> > >
>
> If I remove the swap, we don't need to set base to top here.
>

Yes, that will keep the usage more consistent, thanks.

> > >
> > >     overlay_bs = bdrv_find_overlay(active, top);
> > >
> > > Will we find the right overlay here? I think now overlay_bs will
> > > always be NULL, so we won't restore the r/o flags (if set) for the
> > > overlay of the original 'top'.
> > >
> > >     if (overlay_bs && s->orig_overlay_flags != bdrv_get_flags(overlay_bs)) {
> > >         bdrv_reopen(overlay_bs, s->orig_overlay_flags, NULL);
> > >     }
>
> Yes, need a fix here.
>
> Thanks,
> Fam
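For reference, the side-effect table [1] quoted in this message can be
restated as a small lookup. This is only a sketch of the behaviour of
the patch as reviewed here: the SideEffect enum and drop_side_effect()
are hypothetical names, not part of the QEMU block layer, and the
top == base early return is left out.

```c
#include <assert.h>
#include <stdbool.h>

/* What the caller of the reviewed bdrv_drop_intermediate() has to
 * expect, per table [1]. Names are illustrative only. */
typedef enum {
    EFFECT_NONE,        /* (B) nothing visible to the caller */
    EFFECT_SWAPPED,     /* (A), (C) top and base are swapped */
    EFFECT_ASSERT       /* (D) assert(base) fires */
} SideEffect;

SideEffect drop_side_effect(bool top_is_active, bool base_is_null)
{
    if (top_is_active) {
        /* (C) with a base, (D) without one */
        return base_is_null ? EFFECT_ASSERT : EFFECT_SWAPPED;
    }
    /* (A) with a base, (B) without one */
    return base_is_null ? EFFECT_NONE : EFFECT_SWAPPED;
}
```

Written this way, the asymmetry between cases (A) and (B) is easy to
see: in the non-active case the caller-visible result depends only on
whether base is NULL.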