Re: [Qemu-devel] [PATCH v14 7/8] Implement new driver for block replication

Stefan Hajnoczi Thu, 28 Jan 2016 07:17:59 -0800

On Thu, Jan 28, 2016 at 09:13:24AM +0800, Wen Congyang wrote:
> On 01/27/2016 10:46 PM, Stefan Hajnoczi wrote:
> > On Wed, Jan 13, 2016 at 05:18:31PM +0800, Changlong Xie wrote:
> >> +static void secondary_do_checkpoint(BDRVReplicationState *s, Error **errp)
> >> +{
> >> +    Error *local_err = NULL;
> >> +    int ret;
> >> +
> >> +    if (!s->secondary_disk->job) {
> >> +        error_setg(errp, "Backup job is cancelled unexpectedly");
> >> +        return;
> >> +    }
> >> +
> >> +    block_job_do_checkpoint(s->secondary_disk->job, &local_err);
> >> +    if (local_err) {
> >> +        error_propagate(errp, local_err);
> >> +        return;
> >> +    }
> >> +
> >> +    ret = s->active_disk->drv->bdrv_make_empty(s->active_disk);
> > 
> > What happens to in-flight requests to the active and hidden disks?
> 
> we MUST call do_checkpoint() when the vm is stopped.


Please document the environment under which the block replication
callback functions run.

I'm concerned that the bdrv_drain_all() in vm_stop() can take a long
time if the disk is slow/failing.  bdrv_drain_all() blocks until all
in-flight I/O requests have completed.  What does the Primary do if the
Secondary becomes unresponsive?

> >> +    switch (s->mode) {
> >> +    case REPLICATION_MODE_PRIMARY:
> >> +        break;
> >> +    case REPLICATION_MODE_SECONDARY:
> >> +        s->active_disk = bs->file->bs;
> >> +        if (!bs->file->bs->backing) {
> >> +            error_setg(errp, "Active disk doesn't have backing file");
> >> +            return;
> >> +        }
> >> +
> >> +        s->hidden_disk = s->active_disk->backing->bs;
> >> +        if (!s->hidden_disk->backing) {
> >> +            error_setg(errp, "Hidden disk doesn't have backing file");
> >> +            return;
> >> +        }
> >> +
> >> +        s->secondary_disk = s->hidden_disk->backing->bs;
> >> +        if (!s->secondary_disk->blk) {
> >> +            error_setg(errp, "The secondary disk doesn't have block 
> >> backend");
> >> +            return;
> >> +        }
> > 
> > Kevin: Is code allowed to stash away BlockDriverState pointers for
> > convenience or should it keep the BdrvChild pointers instead?  In order
> > for replication to work as expected, the graph shouldn't change but for
> > consistency maybe BdrvChild is best.

I asked Kevin about this on IRC and he agreed that BdrvChild should be
used instead of holding on to BlockDriverState * pointers.  Although
these pointers will not change during replication (if the op blockers
are set up correctly), it's more consistent and certainly safer to go
through BdrvChild.

> >> +        /* start backup job now */
> >> +        error_setg(&s->blocker,
> >> +                   "block device is in use by internal backup job");
> >> +        bdrv_op_block_all(s->top_bs, s->blocker);
> >> +        bdrv_op_unblock(s->top_bs, BLOCK_OP_TYPE_DATAPLANE, s->blocker);
> >> +        bdrv_ref(s->hidden_disk);
> > 
> > Why is the explicit reference to hidden_disk (but not secondary_disk or
> > active_disk) is necessary?
> 
> IIRC, we should reference the backup target before calling backup_start(),
> and we will reference the backup source in backup_start().

I'm not sure why this is necessary since they are part of the backing
chain.

If it is necessary, please add a comment so it's clear why the reference
is being taken.

Stefan

signature.asc
Description: PGP signature

Re: [Qemu-devel] [PATCH v14 7/8] Implement new driver for block replication

Reply via email to