On Thu, Jan 28, 2016 at 09:13:24AM +0800, Wen Congyang wrote: > On 01/27/2016 10:46 PM, Stefan Hajnoczi wrote: > > On Wed, Jan 13, 2016 at 05:18:31PM +0800, Changlong Xie wrote: > >> +static void secondary_do_checkpoint(BDRVReplicationState *s, Error **errp) > >> +{ > >> + Error *local_err = NULL; > >> + int ret; > >> + > >> + if (!s->secondary_disk->job) { > >> + error_setg(errp, "Backup job is cancelled unexpectedly"); > >> + return; > >> + } > >> + > >> + block_job_do_checkpoint(s->secondary_disk->job, &local_err); > >> + if (local_err) { > >> + error_propagate(errp, local_err); > >> + return; > >> + } > >> + > >> + ret = s->active_disk->drv->bdrv_make_empty(s->active_disk); > > > > What happens to in-flight requests to the active and hidden disks? > > we MUST call do_checkpoint() when the vm is stopped.
Please document the environment under which the block replication callback functions run. I'm concerned that the bdrv_drain_all() in vm_stop() can take a long time if the disk is slow/failing. bdrv_drain_all() blocks until all in-flight I/O requests have completed. What does the Primary do if the Secondary becomes unresponsive? > >> + switch (s->mode) { > >> + case REPLICATION_MODE_PRIMARY: > >> + break; > >> + case REPLICATION_MODE_SECONDARY: > >> + s->active_disk = bs->file->bs; > >> + if (!bs->file->bs->backing) { > >> + error_setg(errp, "Active disk doesn't have backing file"); > >> + return; > >> + } > >> + > >> + s->hidden_disk = s->active_disk->backing->bs; > >> + if (!s->hidden_disk->backing) { > >> + error_setg(errp, "Hidden disk doesn't have backing file"); > >> + return; > >> + } > >> + > >> + s->secondary_disk = s->hidden_disk->backing->bs; > >> + if (!s->secondary_disk->blk) { > >> + error_setg(errp, "The secondary disk doesn't have block > >> backend"); > >> + return; > >> + } > > > > Kevin: Is code allowed to stash away BlockDriverState pointers for > > convenience or should it keep the BdrvChild pointers instead? In order > > for replication to work as expected, the graph shouldn't change but for > > consistency maybe BdrvChild is best. I asked Kevin about this on IRC and he agreed that BdrvChild should be used instead of holding on to BlockDriverState * pointers. Although these pointers will not change during replication (if the op blockers are set up correctly), it's more consistent and certainly safer to go through BdrvChild. > >> + /* start backup job now */ > >> + error_setg(&s->blocker, > >> + "block device is in use by internal backup job"); > >> + bdrv_op_block_all(s->top_bs, s->blocker); > >> + bdrv_op_unblock(s->top_bs, BLOCK_OP_TYPE_DATAPLANE, s->blocker); > >> + bdrv_ref(s->hidden_disk); > > > > Why is the explicit reference to hidden_disk (but not secondary_disk or > > active_disk) is necessary? > > IIRC, we should reference the backup target before calling backup_start(), > and we will reference the backup source in backup_start(). I'm not sure why this is necessary since they are part of the backing chain. If it is necessary, please add a comment so it's clear why the reference is being taken. Stefan
signature.asc
Description: PGP signature