On 02/23/2016 12:08 AM, Fam Zheng wrote:
> On Mon, 02/22 17:07, John Snow wrote:
>> During incremental backups, if the target has a cluster size that is
>> larger than the backup cluster size and we are backing up to a target
>> that cannot (for whichever reason) pull clusters up from a backing image,
>> we may inadvertently create unusable incremental backup images.
>>
>> For example:
>>
>> If the bitmap tracks changes at a 64KB granularity and we transmit 64KB
>> of data at a time but the target uses a 128KB cluster size, it is
>> possible that only half of a target cluster will be recognized as dirty
>> by the backup block job. When the cluster is allocated on the target
>> image but only half populated with data, we lose the ability to
>> distinguish between zero padding and uninitialized data.
>>
>> This does not happen if the target image has a backing file that points
>> to the last known good backup.
>>
>> Even if we have a backing file, though, it's likely going to be faster
>> to just buffer the redundant data ourselves from the live image than
>> fetching it from the backing file, so let's just always round up to the
>> target granularity.
>>
>> The same logic applies to backup modes top, none, and full. Copying
>> fractional clusters without the guarantee of COW is dangerous, but even
>> if we can rely on COW, it's likely better to just re-copy the data.
>>
>> Reported-by: Fam Zheng <f...@redhat.com>
>> Signed-off-by: John Snow <js...@redhat.com>
>> ---
>>  block/backup.c | 10 +++++++++-
>>  1 file changed, 9 insertions(+), 1 deletion(-)
>>
>> diff --git a/block/backup.c b/block/backup.c
>> index 76addef..a9a4d5c 100644
>> --- a/block/backup.c
>> +++ b/block/backup.c
>> @@ -501,6 +501,7 @@ void backup_start(BlockDriverState *bs, BlockDriverState *target,
>>                        BlockJobTxn *txn, Error **errp)
>>  {
>>      int64_t len;
>> +    BlockDriverInfo bdi;
>>
>>      assert(bs);
>>      assert(target);
>> @@ -578,7 +579,14 @@ void backup_start(BlockDriverState *bs, BlockDriverState *target,
>>      job->sync_mode = sync_mode;
>>      job->sync_bitmap = sync_mode == MIRROR_SYNC_MODE_INCREMENTAL ?
>>                         sync_bitmap : NULL;
>> -    job->cluster_size = BACKUP_CLUSTER_SIZE_DEFAULT;
>> +
>> +    /* If there is no backing file on the target, we cannot rely on COW if our
>> +     * backup cluster size is smaller than the target cluster size. Instead of
>> +     * checking for a backing file, we assume that just copying the data in the
>> +     * backup loop is comparable to the unreliable COW. */
>> +    bdrv_get_info(job->target, &bdi);
>
> bdrv_get_info can fail and bdi fields are uninitialized. Please test the
> return value and handle the error.
>
> Fam
>
You're right. I thought it always did the memset, but it does have a failure
path that returns before the memset.

>> +    job->cluster_size = MAX(BACKUP_CLUSTER_SIZE_DEFAULT, bdi.cluster_size);
>> +
>>      job->common.len = len;
>>      job->common.co = qemu_coroutine_create(backup_run);
>>      block_job_txn_add_job(txn, &job->common);
>> --
>> 2.4.3
>>