On 10/04/2016 07:03 PM, John Snow wrote: > > > On 10/04/2016 12:02 PM, Stefan Hajnoczi wrote: >> On Tue, Oct 04, 2016 at 01:55:30PM +0200, Kevin Wolf wrote: >>> Am 04.10.2016 um 12:41 hat Denis V. Lunev geschrieben: >>>> On 10/04/2016 12:34 PM, Kevin Wolf wrote: >>>>> Am 04.10.2016 um 11:23 hat Stefan Hajnoczi geschrieben: >>>>>> On Mon, Oct 03, 2016 at 02:07:34PM -0400, John Snow wrote: >>>>>>> >>>>>>> On 10/03/2016 09:11 AM, Stefan Hajnoczi wrote: >>>>>>>> On Fri, Sep 30, 2016 at 09:59:16PM +0300, Vladimir >>>>>>>> Sementsov-Ogievskiy wrote: >>>>>>>>> On 30.09.2016 20:11, Vladimir Sementsov-Ogievskiy wrote: >>>>>>>>>> Hi all! >>>>>>>>>> >>>>>>>>>> Please, can somebody explain me, why we fail guest request in >>>>>>>>>> case of io >>>>>>>>>> error in write notifier? I think guest consistency is more >>>>>>>>>> important >>>>>>>>>> than success of unfinished backup. Or, what am I missing? >>>>>>>>>> >>>>>>>>>> I'm saying about this code: >>>>>>>>>> >>>>>>>>>> static int coroutine_fn backup_before_write_notify( >>>>>>>>>> NotifierWithReturn *notifier, >>>>>>>>>> void *opaque) >>>>>>>>>> { >>>>>>>>>> BackupBlockJob *job = container_of(notifier, BackupBlockJob, >>>>>>>>>> before_write); >>>>>>>>>> BdrvTrackedRequest *req = opaque; >>>>>>>>>> int64_t sector_num = req->offset >> BDRV_SECTOR_BITS; >>>>>>>>>> int nb_sectors = req->bytes >> BDRV_SECTOR_BITS; >>>>>>>>>> >>>>>>>>>> assert(req->bs == blk_bs(job->common.blk)); >>>>>>>>>> assert((req->offset & (BDRV_SECTOR_SIZE - 1)) == 0); >>>>>>>>>> assert((req->bytes & (BDRV_SECTOR_SIZE - 1)) == 0); >>>>>>>>>> >>>>>>>>>> return backup_do_cow(job, sector_num, nb_sectors, NULL, >>>>>>>>>> true); >>>>>>>>>> } >>>>>>>>>> >>>>>>>>>> So, what about something like >>>>>>>>>> >>>>>>>>>> ret = backup_do_cow(job, ... >>>>>>>>>> if (ret < 0 && job->notif_ret == 0) { >>>>>>>>>> job->notif_ret = ret; >>>>>>>>>> } >>>>>>>>>> >>>>>>>>>> return 0; >>>>>>>>>> >>>>>>>>>> and fail block job if notif_ret < 0 in other places of backup >>>>>>>>>> code? >>>>>>>>>> >>>>>>>>> And second question about notifiers in backup block job. If >>>>>>>>> block job is >>>>>>>>> paused, notifiers still works and can copy data. Is it ok? So, >>>>>>>>> user thinks >>>>>>>>> that job is paused, so he can do something with target disk.. >>>>>>>>> But really, >>>>>>>>> this 'something' will race with write-notifiers. So, what >>>>>>>>> assumptions may >>>>>>>>> user actually have about paused backup job? Is there any >>>>>>>>> agreements? Also, >>>>>>>>> on query-block-jobs we will see job.busy = false, when actually >>>>>>>>> copy-on-write may be in flight.. >>>>>>>> I agree that the job should fail and the guest continues running. >>>>>>>> >>>>>>>> The backup job cannot do the usual ENOSPC stop/resume error >>>>>>>> handling >>>>>>>> since we lose snapshot consistency once guest writes are >>>>>>>> allowed to >>>>>>>> proceed. Backup errors need to be fatal, resuming is usually not >>>>>>>> possible. The user will have to retry the backup operation. >>>>>>>> >>>>>>>> Stefan >>>>>>>> >>>>>>> If we fail and intercept the error for the backup write and HALT >>>>>>> at that >>>>>>> point, why would we lose consistency? If the backup write failed >>>>>>> before we >>>>>>> allowed the guest write to proceed, that data should still be >>>>>>> there on disk, >>>>>>> no? >>>>>> I missed that there are two separate error handling approaches >>>>>> used in >>>>>> block/backup.c: >>>>>> >>>>>> 1. In the write notifier I/O errors are treated as if the guest >>>>>> write >>>>>> failed. >>>>>> >>>>>> 2. In the backup_run() loop I/O errors affect the block job's error >>>>>> status. >>>>>> >>>>>> I was thinking of case #2 instead of case #1. >>>>>> >>>>>>> Eh, regardless: If we're not using a STOP policy, it seems like >>>>>>> the right >>>>>>> thing to do is definitely to just fail the backup instead of >>>>>>> failing the >>>>>>> write. >>>>>> Even with a -drive werror=stop policy the user probably doesn't want >>>>>> guest downtime if writing to the backup target fails. >>>>> That's a policy decision that ultimately only the user can make. >>>>> For one >>>>> user, it might be preferable to cancel the backup and keep the VM >>>>> running, but for another user it may be more important to keep a >>>>> consistent snapshot of the point in time when the backup job was >>>>> started >>>>> than keeping the VM running. >>>>> >>>>> Kevin >>>> In this case policy for guest error and policy for backup >>>> error should be different policies or I have missed something. >>> >>> I guess so. >> >> There are separate error policies for -device and the blockjob. Perhaps >> the blockjob error policy can be used in the write notifier code path if >> the failure occurs while writing to the backup target. >> >> Stefan >> > > Sounds good to me. and to me this way. Cool.
Den