Am 21.05.2012 13:02, schrieb Paolo Bonzini: > Il 21/05/2012 12:32, Kevin Wolf ha scritto: >> Am 21.05.2012 12:02, schrieb Paolo Bonzini: >>> Il 21/05/2012 11:29, Kevin Wolf ha scritto: >>>>> * block-stream: I propose adding two options to the existing >>>>> block-stream command. If this is rejected, only mirroring will be able >>>>> to use rerror/werror. >>>>> >>>>> The new options are of course rerror/werror. They are enum options, >>>>> with the following possible values: >>>> >>>> Do we really need separate werror/rerror? For guest operations they >>>> really exist only for historical reasons: werror was there first, and >>>> when we wanted the same functionality, it seemed odd to overload werror >>>> to include reads as well. >>>> >>>> For block jobs, where there is no such option yet, we could go with a >>>> single error option, unless there is a use case for separate >>>> werror/rerror options. >>> >>> For mirroring rerror=source and werror=target. I'm not sure there is an >>> actual usecase, but at least it is more interesting than for devices... >> >> Hm. What if we add an active mirror? Then we can get some kind of COW, >> and rerror can happen on the target as well. > > Errors during the read part of COW are always reported as werror.
Good point. Thinking a bit more about it, with an active mirror (i.e. a filter block driver) things become a bit less clear anyway. The filter would have to be linked to the job somehow. Another interesting question is if we'll want to restrict ourselves to one job at a time forever. But when we stop doing it, we'll need new APIs anyway. >> If source/target is really the distinction we want to have, should the >> available options be specific to the job type, so that you could have >> src_error and dst_error for mirroring? > > Yes, that would make sense. Of course, at the same time it also makes the implementation a bit more complicated. >>>>> 'stop': The VM *and* the job will be paused---the VM is stopped even if >>>>> the block device has neither rerror=stop nor werror={stop,enospc}. The >>>>> error is recorded in the block device's iostatus (which can be examined >>>>> with query-block). However, a BLOCK_IO_ERROR event will _never_ pause a >>>>> job. >>>>> >>>>> Rationale: stopping all I/O seems to be the best choice in order >>>>> to limit the number of errors received. However, due to backwards- >>>>> compatibility with QEMU 1.1 we cannot pause the job when guest- >>>>> initiated I/O causes an error. We could do that if the block >>>>> device has rerror=stop/werror={stop,enospc}, but it seems more >>>>> complicated to just never do it. >>>> >>>> I don't agree with stopping the VM. Consider a case where the target is >>>> somewhere on the network and you lose the connection, but the primary >>>> image is local on the hard disk. You don't want to stop the VM just >>>> because continuing with the copy isn't possible for the moment. >>> >>> I think this is something that management should resolve. >> >> Management doesn't necessarily exist. > > Even a human sitting at a console is management. (Though I don't plan > HMP to expose rerror/werror; so you can assume in some sense that > management exists). But it's management that cares about good defaults. :-) Why not expose the options in HMP? >>> For an error on the source, stopping the VM makes sense. I don't >>> think management cares about what caused an I/O error on a device. >>> Does it matter if streaming was active or rather the guest was >>> executing "dd if=/dev/sda of=/dev/null". >> >> Yes, there's a big difference: If it was a job, the guest can keep >> running without any problems. If it was a guest operation, we would have >> to return an error to the guest, which may offline the disk in response. > > Ok, this makes sense. > >>> Management may want to keep the VM stopped even for an error on the >>> target, as long as mirroring has finished the initial synchronization >>> step. The VM can perform large amounts of I/O while the job is paused, >>> and then completing the job can take a large amount of time. >> >> If management wants to limit the impact of this, it can decide to >> explicitly stop the VM when it receives the error event. > > That can be too late. > > Eric, is it a problem for libvirt if a pause or target error during > mirroring causes the job to exit steady state? That means that after a > target error the offset can go back from 100% to <100%. "too late" in what respect? With the passive mirror, we already have a window in which data is on the source, but not copied to the target. Does it make a big difference if it is a few bytes more or less? >>>> If the VM is stopped (including BLOCK_IO_ERROR), no I/O should be going >>>> on at all. Do we really keep running the jobs in 1.1? If so, this is a >>>> bug and should be fixed before the release. >>> >>> Yes, we do. Do you think it's a problem for migration (thinking more >>> about it: ouch, yes, it should be)? >> >> I'm pretty sure that it is a problem for migration. And it's likely a >> problem in more cases. > > On the other hand, in other cases it can be desirable (qemu -S, run > streaming before the VM starts). We would have to verify that the whole qemu code can deal with it. I'm pretty sure that today it can't and we had a related bug before, even though I can't remember the details. >>> I'd rather make the extension of query-block-jobs more generic, with a >>> list "devices" instead of a member "target", and making up the device >>> name in the implementation (so you have "device": "target" for mirroring). >> >> Well, my idea for blockdev was something like (represented in a -drive >> syntax because I don't know what it will look like): >> >> (qemu) blockdev_add file=foo.img,id=src >> (qemu) device_add virtio-blk-pci,drive=src >> ... >> (qemu) blockdev_add file=bar.img,id=dst >> (qemu) blockdev_mirror foo bar >> >> Once QOM reaches the block layer, I guess we'll want to make all >> BlockDriverStates user visible anyway. > > I don't disagree, but that's very different from what is done with > drive-mirror. Yes. Which isn't a problem per se because drive-mirror will be replaced by blockdev-mirror. However, things like query-block-jobs are probably going to stay, so they should be designed for the future. Things like this are why I don't feel overly comfortable with adding more and more block layer features before we implement -blockdev. > So for now I'll keep my proposed extension of query-block-jobs; later it > can be modified so that the target will have a name if you started the > mirroring with blockdev_mirror instead of drive_mirror. You mean the same QMP field is a string when the block device was added with blockdev_add and a dict when it was added with drive_add? Maintaining this sounds like a nightmare to me. Kevin