Re: [Qemu-devel] Proposal for extensions of block job commands in QEMU 1.2

Kevin Wolf Mon, 21 May 2012 06:08:28 -0700

Am 21.05.2012 13:02, schrieb Paolo Bonzini:
> Il 21/05/2012 12:32, Kevin Wolf ha scritto:
>> Am 21.05.2012 12:02, schrieb Paolo Bonzini:
>>> Il 21/05/2012 11:29, Kevin Wolf ha scritto:
>>>>> * block-stream: I propose adding two options to the existing
>>>>> block-stream command.  If this is rejected, only mirroring will be able
>>>>> to use rerror/werror.
>>>>>
>>>>> The new options are of course rerror/werror.  They are enum options,
>>>>> with the following possible values:
>>>>
>>>> Do we really need separate werror/rerror? For guest operations they
>>>> really exist only for historical reasons: werror was there first, and
>>>> when we wanted the same functionality, it seemed odd to overload werror
>>>> to include reads as well.
>>>>
>>>> For block jobs, where there is no such option yet, we could go with a
>>>> single error option, unless there is a use case for separate
>>>> werror/rerror options.
>>>
>>> For mirroring rerror=source and werror=target.  I'm not sure there is an
>>> actual usecase, but at least it is more interesting than for devices...
>>
>> Hm. What if we add an active mirror? Then we can get some kind of COW,
>> and rerror can happen on the target as well.
> 
> Errors during the read part of COW are always reported as werror.


Good point.

Thinking a bit more about it, with an active mirror (i.e. a filter block
driver) things become a bit less clear anyway. The filter would have to
be linked to the job somehow.

Another interesting question is if we'll want to restrict ourselves to
one job at a time forever. But when we stop doing it, we'll need new
APIs anyway.

>> If source/target is really the distinction we want to have, should the
>> available options be specific to the job type, so that you could have
>> src_error and dst_error for mirroring?
> 
> Yes, that would make sense.

Of course, at the same time it also makes the implementation a bit more
complicated.

>>>>> 'stop': The VM *and* the job will be paused---the VM is stopped even if
>>>>> the block device has neither rerror=stop nor werror={stop,enospc}.  The
>>>>> error is recorded in the block device's iostatus (which can be examined
>>>>> with query-block).  However, a BLOCK_IO_ERROR event will _never_ pause a
>>>>> job.
>>>>>
>>>>>   Rationale: stopping all I/O seems to be the best choice in order
>>>>>   to limit the number of errors received.  However, due to backwards-
>>>>>   compatibility with QEMU 1.1 we cannot pause the job when guest-
>>>>>   initiated I/O causes an error.  We could do that if the block
>>>>>   device has rerror=stop/werror={stop,enospc}, but it seems more
>>>>>   complicated to just never do it.
>>>>
>>>> I don't agree with stopping the VM. Consider a case where the target is
>>>> somewhere on the network and you lose the connection, but the primary
>>>> image is local on the hard disk. You don't want to stop the VM just
>>>> because continuing with the copy isn't possible for the moment.
>>>
>>> I think this is something that management should resolve.
>>
>> Management doesn't necessarily exist.
> 
> Even a human sitting at a console is management.  (Though I don't plan
> HMP to expose rerror/werror; so you can assume in some sense that
> management exists).

But it's management that cares about good defaults. :-)

Why not expose the options in HMP?

>>> For an error on the source, stopping the VM makes sense.  I don't
>>> think management cares about what caused an I/O error on a device.
>>> Does it matter if streaming was active or rather the guest was
>>> executing "dd if=/dev/sda of=/dev/null".
>>
>> Yes, there's a big difference: If it was a job, the guest can keep
>> running without any problems. If it was a guest operation, we would have
>> to return an error to the guest, which may offline the disk in response.
> 
> Ok, this makes sense.
> 
>>> Management may want to keep the VM stopped even for an error on the
>>> target, as long as mirroring has finished the initial synchronization
>>> step.  The VM can perform large amounts of I/O while the job is paused,
>>> and then completing the job can take a large amount of time.
>>
>> If management wants to limit the impact of this, it can decide to
>> explicitly stop the VM when it receives the error event.
> 
> That can be too late.
> 
> Eric, is it a problem for libvirt if a pause or target error during
> mirroring causes the job to exit steady state?  That means that after a
> target error the offset can go back from 100% to <100%.

"too late" in what respect? With the passive mirror, we already have a
window in which data is on the source, but not copied to the target.
Does it make a big difference if it is a few bytes more or less?

>>>> If the VM is stopped (including BLOCK_IO_ERROR), no I/O should be going
>>>> on at all. Do we really keep running the jobs in 1.1? If so, this is a
>>>> bug and should be fixed before the release.
>>>
>>> Yes, we do.  Do you think it's a problem for migration (thinking more
>>> about it: ouch, yes, it should be)?
>>
>> I'm pretty sure that it is a problem for migration. And it's likely a
>> problem in more cases.
> 
> On the other hand, in other cases it can be desirable (qemu -S, run
> streaming before the VM starts).

We would have to verify that the whole qemu code can deal with it. I'm
pretty sure that today it can't and we had a related bug before, even
though I can't remember the details.

>>> I'd rather make the extension of query-block-jobs more generic, with a
>>> list "devices" instead of a member "target", and making up the device
>>> name in the implementation (so you have "device": "target" for mirroring).
>>
>> Well, my idea for blockdev was something like (represented in a -drive
>> syntax because I don't know what it will look like):
>>
>> (qemu) blockdev_add file=foo.img,id=src
>> (qemu) device_add virtio-blk-pci,drive=src
>> ...
>> (qemu) blockdev_add file=bar.img,id=dst
>> (qemu) blockdev_mirror foo bar
>>
>> Once QOM reaches the block layer, I guess we'll want to make all
>> BlockDriverStates user visible anyway.
> 
> I don't disagree, but that's very different from what is done with
> drive-mirror.

Yes. Which isn't a problem per se because drive-mirror will be replaced
by blockdev-mirror. However, things like query-block-jobs are probably
going to stay, so they should be designed for the future.

Things like this are why I don't feel overly comfortable with adding
more and more block layer features before we implement -blockdev.

> So for now I'll keep my proposed extension of query-block-jobs; later it
> can be modified so that the target will have a name if you started the
> mirroring with blockdev_mirror instead of drive_mirror.

You mean the same QMP field is a string when the block device was added
with blockdev_add and a dict when it was added with drive_add?
Maintaining this sounds like a nightmare to me.

Kevin

Re: [Qemu-devel] Proposal for extensions of block job commands in QEMU 1.2

Reply via email to