O status support

Markus Armbruster Fri, 23 Sep 2011 07:05:28 -0700

Luiz Capitulino <lcapitul...@redhat.com> writes:

> On Fri, 23 Sep 2011 10:55:39 +0200
> Markus Armbruster <arm...@redhat.com> wrote:
>
>> Kevin Wolf <kw...@redhat.com> writes:
>> 
>> > Am 19.09.2011 16:09, schrieb Luiz Capitulino:
>> >> On Mon, 19 Sep 2011 15:40:06 +0200
>> >> Kevin Wolf <kw...@redhat.com> wrote:
>> >> 
>> >>> Am 01.09.2011 20:37, schrieb Luiz Capitulino:
>> >>>> This series adds support to the block layer to keep track of devices'
>> >>>> I/O status. That information is also made available in QMP and HMP.
>> >>>>
>> >>>> The goal here is to allow management applications that miss the
>> >>>> BLOCK_IO_ERROR event to able to query the VM to determine if any device 
>> >>>> has
>> >>>> caused the VM to stop and which device caused it.
>> >>>>
>> >>>> Please, note that this series depends on the following series:
>> >>>>
>> >>>>  o [PATCH v3 0/8]: Introduce the RunState type
>> >>>>  o http://lists.gnu.org/archive/html/qemu-devel/2011-09/msg00118.html
>> >>>>
>> >>>> And to be able to properly test it you'll also need:
>> >>>>
>> >>>>  o [PATCH 0/3] qcow2/coroutine fixes
>> >>>>  o http://lists.gnu.org/archive/html/qemu-devel/2011-09/msg00074.html
>> >>>>
>> >>>> Here's an HMP example:
>> >>>>
>> >>>>   (qemu) info status 
>> >>>>   VM status: paused (io-error)
>> >>>>   (qemu) info block
>> >>>>   ide0-hd0: removable=0 io-status=ok file=disks/test2.img ro=0 
>> >>>> drv=qcow2 encrypted=0
>> >>>>   ide0-hd1: removable=0 io-status=nospace file=/dev/vg_doriath/kvmtest 
>> >>>> ro=0 drv=qcow2 encrypted=0
>> >>>>   ide1-cd0: removable=1 locked=0 io-status=ok [not inserted]
>> >>>>   floppy0: removable=1 locked=0 [not inserted]
>> >>>>   sd0: removable=1 locked=0 [not inserted]
>> >>>>
>> >>>> The "info status" command shows that the VM is stopped due to an I/O 
>> >>>> error.
>> >>>> By issuing "info block" it's possible to determine that the 'ide0-hd1' 
>> >>>> device
>> >>>> caused the error, which turns out to be due to no space.
>> >>>
>> >>> Looks like I didn't reply here yet?
>> >> 
>> >> No, you didn't.
>> >> 
>> >>> I still don't quite like that the devices are involved, but their part
>> >>> is minimal and it makes the implementation much easier, so for me that's
>> >>> acceptable.
>> >> 
>> >> Suggestions on better ways of implementing this are welcome! :)
>> >
>> > I don't have one. :-)
>> >
>> > Surely it would be possible to do everything in block.c, but I see that
>> > this would make things much more complicated (would need to get an AIO
>> > callback added to the chain that can save the error code). It's not
>> > worth the trouble.
>> 
>> The block layer necessarily detects errors in a huge number of places.
>> 
>> It also has a rich and complex interface to device models, with error
>> reporting in many places.
>> 
>> virtio-blk, ide and scsi-disk each have a single error handler to handle
>> block layer errors.
>> 
>> For better or worse, these are the only block layer error "chokepoints"
>> we have now, and therefore the only easy places to set BDS I/O status.
>> But they're still wrong.
>> 
>> The block layer shouldn't require device models to tell it what it
>> already knows.
>> 
>> Now, I don't want to block your series just because it adds this wart.
>> But the wart should be documented in the code.
>
> Is something like:
>
>  /* XXX: this should be implemented in the block layer */
>
> good enough?


Add a brief "why", and it's perfect.

>> Also document that any device model implementing werror=stop|enospc or
>> rerror=stop must update I/O status.  Extra points if you find a way to
>> enforce it instead.
>
> Why? It's probably a nice thing because we can know which device caused
> the VM to stop, but I don't see it as a must have (it's probably a
> no brainer to update the I/O status though).

I misread and thought it would lie in that case, but it's actually
silent then (bdrv_iostatus_is_enabled() returns false).  Thus, it's not
a must.

I'd still be tempted to make it one, so clients can rely on io-status
after error stop.  Can't see why anyone would want to do stop on error
in a device model without also setting I/O status.  But use your own
judgement.

>> Apropos enforcing.  Currently, -drive accepts any werror and rerror
>> action with if={ide,virtio,scsi,none}.  We rely on device models not
>> implementing an action to check and fail during initialization.
>> scsi-generic and the floppy devices do.  All the other qdevified devices
>> don't, and that's broken.
>
> I think I can fix that, but this series doesn't depend on that, right?
> (in the meaning that we could merge this before getting that problem fixed).

Right.

[...]

Re: [Qemu-devel] [PATCH v1 0/6]: block: Add I/O status support

Reply via email to