O status support

Luiz Capitulino Fri, 23 Sep 2011 06:49:16 -0700

On Fri, 23 Sep 2011 10:55:39 +0200
Markus Armbruster <arm...@redhat.com> wrote:


> Kevin Wolf <kw...@redhat.com> writes:
> 
> > Am 19.09.2011 16:09, schrieb Luiz Capitulino:
> >> On Mon, 19 Sep 2011 15:40:06 +0200
> >> Kevin Wolf <kw...@redhat.com> wrote:
> >> 
> >>> Am 01.09.2011 20:37, schrieb Luiz Capitulino:
> >>>> This series adds support to the block layer to keep track of devices'
> >>>> I/O status. That information is also made available in QMP and HMP.
> >>>>
> >>>> The goal here is to allow management applications that miss the
> >>>> BLOCK_IO_ERROR event to able to query the VM to determine if any device 
> >>>> has
> >>>> caused the VM to stop and which device caused it.
> >>>>
> >>>> Please, note that this series depends on the following series:
> >>>>
> >>>>  o [PATCH v3 0/8]: Introduce the RunState type
> >>>>  o http://lists.gnu.org/archive/html/qemu-devel/2011-09/msg00118.html
> >>>>
> >>>> And to be able to properly test it you'll also need:
> >>>>
> >>>>  o [PATCH 0/3] qcow2/coroutine fixes
> >>>>  o http://lists.gnu.org/archive/html/qemu-devel/2011-09/msg00074.html
> >>>>
> >>>> Here's an HMP example:
> >>>>
> >>>>   (qemu) info status 
> >>>>   VM status: paused (io-error)
> >>>>   (qemu) info block
> >>>>   ide0-hd0: removable=0 io-status=ok file=disks/test2.img ro=0 drv=qcow2 
> >>>> encrypted=0
> >>>>   ide0-hd1: removable=0 io-status=nospace file=/dev/vg_doriath/kvmtest 
> >>>> ro=0 drv=qcow2 encrypted=0
> >>>>   ide1-cd0: removable=1 locked=0 io-status=ok [not inserted]
> >>>>   floppy0: removable=1 locked=0 [not inserted]
> >>>>   sd0: removable=1 locked=0 [not inserted]
> >>>>
> >>>> The "info status" command shows that the VM is stopped due to an I/O 
> >>>> error.
> >>>> By issuing "info block" it's possible to determine that the 'ide0-hd1' 
> >>>> device
> >>>> caused the error, which turns out to be due to no space.
> >>>
> >>> Looks like I didn't reply here yet?
> >> 
> >> No, you didn't.
> >> 
> >>> I still don't quite like that the devices are involved, but their part
> >>> is minimal and it makes the implementation much easier, so for me that's
> >>> acceptable.
> >> 
> >> Suggestions on better ways of implementing this are welcome! :)
> >
> > I don't have one. :-)
> >
> > Surely it would be possible to do everything in block.c, but I see that
> > this would make things much more complicated (would need to get an AIO
> > callback added to the chain that can save the error code). It's not
> > worth the trouble.
> 
> The block layer necessarily detects errors in a huge number of places.
> 
> It also has a rich and complex interface to device models, with error
> reporting in many places.
> 
> virtio-blk, ide and scsi-disk each have a single error handler to handle
> block layer errors.
> 
> For better or worse, these are the only block layer error "chokepoints"
> we have now, and therefore the only easy places to set BDS I/O status.
> But they're still wrong.
> 
> The block layer shouldn't require device models to tell it what it
> already knows.
> 
> Now, I don't want to block your series just because it adds this wart.
> But the wart should be documented in the code.

Is something like:

 /* XXX: this should be implemented in the block layer */

good enough?

> Also document that any device model implementing werror=stop|enospc or
> rerror=stop must update I/O status.  Extra points if you find a way to
> enforce it instead.

Why? It's probably a nice thing because we can know which device caused
the VM to stop, but I don't see it as a must have (it's probably a
no brainer to update the I/O status though).

> 
> Apropos enforcing.  Currently, -drive accepts any werror and rerror
> action with if={ide,virtio,scsi,none}.  We rely on device models not
> implementing an action to check and fail during initialization.
> scsi-generic and the floppy devices do.  All the other qdevified devices
> don't, and that's broken.

I think I can fix that, but this series doesn't depend on that, right?
(in the meaning that we could merge this before getting that problem fixed).

> 
> Are werror and rerror really host state?
>

Re: [Qemu-devel] [PATCH v1 0/6]: block: Add I/O status support

Reply via email to