On Fri, 23 Sep 2011 10:55:39 +0200 Markus Armbruster <arm...@redhat.com> wrote:
> Kevin Wolf <kw...@redhat.com> writes: > > > Am 19.09.2011 16:09, schrieb Luiz Capitulino: > >> On Mon, 19 Sep 2011 15:40:06 +0200 > >> Kevin Wolf <kw...@redhat.com> wrote: > >> > >>> Am 01.09.2011 20:37, schrieb Luiz Capitulino: > >>>> This series adds support to the block layer to keep track of devices' > >>>> I/O status. That information is also made available in QMP and HMP. > >>>> > >>>> The goal here is to allow management applications that miss the > >>>> BLOCK_IO_ERROR event to able to query the VM to determine if any device > >>>> has > >>>> caused the VM to stop and which device caused it. > >>>> > >>>> Please, note that this series depends on the following series: > >>>> > >>>> o [PATCH v3 0/8]: Introduce the RunState type > >>>> o http://lists.gnu.org/archive/html/qemu-devel/2011-09/msg00118.html > >>>> > >>>> And to be able to properly test it you'll also need: > >>>> > >>>> o [PATCH 0/3] qcow2/coroutine fixes > >>>> o http://lists.gnu.org/archive/html/qemu-devel/2011-09/msg00074.html > >>>> > >>>> Here's an HMP example: > >>>> > >>>> (qemu) info status > >>>> VM status: paused (io-error) > >>>> (qemu) info block > >>>> ide0-hd0: removable=0 io-status=ok file=disks/test2.img ro=0 drv=qcow2 > >>>> encrypted=0 > >>>> ide0-hd1: removable=0 io-status=nospace file=/dev/vg_doriath/kvmtest > >>>> ro=0 drv=qcow2 encrypted=0 > >>>> ide1-cd0: removable=1 locked=0 io-status=ok [not inserted] > >>>> floppy0: removable=1 locked=0 [not inserted] > >>>> sd0: removable=1 locked=0 [not inserted] > >>>> > >>>> The "info status" command shows that the VM is stopped due to an I/O > >>>> error. > >>>> By issuing "info block" it's possible to determine that the 'ide0-hd1' > >>>> device > >>>> caused the error, which turns out to be due to no space. > >>> > >>> Looks like I didn't reply here yet? > >> > >> No, you didn't. > >> > >>> I still don't quite like that the devices are involved, but their part > >>> is minimal and it makes the implementation much easier, so for me that's > >>> acceptable. > >> > >> Suggestions on better ways of implementing this are welcome! :) > > > > I don't have one. :-) > > > > Surely it would be possible to do everything in block.c, but I see that > > this would make things much more complicated (would need to get an AIO > > callback added to the chain that can save the error code). It's not > > worth the trouble. > > The block layer necessarily detects errors in a huge number of places. > > It also has a rich and complex interface to device models, with error > reporting in many places. > > virtio-blk, ide and scsi-disk each have a single error handler to handle > block layer errors. > > For better or worse, these are the only block layer error "chokepoints" > we have now, and therefore the only easy places to set BDS I/O status. > But they're still wrong. > > The block layer shouldn't require device models to tell it what it > already knows. > > Now, I don't want to block your series just because it adds this wart. > But the wart should be documented in the code. Is something like: /* XXX: this should be implemented in the block layer */ good enough? > Also document that any device model implementing werror=stop|enospc or > rerror=stop must update I/O status. Extra points if you find a way to > enforce it instead. Why? It's probably a nice thing because we can know which device caused the VM to stop, but I don't see it as a must have (it's probably a no brainer to update the I/O status though). > > Apropos enforcing. Currently, -drive accepts any werror and rerror > action with if={ide,virtio,scsi,none}. We rely on device models not > implementing an action to check and fail during initialization. > scsi-generic and the floppy devices do. All the other qdevified devices > don't, and that's broken. I think I can fix that, but this series doesn't depend on that, right? (in the meaning that we could merge this before getting that problem fixed). > > Are werror and rerror really host state? >