Luiz Capitulino <lcapitul...@redhat.com> writes: > On Fri, 23 Sep 2011 10:55:39 +0200 > Markus Armbruster <arm...@redhat.com> wrote: > >> Kevin Wolf <kw...@redhat.com> writes: >> >> > Am 19.09.2011 16:09, schrieb Luiz Capitulino: >> >> On Mon, 19 Sep 2011 15:40:06 +0200 >> >> Kevin Wolf <kw...@redhat.com> wrote: >> >> >> >>> Am 01.09.2011 20:37, schrieb Luiz Capitulino: >> >>>> This series adds support to the block layer to keep track of devices' >> >>>> I/O status. That information is also made available in QMP and HMP. >> >>>> >> >>>> The goal here is to allow management applications that miss the >> >>>> BLOCK_IO_ERROR event to able to query the VM to determine if any device >> >>>> has >> >>>> caused the VM to stop and which device caused it. >> >>>> >> >>>> Please, note that this series depends on the following series: >> >>>> >> >>>> o [PATCH v3 0/8]: Introduce the RunState type >> >>>> o http://lists.gnu.org/archive/html/qemu-devel/2011-09/msg00118.html >> >>>> >> >>>> And to be able to properly test it you'll also need: >> >>>> >> >>>> o [PATCH 0/3] qcow2/coroutine fixes >> >>>> o http://lists.gnu.org/archive/html/qemu-devel/2011-09/msg00074.html >> >>>> >> >>>> Here's an HMP example: >> >>>> >> >>>> (qemu) info status >> >>>> VM status: paused (io-error) >> >>>> (qemu) info block >> >>>> ide0-hd0: removable=0 io-status=ok file=disks/test2.img ro=0 >> >>>> drv=qcow2 encrypted=0 >> >>>> ide0-hd1: removable=0 io-status=nospace file=/dev/vg_doriath/kvmtest >> >>>> ro=0 drv=qcow2 encrypted=0 >> >>>> ide1-cd0: removable=1 locked=0 io-status=ok [not inserted] >> >>>> floppy0: removable=1 locked=0 [not inserted] >> >>>> sd0: removable=1 locked=0 [not inserted] >> >>>> >> >>>> The "info status" command shows that the VM is stopped due to an I/O >> >>>> error. >> >>>> By issuing "info block" it's possible to determine that the 'ide0-hd1' >> >>>> device >> >>>> caused the error, which turns out to be due to no space. >> >>> >> >>> Looks like I didn't reply here yet? >> >> >> >> No, you didn't. >> >> >> >>> I still don't quite like that the devices are involved, but their part >> >>> is minimal and it makes the implementation much easier, so for me that's >> >>> acceptable. >> >> >> >> Suggestions on better ways of implementing this are welcome! :) >> > >> > I don't have one. :-) >> > >> > Surely it would be possible to do everything in block.c, but I see that >> > this would make things much more complicated (would need to get an AIO >> > callback added to the chain that can save the error code). It's not >> > worth the trouble. >> >> The block layer necessarily detects errors in a huge number of places. >> >> It also has a rich and complex interface to device models, with error >> reporting in many places. >> >> virtio-blk, ide and scsi-disk each have a single error handler to handle >> block layer errors. >> >> For better or worse, these are the only block layer error "chokepoints" >> we have now, and therefore the only easy places to set BDS I/O status. >> But they're still wrong. >> >> The block layer shouldn't require device models to tell it what it >> already knows. >> >> Now, I don't want to block your series just because it adds this wart. >> But the wart should be documented in the code. > > Is something like: > > /* XXX: this should be implemented in the block layer */ > > good enough?
Add a brief "why", and it's perfect. >> Also document that any device model implementing werror=stop|enospc or >> rerror=stop must update I/O status. Extra points if you find a way to >> enforce it instead. > > Why? It's probably a nice thing because we can know which device caused > the VM to stop, but I don't see it as a must have (it's probably a > no brainer to update the I/O status though). I misread and thought it would lie in that case, but it's actually silent then (bdrv_iostatus_is_enabled() returns false). Thus, it's not a must. I'd still be tempted to make it one, so clients can rely on io-status after error stop. Can't see why anyone would want to do stop on error in a device model without also setting I/O status. But use your own judgement. >> Apropos enforcing. Currently, -drive accepts any werror and rerror >> action with if={ide,virtio,scsi,none}. We rely on device models not >> implementing an action to check and fail during initialization. >> scsi-generic and the floppy devices do. All the other qdevified devices >> don't, and that's broken. > > I think I can fix that, but this series doesn't depend on that, right? > (in the meaning that we could merge this before getting that problem fixed). Right. [...]