On Tue, Jun 04, 2013 at 06:37:27PM +0200, Markus Armbruster wrote:
> Stefan Hajnoczi <stefa...@redhat.com> writes:
>
> > Paolo Bonzini <pbonz...@redhat.com> suggested the following test case:
> >
> > 1. Launch a guest and wait at the GRUB boot menu:
> >
> >   qemu-system-x86_64 -enable-kvm -m 1024 \
> >     -drive if=none,cache=none,file=test.img,id=foo,werror=stop,rerror=stop \
> >     -device virtio-blk-pci,drive=foo,id=virtio0,addr=4
> >
> > 2. Hot unplug the device:
> >
> >   (qemu) drive_del foo
> >
> > 3. Select the first boot menu entry
> >
> > Without this patch the guest pauses due to ENOMEDIUM.  But it is not
> > possible to resolve this situation - the drive has become anonymous.
> >
> > With this patch the guest gets the ENOMEDIUM error.
> >
> > Note that this scenario actually happens sometimes during libvirt disk
> > hot unplug, where device_del is followed by drive_del.  I/O may still be
> > submitted to the drive after drive_del if the guest does not process the
> > PCI hot unplug notification.
> >
> > Reported-by: Dafna Ron <d...@redhat.com>
> > Signed-off-by: Stefan Hajnoczi <stefa...@redhat.com>
> > ---
> >  blockdev.c | 4 ++++
> >  1 file changed, 4 insertions(+)
> >
> > diff --git a/blockdev.c b/blockdev.c
> > index d1ec99a..6eb81a3 100644
> > --- a/blockdev.c
> > +++ b/blockdev.c
> > @@ -1180,6 +1180,10 @@ int do_drive_del(Monitor *mon, const QDict *qdict, QObject **ret_data)
> >       */
> >      if (bdrv_get_attached_dev(bs)) {
> >          bdrv_make_anon(bs);
> > +
> > +        /* Further I/O must not pause the guest */
> > +        bdrv_set_on_error(bs, BLOCKDEV_ON_ERROR_REPORT,
> > +                          BLOCKDEV_ON_ERROR_REPORT);
> >      } else {
> >          drive_uninit(drive_get_by_blockdev(bs));
> >      }
>
> The user gets exactly what he ordered.  He ordered "stop on error", then
> provoked errors by turning the virtual block device into a virtual pile
> of scrap metal.  Because that's exactly what drive_del does when used
> while a device model is attached to the drive.
>
> The only sane use case for drive_del I can think of is revoking access
> to an image violently, after the guest failed to honor a hot unplug.
>
> Even then, using drive_del when the block device is removable is
> unnecessary.  Just rip out the medium with eject -f.  Look ma, no scrap
> metal.
>
> I'm not sure what you mean by "it is not possible to resolve this
> situation".  The device is shot!  Can't see how that could be resolved.

This is the critical part: the guest is paused and there is no way to
break out of the continuous pause loop.  The drive is gone but the guest
hasn't PCI hot unplugged the storage controller.  As a user, there's
nothing you can do on the QEMU monitor to resume the guest - it will
just pause itself again.

This behavior is really bad: QEMU has basically wedged the guest into an
unrecoverable state, and that's what I was trying to describe.

> I figure the bit that can't be resolved now is letting the user switch
> off "stop on error" safely before a drive_del.  Even if we had a command
> for that, there'd still be a window between that command's execution and
> drive_del's.  Your patch solves the problem by having drive_del switch
> it off unconditionally.  Oookay, but please document it, because it's
> not exactly obvious.

Thanks for the documentation suggestion, will add it in v2.

> Re "the guest gets the ENOMEDIUM error": depends on the device.  I doubt
> disks can signal "no medium", and even if they could, I doubt device
> drivers are prepared for it.

Yep, error reporting depends on the emulated storage controller.
virtio-blk and IDE just report a generic error status.

> Re "this scenario actually happens sometimes during libvirt disk hot
> unplug, where device_del is followed by drive_del": if I remember
> correctly, libvirt disk hot unplug runs drive_del right after
> device_del, opening a window where the guest sees a dead device.  That's
> asking for trouble, and trouble is known to oblige.

Agreed.
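
To make the pause loop discussed above concrete, here is a toy model in
C.  It is a sketch, not QEMU code: toy_drive, toy_submit_io, toy_run and
the on_error enum are hypothetical names made up for illustration.  It
assumes that under the "stop" policy a failed request stays pending and
is retried each time the user resumes the guest.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#ifndef ENOMEDIUM
#define ENOMEDIUM 123   /* fallback for non-Linux hosts; illustration only */
#endif

/* Hypothetical stand-ins for the werror/rerror policies, not QEMU's types. */
enum on_error { ON_ERROR_REPORT, ON_ERROR_STOP };

struct toy_drive {
    bool has_medium;        /* false once the image has been taken away */
    enum on_error policy;   /* what to do when a request fails */
};

/* Submit one request: 0 on success, negative errno on failure. */
static int toy_submit_io(const struct toy_drive *d)
{
    return d->has_medium ? 0 : -ENOMEDIUM;
}

/*
 * Model one guest request plus up to max_cont attempts by the user to
 * resume the guest.  Returns how many times the guest paused and sets
 * *guest_saw_error if the error was delivered to the guest instead.
 */
static int toy_run(const struct toy_drive *d, int max_cont,
                   bool *guest_saw_error)
{
    int pauses = 0;

    assert(max_cont >= 0);
    *guest_saw_error = false;

    for (int attempt = 0; attempt <= max_cont; attempt++) {
        if (toy_submit_io(d) == 0) {
            break;                      /* request completed normally */
        }
        if (d->policy == ON_ERROR_REPORT) {
            *guest_saw_error = true;    /* guest gets the error, keeps running */
            break;
        }
        pauses++;                       /* "stop": pause, request stays pending */
    }
    return pauses;
}
```

In this model a drive with no medium and the "stop" policy pauses once
for the initial request and once more for every resume attempt, so the
loop never ends on its own.  Switching the policy to "report", which is
what the patch does inside drive_del, yields zero pauses and a single
error delivered to the guest.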