On Tue, Apr 4, 2017 at 4:28 PM, Chris Friesen <chris.frie...@windriver.com> wrote:
> On 04/04/2017 07:56 AM, Ladi Prosek wrote:
>>
>> On Mon, Apr 3, 2017 at 9:11 PM, Stefan Hajnoczi <stefa...@gmail.com> wrote:
>>>
>>> On Fri, Mar 31, 2017 at 02:12:36PM -0600, Chris Friesen wrote:
>
>
>>>> Initially we have a bunch of guests running on compute-2 (which is running
>>>> qemu-kvm-ev 2.3.0). We then started live-migrating them one at a time to
>>>> compute-0 (which is running qemu-kvm-ev 2.6.0). Three of them migrated
>>>> successfully. The fourth (which was essentially identical in configuration
>>>> to the first three) failed, as per the following logs in
>>>> /var/log/libvirt/qemu/instance-0000000e.log:
>>>>
>>>>
>>>> 2017-03-29T06:38:37.886940Z qemu-kvm: VQ 2 size 0x80 < last_avail_idx 0x47b - used_idx 0x47c
>>>> 2017-03-29T06:38:37.886974Z qemu-kvm: error while loading state for instance 0x0 of device '0000:00:07.0/virtio-balloon'
>>>> 2017-03-29T06:38:37.888684Z qemu-kvm: load of migration failed: Operation not permitted
>>>> 2017-03-29 06:38:37.896+0000: shutting down
>>>>
>>>>
>>>> Does anyone know of an existing bug report covering this issue? (I took a
>>>> look and didn't see anything obviously related.)
>>>
>>>
>>> This is the virtio-balloon device. If you remove the device the live
>>> migration should work reliably.
>>>
>>> Alternatively, you can temporarily rmmod virtio_balloon inside the guest
>>> for live migration. After migration you can modprobe virtio_balloon
>>> again.
>>>
>>> last_avail_idx 0x47b with used_idx 0x47c is an invalid device state.
>>> I've diffed qemu-kvm-ev 2.6.0-27.1 hw/virtio/virtio-balloon.c against
>>> qemu.git/master and do not see an obvious bug. I also compared
>>> qemu-kvm-ev 2.3.0-31 with qemu-kvm-ev 2.6.0-27.1.
>>
>>
>> The device likely got into the invalid state as part of a previous
>> migration to an unfixed QEMU. I second Stefan's suggestion to
>> temporarily remove the device or unload the driver.
>
>
> I'll give that a try (been busy with a separate issue).
>
> If I have a guest already running, can I unilaterally hot-remove the device
> from the host side or does the guest need to be involved as well? (I'm just
> trying to figure out how to deal with existing guests.)

Hot-remove should be fine.

> Thanks,
> Chris
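
For what it's worth, here is a rough sketch of why the logged indices count as an invalid state. It assumes the load path treats the ring indices as 16-bit counters and rejects a queue when (last_avail_idx - used_idx) exceeds the queue size; that is my reading of the error message, not code copied from QEMU. The names VQ_ERROR and inuse_from_log are made up for illustration.

```python
import re

# Matches the message format seen in the log quoted in this thread, e.g.
# "VQ 2 size 0x80 < last_avail_idx 0x47b - used_idx 0x47c".
# VQ_ERROR and inuse_from_log are hypothetical helper names.
VQ_ERROR = re.compile(
    r"VQ (?P<vq>\d+) size (?P<size>0x[0-9a-fA-F]+) < "
    r"last_avail_idx (?P<avail>0x[0-9a-fA-F]+) - "
    r"used_idx (?P<used>0x[0-9a-fA-F]+)"
)


def inuse_from_log(line):
    """Return (vq, size, inuse) parsed from a 'VQ n size ...' log line.

    inuse is computed with 16-bit wraparound. That wraparound is an
    assumption about how the indices are compared when state is loaded.
    """
    m = VQ_ERROR.search(line)
    if m is None:
        raise ValueError("not a VQ size error line")
    size = int(m.group("size"), 16)
    avail = int(m.group("avail"), 16)
    used = int(m.group("used"), 16)
    # Buffers handed to the device but not yet returned to the guest.
    inuse = (avail - used) & 0xFFFF
    return int(m.group("vq")), size, inuse


line = ("2017-03-29T06:38:37.886940Z qemu-kvm: VQ 2 size 0x80 < "
        "last_avail_idx 0x47b - used_idx 0x47c")
vq, size, inuse = inuse_from_log(line)
print(vq, size, inuse, inuse > size)  # prints: 2 128 65535 True
```

With used_idx one ahead of last_avail_idx, the difference wraps to 65535, far above the queue size of 128, so under this assumption the destination refuses to load the device state.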