First, let me briefly outline the way we use live migration, as it is probably not typical. We use live migration (with block migration) to make backups of VMs with zero downtime. The basic process goes like this:
1) migrate src VM -> dest VM
2) migration completes
3) cont src VM
4) gracefully shut down dest VM
5) dest VM's disk image is now a valid backup

In general, this works very well. Up until now we have been using qemu-kvm 1.1.2 and have not had any issues with the above process. I am now attempting to upgrade us to a newer version of qemu, but all newer versions I've tried occasionally result in the virtio-net device ceasing to function on the src VM after step 3. I am able to reproduce this reliably (given enough iterations); it happens in roughly 2% of all migrations.

Here is the complete qemu command line for the src VM:

    /usr/bin/qemu-system-x86_64 -machine accel=kvm -drive file=/var/lib/kvm/testbackup.polldev.com.img,if=virtio -m 2048 -smp 4,cores=4,sockets=1,threads=1 -net nic,macaddr=52:54:98:00:00:00,model=virtio -net tap,script=/etc/qemu-ifup-br2,downscript=no -curses -name "testbackup.polldev.com",process=testbackup.polldev.com -monitor unix:/var/lib/kvm/monitor/testbackup,server,nowait

The dest VM:

    /usr/bin/qemu-system-x86_64 -machine accel=kvm -drive file=/backup/testbackup.polldev.com.img.bak20140129,if=virtio -m 2048 -smp 4,cores=4,sockets=1,threads=1 -net nic,macaddr=52:54:98:00:00:00,model=virtio -net tap,script=no,downscript=no -curses -name "testbackup.polldev.com",process=testbackup.polldev.com -monitor unix:/var/lib/kvm/monitor/testbackup.bak,server,nowait -incoming tcp:0:4444

The migration is performed like so:

    echo "migrate -b tcp:localhost:4444" | socat STDIO UNIX-CONNECT:/var/lib/kvm/monitor/testbackup
    echo "migrate_set_speed 1G" | socat STDIO UNIX-CONNECT:/var/lib/kvm/monitor/testbackup
    #wait
    echo cont | socat STDIO UNIX-CONNECT:/var/lib/kvm/monitor/testbackup

The guest in question is a minimal install of CentOS 6.5.

I have observed this issue across the following qemu versions:

    qemu 1.4.2
    qemu 1.6.0
    qemu 1.6.1
    qemu 1.7.0

I also attempted to test qemu 1.5.3, but live migration flat out crashed there (totally different issue).
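
A rough sketch of how the steps above might be scripted, using the same monitor commands over socat. This is illustrative only: the function names (hmp, migration_status, wait_for_migration, backup_vm) are hypothetical, polling "info migrate" for its "Migration status:" line is an assumption about how the "#wait" step could be implemented, and system_powerdown is assumed as the means of gracefully shutting down the dest VM. The dest VM is assumed to be already running with -incoming.

```shell
#!/bin/sh
# Illustrative sketch only; helper names and the polling loop are assumptions.

# Send one HMP command to a QEMU monitor socket and print the reply.
# $1 = monitor socket path, $2 = monitor command
hmp() {
    echo "$2" | socat STDIO "UNIX-CONNECT:$1"
}

# Read `info migrate` output on stdin and print only the state word
# from the "Migration status: <state>" line (e.g. active, completed).
migration_status() {
    sed -n 's/^Migration status: *\([a-z-]*\).*/\1/p' | head -n 1
}

# Poll the src monitor until the migration completes (0) or fails (1).
# $1 = src monitor socket, $2 = poll interval in seconds (default 5)
wait_for_migration() {
    while :; do
        status=$(hmp "$1" "info migrate" | migration_status)
        case "$status" in
            completed) return 0 ;;
            failed|cancelled) return 1 ;;
        esac
        sleep "${2:-5}"
    done
}

# Run the five-step backup procedure.
# $1 = src monitor socket, $2 = dest monitor socket
backup_vm() {
    hmp "$1" "migrate -b tcp:localhost:4444"   # step 1: block migration
    hmp "$1" "migrate_set_speed 1G"
    wait_for_migration "$1" || return 1        # step 2: wait for completion
    hmp "$1" "cont"                            # step 3: resume the src VM
    hmp "$2" "system_powerdown"                # step 4: assumed graceful shutdown
    # step 5: once the dest qemu process exits, its disk image is the backup
}
```

With the monitor sockets from the command lines above, this would be invoked as:

    backup_vm /var/lib/kvm/monitor/testbackup /var/lib/kvm/monitor/testbackup.bak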

I have also tested a number of other scenarios with qemu 1.6.0, all of which exhibit the same failure mode:

    qemu 1.6.0 + host kernel 3.1.0
    qemu 1.6.0 + host kernel 3.10.7
    qemu 1.6.0 + host kernel 3.10.17
    qemu 1.6.0 + virtio with -netdev/-device syntax
    qemu 1.6.0 + accel=tcg

The one case I have found that works properly is the following:

    qemu 1.6.0 + e1000

It is worth noting that when the virtio-net device ceases to function in the guest, removing and reinserting the virtio-net kernel module gets the device working again (except in 1.4.2, where this had no effect).

As mentioned above, I can reproduce this with minimal effort, and am willing to test any patches or provide further details as necessary.

- Neil