I'm running libvirtd under Debian 12 and trying to set up live migration
of a Linux VM that's using an SR-IOV VF as its primary ethernet device.
I have that device and the corresponding virtio backup device properly
configured in libvirt, and when the VM starts up everything looks good:

  2: nic0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
      link/ether 52:54:00:a1:e0:38 brd ff:ff:ff:ff:ff:ff
  3: enp8s0nsby: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc fq_codel master nic0 state DOWN mode DEFAULT group default qlen 1000
      link/ether 52:54:00:a1:e0:38 brd ff:ff:ff:ff:ff:ff
  5: enp1s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master nic0 state UP mode DEFAULT group default qlen 1000
      link/ether 52:54:00:a1:e0:38 brd ff:ff:ff:ff:ff:ff

The problem I am having is that when I do a live migration of the box,
link on the standby virtio interface does not come up when the VF is
unplugged, so all network traffic to the system is dropped during the
interval between the source pulling the VF and the destination plugging
it back in.
It's not clear to me who is responsible for bringing that link up, but
from what I can tell, it seems like it should be qemu?
Per the documentation at
  https://www.kernel.org/doc/html/latest/networking/net_failover.html
the sequence of events should be:
* bring up link on standby device
* detach sr-iov device
* migrate vm
* attach sr-iov device
* bring down link on standby device
If I do that manually, using virsh to bring up link on the standby
device and detach the VF before migration, and then manually reattach
the VF and bring the standby link back down afterwards, no traffic is
lost at all during the migration process.
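
For reference, the manual sequence would look roughly like the sketch
below. It only builds and prints the virsh commands, it does not run
them. The domain name, the standby interface name (vnet0), the VF
device XML file and the destination URI are all placeholders chosen for
illustration, not values from my setup.

```python
import shlex
from typing import List


def manual_failover_migration(domain: str, standby_if: str,
                              vf_xml: str, dest_uri: str) -> List[List[str]]:
    """Return the virsh commands for the manual sequence, in order.

    All four arguments are placeholders supplied by the caller:
    the libvirt domain name, the host-side name of the standby virtio
    interface, an XML file describing the VF hostdev, and the libvirt
    URI of the destination host.
    """
    on_dest = ["virsh", "-c", dest_uri]
    return [
        # 1. bring up link on the standby virtio device
        ["virsh", "domif-setlink", domain, standby_if, "up"],
        # 2. detach the SR-IOV VF from the running guest
        ["virsh", "detach-device", domain, vf_xml, "--live"],
        # 3. live-migrate the guest to the destination
        ["virsh", "migrate", "--live", domain, dest_uri],
        # 4. reattach a VF on the destination host
        on_dest + ["attach-device", domain, vf_xml, "--live"],
        # 5. bring the standby link back down
        on_dest + ["domif-setlink", domain, standby_if, "down"],
    ]


if __name__ == "__main__":
    for cmd in manual_failover_migration(
            "guest1", "vnet0", "vf.xml", "qemu+ssh://dest.example/system"):
        print(shlex.join(cmd))
```

The order matters: the standby link has to be up before the VF goes
away, and it only goes back down once the VF is attached again.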
I initially thought perhaps libvirt was supposed to be doing it, but
reviewing the debug logs and the QMP commands, it is neither detaching
nor reattaching the VF. It's just telling qemu there's a failover pair,
and qemu is doing the detach/attach while migrating:

  -device {"driver":"virtio-net-pci","failover":true,"netdev":"hostua-sr-iov-backup","id":"ua-sr-iov-backup","mac":"52:54:00:a1:e0:38","bus":"pci.7","addr":"0x0"}
  -device {"driver":"vfio-pci","host":"0000:37:10.0","id":"hostdev0","failover_pair_id":"ua-sr-iov-backup","bus":"pci.1","addr":"0x0"}

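
One way to confirm from the QMP side that qemu is the one handling the
pair is to look for its failover events in the captured monitor
traffic. The sketch below assumes each log line contains at most one
JSON QMP message, possibly after some log prefix (that line format is
an assumption, adjust for your logs). As far as I understand,
FAILOVER_NEGOTIATED and UNPLUG_PRIMARY are the events qemu emits for
virtio-net failover.

```python
import json
from typing import Iterable, Iterator, Tuple

# QMP events related to virtio-net failover (as I understand them).
FAILOVER_EVENTS = {"FAILOVER_NEGOTIATED", "UNPLUG_PRIMARY"}


def failover_events(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (event name, device-id) for failover events found in lines.

    Each line may carry a log prefix; parsing starts at the first '{'.
    Lines without a parseable JSON object are skipped.
    """
    decoder = json.JSONDecoder()
    for line in lines:
        start = line.find("{")
        if start < 0:
            continue
        try:
            msg, _ = decoder.raw_decode(line, start)
        except json.JSONDecodeError:
            continue
        if not isinstance(msg, dict):
            continue
        event = msg.get("event")
        if isinstance(event, str) and event in FAILOVER_EVENTS:
            data = msg.get("data")
            device_id = data.get("device-id", "") if isinstance(data, dict) else ""
            yield event, device_id
```

Fed the lines of a monitor log, this should show whether the guest
negotiated failover at boot and whether the source qemu requested the
VF unplug when the migration started.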
I tried manually bringing up link on the standby device before migrating
the system and letting qemu deal with the VF detach/attach, but that
resulted in even more lost traffic than simply letting the standby
device be down during the migration (the network started sending packets
to the standby device but the failover device didn't forward them on to
the virtual nic as long as the VF existed).
Am I missing something? Is there some other configuration I'm supposed
to do? Any insight on this issue would be most appreciated.
Debian 12 has qemu 7.2 as stable, but I also tried the backport of 9.2
with the same apparent behavior.
Thanks much...