I'm running libvirtd under Debian 12 and trying to set up live migration of a linux vm that's using an sr-iov VF as its primary ethernet device. I have that device and the corresponding virtio backup device properly configured in libvirt, and when the vm starts up everything looks good:

2: nic0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether 52:54:00:a1:e0:38 brd ff:ff:ff:ff:ff:ff
3: enp8s0nsby: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc fq_codel master nic0 state DOWN mode DEFAULT group default qlen 1000
    link/ether 52:54:00:a1:e0:38 brd ff:ff:ff:ff:ff:ff
5: enp1s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master nic0 state UP mode DEFAULT group default qlen 1000
    link/ether 52:54:00:a1:e0:38 brd ff:ff:ff:ff:ff:ff

The problem I am having is that when I do a live migration of the box, link on the standby virtio interface does not come up when the VF is unplugged, so all network traffic to the system is dropped during the interval between the source pulling the VF and the destination plugging it back in.

It's not clear to me which component is responsible for bringing up that link, but from what I can tell, it seems like it should be qemu?

Per the documentation:

https://www.kernel.org/doc/html/latest/networking/net_failover.html

The sequence of events should be:

* bring up link on standby device
* detach sr-iov device
* migrate vm
* attach sr-iov device
* bring down link on standby device

If I do that manually, using virsh to light up the standby device and detach the VF before migration and then manually reattach the device and bring down standby link, no traffic is lost at all during the migration process.
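
For reference, a rough sketch of that manual sequence as a shell function. The domain name, interface target name, XML path and destination URI are all hypothetical placeholders, and it assumes the same VF device XML and the same standby target name are valid on the destination host, which may not hold in practice:

```shell
#!/bin/sh
# Sketch only: manual failover steps wrapped around a live migration.
# Every value below is a hypothetical placeholder, not taken from my setup.
DOMAIN="${DOMAIN:-myvm}"
STANDBY_IF="${STANDBY_IF:-vnet0}"        # host-side target dev of the virtio standby (assumed)
VF_XML="${VF_XML:-/tmp/vf-hostdev.xml}"  # device XML describing the VF (assumed)
DEST_URI="${DEST_URI:-qemu+ssh://dest-host/system}"

migrate_with_manual_failover() {
    # 1. bring up link on the standby device
    virsh domif-setlink "$DOMAIN" "$STANDBY_IF" up &&
    # 2. detach the VF (the request may return before the guest has released it)
    virsh detach-device "$DOMAIN" "$VF_XML" --live &&
    # 3. migrate the vm
    virsh migrate --live "$DOMAIN" "$DEST_URI" &&
    # 4. attach the VF again, now on the destination host
    virsh -c "$DEST_URI" attach-device "$DOMAIN" "$VF_XML" --live &&
    # 5. bring link on the standby device back down
    virsh -c "$DEST_URI" domif-setlink "$DOMAIN" "$STANDBY_IF" down
}
```

Calling migrate_with_manual_failover runs the five steps in order and stops at the first one that fails, so the standby link is not taken down unless the VF was reattached.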

I initially thought perhaps libvirt was supposed to be doing it, but reviewing the debug logs and the QMP commands, it is neither detaching nor reattaching the VF. It's just telling qemu there's a failover pair, and qemu is doing the detach/attach while migrating:

-device {"driver":"virtio-net-pci","failover":true,"netdev":"hostua-sr-iov-backup","id":"ua-sr-iov-backup","mac":"52:54:00:a1:e0:38","bus":"pci.7","addr":"0x0"}

-device {"driver":"vfio-pci","host":"0000:37:10.0","id":"hostdev0","failover_pair_id":"ua-sr-iov-backup","bus":"pci.1","addr":"0x0"}


I tried manually bringing up link on the standby device before migrating the system and letting qemu deal with the vf detach/attach, but that resulted in even more lost traffic than simply letting the standby device be down during the migration (the network started sending packets to the standby device but the failover device didn't forward them on to the virtual nic as long as the VF existed).

Am I missing something? Is there some other configuration I'm supposed to do? Any insight on this issue would be most appreciated.

Debian 12 has qemu 7.2 as stable, but I also tried the backport of 9.2 with the same apparent behavior.

Thanks much...

