* Jens Freimann (jfreim...@redhat.com) wrote:
> On Mon, Apr 08, 2019 at 10:16:50AM +0100, Dr. David Alan Gilbert wrote:
> > * Michael S. Tsirkin (m...@redhat.com) wrote:
> > > On Fri, Apr 05, 2019 at 09:56:29AM +0100, Dr. David Alan Gilbert wrote:
> > > > * Jens Freimann (jfreim...@redhat.com) wrote:
> > > > > On Fri, Mar 22, 2019 at 02:44:45PM +0100, Jens Freimann wrote:
> > > > > > This is another attempt at implementing the host side of the
> > > > > > net_failover concept
> > > > > > (https://www.kernel.org/doc/html/latest/networking/net_failover.html)
> > > > > >
> > > > > > The general idea is that we have a pair of devices, a vfio-pci and a
> > > > > > emulated device. Before migration the vfio device is unplugged and data
> > > > > > flows to the emulated device, on the target side another vfio-pci device
> > > > > > is plugged in to take over the data-path. In the guest the net_failover
> > > > > > module will pair net devices with the same MAC address.
> > > > > >
> > > > > > * In the first patch the infrastructure for hiding the device is added
> > > > > > for the qbus and qdev APIs. A "hidden" boolean is added to the device
> > > > > > state and it is set based on a callback to the standby device which
> > > > > > registers itself for handling the assessment: "should the primary device
> > > > > > be hidden?" by cross validating the ids of the devices.
> > > > > >
> > > > > > * In the second patch the virtio-net uses the API to hide the vfio
> > > > > > device and unhides it when the feature is acked.
> > > > > >
> > > > > > Previous discussion: https://patchwork.ozlabs.org/cover/989098/
> > > > > >
> > > > > > To summarize concerns/feedback from previous discussion:
> > > > > > 1.- guest OS can reject or worse _delay_ unplug by any amount of time.
> > > > > > Migration might get stuck for unpredictable time with unclear
> > > > > > reason.
> > > > > > This approach combines two tricky things, hot/unplug and migration.
> > > > > > -> We can surprise-remove the PCI device and in QEMU we can do all
> > > > > > necessary rollbacks transparent to management software. Will it be
> > > > > > easy, probably not.
> > > >
> > > > This sounds 'fun' - bonus cases are things like what happens if the
> > > > guest gets rebooted somewhere during the process or if it's currently
> > > > sitting in the bios/grub/etc
> > >
> > > Um, during which process? Guests are gradually fixed to support
> > > surprise removal well. Part of it is thunderbolt which makes
> > > it incredibly easy. Yes - bios/grub will need to learn to
> > > handle this well.
> >
> > Ignoring the actual mechanism of the unplug itself; there are probably
> > loads of cases; e.g.
> >
> > running with both cards
> > hot unplug real card
> > start migration
> > guest reboots
> > Kernel sees only the virtio card
> > migration completes
> > hotadd the real card back
> >
> > so the guest has to know to pair the real card even though it booted
> > with only the virtio card.
>
> Maybe I misunderstand, but, when the 'real card' is added back after
> migration the net_failover driver in the guest will know to pair it
> with the virtio card because they have the same MAC address. Did you
> mean something else?
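
To make the reboot case above concrete, here is a minimal toy sketch of
that sequence. It assumes pairing is decided purely by matching MAC
addresses, as the cover letter describes. The Guest class, its method
names, the device names and the MAC value are all hypothetical; this is
not QEMU or kernel code and it says nothing about the unplug mechanism
itself.

```python
from dataclasses import dataclass, field

MAC = "52:54:00:12:34:56"  # example MAC, chosen only for illustration


@dataclass
class Guest:
    """Toy guest model (hypothetical): maps NIC name -> (kind, mac)."""
    nics: dict = field(default_factory=dict)

    def hotplug(self, name, kind, mac):
        self.nics[name] = (kind, mac)

    def unplug(self, name):
        self.nics.pop(name, None)

    def reboot(self):
        # Assumption: a reboot keeps no pairing state; the guest just
        # re-probes whatever devices are present at that moment.
        return sorted(self.nics)

    def paired(self):
        # Assumed rule: a standby and a primary pair when their MACs match.
        standby = {mac for kind, mac in self.nics.values() if kind == "standby"}
        primary = {mac for kind, mac in self.nics.values() if kind == "primary"}
        return standby & primary


guest = Guest()
guest.hotplug("virtio0", "standby", MAC)   # running with both cards
guest.hotplug("vfio0", "primary", MAC)
guest.unplug("vfio0")                      # hot unplug real card
paired_during_migration = guest.paired()   # start migration: virtio only
seen_at_boot = guest.reboot()              # guest reboots mid-migration
# migration completes
guest.hotplug("vfio0", "primary", MAC)     # hotadd the real card back
paired_after_hotadd = guest.paired()
```

Under that assumption the pairing only depends on which devices are
present and their MACs, not on what the guest saw at boot. Whether real
guests behave that way in every reboot window is exactly the open
question.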

OK if it knows to do that.

> > I'm sure there are loads of other corners.
>
> Probably yes.

Yeh, that was just my worry - just there's loads of this type of corner
around reboots.

Dave

> regards,
> Jens
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK