On 2020/12/2 6:21, Alex Williamson wrote: > On Tue, 1 Dec 2020 14:37:52 +0800 > Shenming Lu <lushenm...@huawei.com> wrote: > >> On 2020/12/1 1:03, Alex Williamson wrote: >>> On Thu, 26 Nov 2020 14:56:17 +0800 >>> Shenming Lu <lushenm...@huawei.com> wrote: >>> >>>> Hi, >>>> >>>> After reading everyone's opinions, we have a rough idea for this issue. >>>> >>>> One key point is whether it is necessary to setup the config space before >>>> the device can accept further migration data. I think it is decided by >>>> the vendor driver, so we can simply ask the vendor driver about it in >>>> .save_setup, which could avoid a lot of unnecessary copies and settings. >>>> Once we have known the need, we can iterate the config space (before) >>>> along with the device migration data in .save_live_iterate and >>>> .save_live_complete_precopy, and if not needed, we can only migrate the >>>> config space in .save_state. >>>> >>>> Another key point is that the interrupt enabling should be after the >>>> restoring of the interrupt controller (might not only interrupts). >>>> My solution is to add a subflag at the beginning of the config data >>>> (right after VFIO_MIG_FLAG_DEV_CONFIG_STATE) to indicate the triggered >>>> actions on the dst (such as whether to enable interrupts). >>>> >>>> Below is it's workflow. >>>> >>>> On the save path: >>>> In vfio_save_setup(): >>>> Ask the vendor driver if it needs the config space setup before it >>>> can accept further migration data. >>> >>> How does "ask the vendor driver" actually work? >> >> Maybe via a ioctl? >> Oh, it seems that we have to ask the dst vendor driver (in vfio_load_setup) >> and send the config data (before) along with the device migration data >> every time?... > > Migration data streams are unidirectional, otherwise we break offline > migration. There are various ways other than an ioctl for a vendor > driver to advertise requirements, ex. a capability on the migration > region info, but it's not clear to me that we really need this info yet.
Ok, this is not clear yet. > >>>> | >>>> In vfio_save_iterate() (pre-copy): >>>> If *needed*, save the config space which would be setup on the dst >>>> before the migration data, but send with a subflag to instruct not >>>> to (such as) enable interrupts. >>> >>> If not for triggering things like MSI/X configuration, isn't config >>> space almost entirely virtual? What visibility does the vendor driver >>> have to the VM machine dependencies regarding device interrupt versus >>> interrupt controller migration? >> >> My thought is that the vendor driver only decides the order of the config >> space setup and the receiving of the migration data, but leaves the VM >> machine dependencies to QEMU. > > But again, config space is largely virtual, the vendor driver doesn't > have access to it, it's only the effects of config space, like > representing the interrupt mode and configuring it on the device or > specific registers that QEMU writes through that the vendor driver > sees. So how is it that the vendor driver decides the order? As you said above, a vendor driver could advertise its requirements via the migration region... > The > vendor driver doesn't have visibility to the VM machine dependencies, > like when the interrupt controller is sufficiently configured to enable > device interrupts. It seems that a vendor driver that depends on QEMU > enabling interrupts at a specific point is inherently dependent on > assumptions in the machine configuration. Yeah, there might be more than one dependency. > >>>> | >>>> In vfio_save_complete_precopy() (stop-and-copy, iterable process): >>>> The same as that in vfio_save_iterate(). >>>> | >>>> In .save_state (stop-and-copy, non-iterable process): >>>> If *needed*, only send a subflag to instruct to enable interrupts. >>>> If *not needed*, save the config space and setup everything on the dst. >>>> >>> >>> Again, how does the vendor driver have visibility to know when the VM >>> machine can enable interrupts? >> >> It seems troubling if the vendor driver needs the interrupts to be enabled >> first... > > Yes. > >>>> Besides the above idea, we might be able to choose to let the vendor >>>> driver do >>>> more: qemu just sends and writes the config data (before) along with the >>>> device >>>> migration data every time, and it's up to the vendor driver to filter >>>> out/buffer >>>> the received data or reorder the settings... >>> >>> There is no vendor driver in QEMU though, so are you suggesting that >>> QEMU follows a standard protocol and the vendor driver chooses when to >>> enable specific features? For instance, QEMU would call SET_IRQS and >>> the driver would return success, but defer that setup if necessary? >>> That seems quite troubling as we then have ioctls that behave >>> differently depending on the device state and we have no error path to >>> userspace should that setup fail later. The vendor driver does have >>> its own data stream for migration, so the vendor driver could tell the >>> destination version of itself what type of interrupt to use, which >>> might be sufficient if we were to ignore the latency if QEMU were to >>> defer interrupt setup until stop-and-copy. >> >> Did you mean that we could only enable MSI-X during the iterable phase, but >> leave the setup of these unmasked vectors to the non-iterable phase? >> It looks good to me. > > On x86 the vfio SET_IRQS ioctl is independent of the VM interrupt > setup, so we could decouple configuring the device interrupts from > plumbing them through the VM. On ARM64 the setting of the IRQ bypass in the vfio SET_IRQS ioctl is dependent on the VM interrupt setup... > However, doesn't ARM64 require mapping > an MSI doorbell page via vfio? Where does is that configured during the > machine load versus the device iterative phase? Isn't the mapping of the MSI doorbell arch independent in vfio? -_- > >>> Is the question of when to setup device interrupts versus the interrupt >>> controller state largely a machine issue within QEMU? If so, shouldn't >>> it be at QEMU's determination when to act on the config space >>> information on the target? >> >> I think it would be simpler if ensuring the proper calling order in QEMU... >> >>> IOW, if a vendor driver has a dependency on >>> interrupt configuration, they need to include it in their own pre-copy >>> data stream and decouple that dependency from userspace interrupt >>> configuration via the SET_IRQS ioctl. Is that possible? Thanks, >>> >> >> I don't understand what the decoupling that dependency from userspace >> interrupt configuration means... > > I mean that the vendor driver cannot depend on when QEMU calls > SET_IRQS, the vendor specific migration stream contains any information > the vendor driver needs about interrupt configuration such that QEMU > can call SET_IRQS whenever it chooses. Thanks, Oh, that sounds great. By this way, it seems that we only need to migrate the QEMU config space in the non-iterable process once. > > Alex > > . >