On 2020/11/24 3:33, Neo Jia wrote:
> On Mon, Nov 23, 2020 at 11:14:38AM +0800, Shenming Lu wrote:
>> On 2020/11/21 6:01, Alex Williamson wrote:
>>> On Fri, 20 Nov 2020 22:05:49 +0800
>>> Shenming Lu <lushenm...@huawei.com> wrote:
>>>
>>>> On 2020/11/20 1:41, Alex Williamson wrote:
>>>>> On Thu, 19 Nov 2020 14:13:24 +0530
>>>>> Kirti Wankhede <kwankh...@nvidia.com> wrote:
>>>>>
>>>>>> On 11/14/2020 2:47 PM, Shenming Lu wrote:
>>>>>>> When running VFIO migration, I found that the restoring of VFIO PCI
>>>>>>> device’s config space is before VGIC on ARM64 target. But generally,
>>>>>>> interrupt controllers need to be restored before PCI devices.
>>>>>>
>>>>>> Is there any other way by which VGIC can be restored before PCI device?
>>>>
>>>> As far as I know, it seems to have to depend on priorities in the
>>>> non-iterable process.
>>>>
>>>>>>
>>>>>>> Besides, if a VFIO PCI device is configured to have directly-injected
>>>>>>> MSIs (VLPIs), the restoring of its config space will trigger the
>>>>>>> configuring of these VLPIs (in kernel), where it would return an
>>>>>>> error as I saw due to the dependency on kvm’s vgic.
>>>>>>>
>>>>>>
>>>>>> Can this be fixed in kernel to re-initialize the kernel state?
>>>>
>>>> Did you mean to reconfigure these VLPIs when restoring kvm's vgic?
>>>> But the fact is that this error is not caused by kernel, it is due to
>>>> the incorrect calling order of qemu...
>>>>
>>>>>>
>>>>>>> To avoid this, we can move the saving of the config space from the
>>>>>>> iterable process to the non-iterable process, so that it will be
>>>>>>> called after VGIC according to their priorities.
>>>>>>>
>>>>>>
>>>>>> With this change, at resume side, pre-copy phase data would reach
>>>>>> destination without restored config space. VFIO device on destination
>>>>>> might need it's config space setup and validated before it can accept
>>>>>> further VFIO device specific migration state.
>>>>>>
>>>>>> This also changes bit-stream, so it would break migration with original
>>>>>> migration patch-set.
>>>>>
>>>>> Config space can continue to change while in pre-copy, if we're only
>>>>> sending config space at the initiation of pre-copy, how are any changes
>>>>> that might occur before the VM is stopped conveyed to the target? For
>>>>> example the guest might reboot and a device returned to INTx mode from
>>>>> MSI during pre-copy. Thanks,
>>>>
>>>> What I see is that the config space is only saved once in
>>>> save_live_complete_precopy currently...
>>>> As you said, a VFIO device might need it's config space setup first, and
>>>> the config space can continue to change while in pre-copy, Did you mean
>>>> we have to migrate the config space in save_live_iterate?
>>>> However, I still have a little doubt about the restoring dependence
>>>> between the qemu emulated config space and the device data...
>>>>
>>>> Besides, if we surely can't move the saving of the config space back,
>>>> can we just move some actions which are triggered by the restoring of
>>>> the config space back (such as vfio_msix_enable())?
>>>
>>> It seems that the significant benefit to enabling interrupts during
>>> pre-copy would be to reduce the latency and failure potential during
>>> the final phase of migration. Do we have any data for how much it adds
>>> to the device contributed downtime to configure interrupts only at the
>>> final stage? My guess is that it's a measurable delay on its own. At
>>> the same time, we can't ignore the differences in machine specific
>>> dependencies and if we don't even sync the config space once the VM is
>>> stopped... this all seems not ready to call supported, especially if we
>>> have concerns already about migration bit-stream compatibility.
>>>
>>
>> I have another question for this, if we restore the config space while in
>> pre-copy (include enabling interrupts), does it affect the _RESUMING state
>> (paused) of the device on the dst host (cause it to send interrupts? which
>> should not be allowed in this stage). Does the restore sequence need to be
>> further discussed and reach a consensus(spec) (taking into account other
>> devices and the corresponding actions of the vendor driver)?
>>
>>> Given our timing relative to QEMU 5.2, the only path I feel comfortable
>>> with is to move forward with downgrading vfio migration support to be
>>> enabled via an experimental option. Objections? Thanks,
>>
>> Alright, but this issue is related to our ARM GICv4.1 migration scheme,
>> could you give a rough idea about this (where to enable interrupts, we
>> hope it to be after the restoring of VGIC)?
>
> I disagree. If this is only specific to Huawei ARM GIC implementation, why
> do we want to make the entire VFIO based migration an experimental feature?

It is not specific to the Huawei ARM GIC implementation; the error was
encountered with the general ARM GIC implementation...

Thanks,
Shenming
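
For anyone following the ordering argument, here is a toy model of it. It is
not QEMU code: the Section type, the load_all() helper, the priority values
and the "needs_vgic" flag are all hypothetical, and it assumes that state sent
from the iterable path is loaded before anything in the non-iterable path,
which is then loaded in descending priority order. It only shows why moving
the config space to the non-iterable path would let priorities put the VGIC
first.

```python
from dataclasses import dataclass


@dataclass
class Section:
    """One unit of device state in this toy model (hypothetical, not a QEMU type)."""
    name: str
    priority: int = 0          # assumption: a higher value is loaded earlier
    needs_vgic: bool = False   # True if loading it would program VLPIs


def load_all(iterable_phase, non_iterable_phase):
    """Load iterable-path sections first, then the rest sorted by priority."""
    # sorted() is stable, so sections with equal priority keep their order.
    ordered = list(iterable_phase) + sorted(
        non_iterable_phase, key=lambda s: s.priority, reverse=True
    )
    vgic_restored = False
    results = []
    for section in ordered:
        if section.needs_vgic and not vgic_restored:
            # Models the error seen when VLPIs are set up before the VGIC.
            results.append((section.name, "error"))
            continue
        if section.name == "vgic":
            vgic_restored = True
        results.append((section.name, "ok"))
    return results


vgic = Section("vgic", priority=10)
vfio_cfg = Section("vfio-pci-config", priority=0, needs_vgic=True)

# Config space sent from the iterable path: loaded before the VGIC.
before = load_all([vfio_cfg], [vgic])
# Config space moved to the non-iterable path: the VGIC is loaded first.
after = load_all([], [vfio_cfg, vgic])
print(before)
print(after)
```

The model says nothing about the concerns raised above (config space changing
during pre-copy, bit-stream compatibility); it only covers the load order.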