Hi Alex,

On 2021/8/3 22:19, Alex Williamson wrote:
> On Tue, 3 Aug 2021 16:43:07 +0800
> "Longpeng (Mike, Cloud Infrastructure Service Product Dept.)"
> <longpe...@huawei.com> wrote:
>
>> Hi Alex,
>>
>> We found that msix_load() costs 40~50 ms if the VF has 60+ interrupts;
>> the following code costs too much for each interrupt:
>>
>> msix_load:
>>   for (vector = 0; vector < 60; vector++)
>>     msix_handle_mask_update
>>       vfio_msix_vector_do_use
>>         vfio_add_kvm_msi_virq
>>           kvm_irqchip_add_msi_route
>>           kvm_irqchip_commit_routes  <-- costs 0.8 ms each time
>>
>> In irq remapping mode, the VF interrupts are not routed through the KVM
>> irqchip
>
> I'm not sure what this means. Your MSIX interrupts are going through
> QEMU anyway? Why?
>
>> in fact, so maybe we can reduce this cost with "x-no-kvm-msix=true",
>> right? Are there any risks if we do it this way?
>
> You're obviously free to configure your device this way, but the
> expectation is that any sort of startup latency is more than offset by
> improved runtime latency through the KVM route. This option is usually
> reserved for debugging, when we want to see all interaction with the
> device in QEMU.
>
> If there's a case where we're not detecting that a KVM route is
> ineffective, then we should figure out how to detect that and skip this
> code, but again the expectation is that the KVM route is worthwhile.
>
> If this is specifically about kvm_irqchip_commit_routes(), maybe the
> setup path needs a way to batch multiple routes and defer the commit,
> if that's possible. Thanks,
I've implemented two options and ran some simple tests. Both options
reduce the vfio_msix_enable()/msix_load() cost; in the HW live-migration
case especially, they can significantly reduce the downtime if the VF
has more than 60 vectors.

Option 1 is "defer the commit": add a KVM virq without committing it
immediately, keep the virq in kvm_state.irq_routes for the moment, and
commit only once at the end. However, it's not easy to fall back to the
QEMU route if one of the steps fails.

Option 2 is "defer the setup": use the QEMU route by default in the
migration resume phase, and try to switch to the KVM route when the
vector first fires.

The first draft of the code (Option 2) is below; do you have any
suggestions?

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index e1ea1d8..5656f3a 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -347,6 +347,11 @@ static void vfio_msi_interrupt(void *opaque)
         get_msg = msix_get_message;
         notify = msix_notify;
 
+        if (vector->need_switch) {
+            vector->need_switch = false;
+            vfio_add_kvm_msix_virq(vdev, vector, nr);
+        }
+
         /* A masked vector firing needs to use the PBA, enable it */
         if (msix_is_masked(&vdev->pdev, nr)) {
             set_bit(nr, vdev->msix->pending);
@@ -438,6 +443,26 @@ static void vfio_add_kvm_msi_virq(VFIOPCIDevice *vdev, VFIOMSIVector *vector,
     vector->virq = virq;
 }
 
+static void
+vfio_add_kvm_msix_virq(VFIOPCIDevice *vdev, VFIOMSIVector *vector, int nr)
+{
+    Error *err = NULL;
+    int fd;
+
+    vfio_add_kvm_msi_virq(vdev, vector, nr, true);
+    if (vector->virq < 0) {
+        return;
+    }
+
+    fd = event_notifier_get_fd(&vector->kvm_interrupt);
+    if (vfio_set_irq_signaling(&vdev->vbasedev,
+                               VFIO_PCI_MSIX_IRQ_INDEX, nr,
+                               VFIO_IRQ_SET_ACTION_TRIGGER, fd, &err)) {
+        error_reportf_err(err, VFIO_MSG_PREFIX, vdev->vbasedev.name);
+    }
+}
+
 static void vfio_remove_kvm_msi_virq(VFIOMSIVector *vector)
 {
     kvm_irqchip_remove_irqfd_notifier_gsi(kvm_state, &vector->kvm_interrupt,
@@ -490,7 +515,11 @@ static int vfio_msix_vector_do_use(PCIDevice *pdev, unsigned int nr,
         }
     } else {
         if (msg) {
-            vfio_add_kvm_msi_virq(vdev, vector, nr, true);
+            if (vdev->lazy_set_virq) {
+                vector->need_switch = true;
+            } else {
+                vfio_add_kvm_msi_virq(vdev, vector, nr, true);
+            }
         }
     }
 
@@ -566,6 +595,16 @@ static void vfio_msix_vector_release(PCIDevice *pdev, unsigned int nr)
     }
 }
 
+static void vfio_msix_pre_enable(VFIOPCIDevice *vdev)
+{
+    vdev->lazy_set_virq = true;
+}
+
+static void vfio_msix_post_enable(VFIOPCIDevice *vdev)
+{
+    vdev->lazy_set_virq = false;
+}
+
 static void vfio_msix_enable(VFIOPCIDevice *vdev)
 {
     PCIDevice *pdev = &vdev->pdev;
@@ -2466,7 +2505,9 @@ static int vfio_pci_load_config(VFIODevice *vbasedev, QEMUFile *f)
     if (msi_enabled(pdev)) {
         vfio_msi_enable(vdev);
     } else if (msix_enabled(pdev)) {
+        vfio_msix_pre_enable(vdev);
         vfio_msix_enable(vdev);
+        vfio_msix_post_enable(vdev);
     }
 
     return ret;
diff --git a/hw/vfio/pci.h b/hw/vfio/pci.h
index 6477751..31390f2 100644
--- a/hw/vfio/pci.h
+++ b/hw/vfio/pci.h
@@ -95,6 +95,7 @@ typedef struct VFIOMSIVector {
     struct VFIOPCIDevice *vdev; /* back pointer to device */
     int virq;
     bool use;
+    bool need_switch; /* try to switch to KVM interrupt? */
 } VFIOMSIVector;
 
 enum {
@@ -171,6 +172,7 @@ struct VFIOPCIDevice {
     bool no_kvm_ioeventfd;
     bool no_vfio_ioeventfd;
     bool enable_ramfb;
+    bool lazy_set_virq;
     VFIODisplay *dpy;
     Notifier irqchip_change_notifier;
 };

> Alex