PCI devices that predate PCI 2.3 use level interrupts and do not support
interrupt masking (DisINTx). Passing such a device through to a KVM guest
fails on at least the ppc64 platform: the guest receives and acknowledges a
single interrupt, while the device continues to assert the level interrupt,
indicating that it still needs servicing.

When lazy IRQ masking is used on DisINTx- (non-PCI 2.3) hardware, the following
sequence occurs:

 * Level IRQ assertion on device
 * IRQ marked disabled in kernel
 * Host interrupt handler exits without clearing the interrupt on the device
 * Eventfd is delivered to userspace
 * Host interrupt controller sees still-active INTx, reasserts IRQ
 * Host kernel ignores disabled IRQ
 * Guest processes IRQ and clears device interrupt
 * Software mask removed by VFIO driver

From this point the behavior is platform-dependent.  Some platforms (amd64)
keep raising the IRQ for as long as the INTx line remains asserted, so the
host handles the IRQ as soon as the mask is dropped.  Others (ppc64) send
only one request, and if it is not handled no further interrupts are sent.
The former behavior theoretically leaves the system vulnerable to an
interrupt storm; the latter stalls the device after exactly one interrupt
has been received in the guest.

Work around this by disabling lazy IRQ masking for DisINTx- INTx devices.
---
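For illustration only: the sequence described above can be reproduced with a
small userspace model in C.  This is a toy sketch, not kernel code.  Every
name in it (struct sim, simulate(), PLATFORM_ONE_SHOT, and so on) is
hypothetical, and the controller behavior is a deliberate simplification of
the description in the commit message, not a model of any real interrupt
controller.

```c
/* Toy userspace model of lazy vs. unlazy IRQ disable. Not kernel code;
 * all names are hypothetical and the controller model is an assumption. */
#include <assert.h>
#include <stdbool.h>

enum platform {
	PLATFORM_RETRIGGER,	/* amd64-like: re-raises while INTx is asserted */
	PLATFORM_ONE_SHOT,	/* ppc64-like: one request, lost if ignored */
};

struct sim {
	enum platform plat;
	bool unlazy;		/* models IRQ_DISABLE_UNLAZY */
	bool line;		/* device is asserting INTx */
	bool sw_disabled;	/* IRQ marked disabled in the host kernel */
	bool hw_masked;		/* IRQ masked at the interrupt controller */
	bool armed;		/* one-shot only: controller will sample the line */
	bool eventfd_pending;	/* guest has been signalled */
	int events;		/* work items the device still wants serviced */
	int delivered;		/* interrupts that reached the guest */
	int ignored;		/* interrupts dropped while the IRQ was disabled */
};

/* Does the controller raise a host interrupt right now? */
static bool controller_fires(struct sim *s)
{
	if (!s->line || s->hw_masked)
		return false;
	if (s->plat == PLATFORM_ONE_SHOT) {
		if (!s->armed)
			return false;
		s->armed = false;	/* the single request is now consumed */
	}
	return true;
}

/* Host side: either drop the IRQ or disable it and signal the guest. */
static void host_irq(struct sim *s)
{
	if (s->sw_disabled) {
		s->ignored++;		/* disabled IRQ is ignored, no re-arm */
		return;
	}
	s->sw_disabled = true;
	if (s->unlazy)
		s->hw_masked = true;	/* unlazy: mask at the controller too */
	s->eventfd_pending = true;
	s->delivered++;
	s->armed = true;		/* handler returned; line is sampled again */
}

/* Guest services one event, then the mask is removed. */
static void guest_service_and_unmask(struct sim *s)
{
	s->events--;
	s->line = s->events > 0;	/* stays asserted while work remains */
	s->eventfd_pending = false;
	s->sw_disabled = false;
	if (s->hw_masked) {
		s->hw_masked = false;
		s->armed = true;	/* hardware unmask re-samples the line */
	}
}

struct sim simulate(enum platform plat, bool unlazy, int events)
{
	struct sim s = { .plat = plat, .unlazy = unlazy, .events = events };

	assert(events > 0);
	s.line = true;
	s.armed = true;
	for (int round = 0; round < 2 * events; round++) {
		if (controller_fires(&s))	/* initial delivery */
			host_irq(&s);
		if (controller_fires(&s))	/* line still active after handler */
			host_irq(&s);
		if (s.eventfd_pending)
			guest_service_and_unmask(&s);
	}
	return s;
}
```

In this model, simulate(PLATFORM_ONE_SHOT, false, 3) ends with delivered == 1
and two events still outstanding, which matches the stall described above.
With unlazy set to true, all three events are delivered and none are ignored.
simulate(PLATFORM_RETRIGGER, false, 3) delivers all three events but also
counts three ignored interrupts.
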
 drivers/vfio/pci/vfio_pci_intrs.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c
index 123298a4dc8f..d8637b53d051 100644
--- a/drivers/vfio/pci/vfio_pci_intrs.c
+++ b/drivers/vfio/pci/vfio_pci_intrs.c
@@ -304,6 +304,9 @@ static int vfio_intx_enable(struct vfio_pci_core_device *vdev,
 
        vdev->irq_type = VFIO_PCI_INTX_IRQ_INDEX;
 
+       if (!vdev->pci_2_3)
+               irq_set_status_flags(pdev->irq, IRQ_DISABLE_UNLAZY);
+
        ret = request_irq(pdev->irq, vfio_intx_handler,
                          irqflags, ctx->name, ctx);
        if (ret) {
@@ -352,6 +355,8 @@ static void vfio_intx_disable(struct vfio_pci_core_device *vdev)
                vfio_virqfd_disable(&ctx->unmask);
                vfio_virqfd_disable(&ctx->mask);
                free_irq(pdev->irq, ctx);
+               if (!vdev->pci_2_3)
+                       irq_clear_status_flags(pdev->irq, IRQ_DISABLE_UNLAZY);
                if (ctx->trigger)
                        eventfd_ctx_put(ctx->trigger);
                kfree(ctx->name);
-- 
2.39.5
