I'm looking at interrupt remapping (because I need to hook into the translation somehow to add PIRQ support for Xen which translates guest MSIs directly to KVM_IRQ_ROUTING_XEN_EVTCHN).
Am I right in understanding that it doesn't report faults on interrupts which can't be translated? It attempts to translate interrupts at the time the table is modified (vtd_int_remap()) or when an APIC access actually triggers an MSI (vtd_mem_ir_write()) but in neither case does it actually raise a fault? The behaviour we want here is that we only raise a fault when the IRQ actually *happens*. But that's hard in our current model where it looks like we pretranslate *everything* in advance and just let it run. Here's a proposal for a model which could make it work (using VFIO as the example since that's the more complex part but it works for emulated MSI sources too): We consume the VFIO eventfd *both* in userspace and the kernel. (Since https://lore.kernel.org/kvm/20201027143944.648769-1-dw...@infradead.org/ we can just keep listening on the VFIO eventfd in userspace and the kernel will eat all the events so you never notice. On older kernels we have to manually stop listening in userspace.) When a translation is valid and should be considered 'cached' in the IOMMU, that's when we actually hook it up to the irqfd. We can ditch the iec invalidate callbacks (vtd_iec_notify_all) because all an invalidation needs to do is KVM_IRQFD_FLAG_DEASSIGN for the corresponding GSI. ( You might consider abusing a spare field in the KVM routing table to hold a cookie like the IRTE# so that you know *which* entries to invalidate. I couldn't possibly comment. /* 64-bit cookie for IOMMU to use for invalidation choices */ #define ire_ir_cookie(ire) ((ire)->u.adapter.ind_offset) /* Flags, to indicate a stale entry that needs retranslating */ #define ire_user_flags(ire) ((ire)->u.adapter.summary_offset) #define IRE_USER_FLAG_STALE 1 ) So when an interrupt happens and it's *untranslated*, that's when it gets raised to userspace to handle, e.g. in vfio_msi_interrupt(). That does the normal thing and attempts to deliver the guest MSI directly. We add a flag "bool delivering_now" to the X86IOMMUClass int_remap function, to allow it to distinguish between preemptive translations and actual delivery and to raise the fault in the latter case. When the guest frobs a device's MSI table we can do the translation as we do at the moment, of course with the 'delivering_now' argument being false. And *if* the translation succeeds then we can install the IRQFD right away. This model allows us to generate faults as the hardware would, and also improves the efficiency of invalidation by only invalidating what we need to. I haven't looked hard at how it works with an emulated AMD IOMMU, but I know that the Xen PIRQ support (which is where I came in) slots into it fairly trivially, using the PIRQ# as the 'cookie' for invalidation instead of the IRTE# that the Intel IOMMU uses.
smime.p7s
Description: S/MIME cryptographic signature