On Jun 1, 2022, at 2:30 PM, Alex Williamson <alex.william...@redhat.com> wrote:
> On Wed, 1 Jun 2022 18:01:39 +0000
> Jag Raman <jag.ra...@oracle.com> wrote:
>
>> On Jun 1, 2022, at 1:26 PM, Alex Williamson <alex.william...@redhat.com> wrote:
>>
>>> On Wed, 1 Jun 2022 17:00:54 +0000
>>> Jag Raman <jag.ra...@oracle.com> wrote:
>>>
>>>> Hi Alex,
>>>>
>>>> Just to add some more detail: the emulated PCI device in QEMU
>>>> presently maintains an MSI-X table (PCIDevice->msix_table) and a
>>>> Pending Bit Array (PBA). In the present VFIO PCI device
>>>> implementation, QEMU leverages the same MSI-X table for interrupt
>>>> masking/unmasking; the backend PCI device (such as the passthrough
>>>> device) always sees the interrupt as unmasked and lets QEMU manage
>>>> masking.
>>>>
>>>> In the vfio-user case, by contrast, the client additionally pushes
>>>> a copy of the emulated PCI device's MSI-X table downstream to the
>>>> remote device. We did this for two reasons. First, it allows a
>>>> small set of devices (such as e1000e) to clear the PBA
>>>> (msix_clr_pending()). Second, the remote device uses its copy of
>>>> the MSI-X table to determine whether an interrupt should be
>>>> triggered, which prevents an interrupt from being sent to the
>>>> client unnecessarily while the vector is masked.
>>>>
>>>> We are wondering if pushing the MSI-X table to the remote device
>>>> and reading the PBA from it would diverge from the VFIO protocol
>>>> specification.
>>
>> From your comment, I understand it's similar to the VFIO protocol,
>> because VFIO clients could mask an interrupt using the
>> VFIO_DEVICE_SET_IRQS ioctl with the VFIO_IRQ_SET_ACTION_MASK /
>> _UNMASK flags. I observed that QEMU presently does not use this
>> approach, and that the kernel does not support it for MSI.
>
> I believe the SET_IRQS ioctl definition is pre-enabled to support
> masking and unmasking; we've just lacked kernel support to mask at
> the device, which leads to the hybrid approach we have today. Our
> intention would be to use the current uAPI to provide that masking
> support, at which point we'd leave the PBA mapped to the device.

Thank you for clarifying!
> So whether your proposal diverges from the VFIO uAPI depends on what
> you mean by "pushing the MSIx table to the remote device". If that's
> done by implementing the existing SET_IRQS masking support, then
> you're spot on. OTOH, if you're actually pushing a copy of the MSIx
> table from the client, that's certainly not how I had envisioned the
> kernel
>
>> In the current implementation, when the guest accesses the MSI-X
>> table and the PBA, the client passes these accesses through to the
>> remote device.
>
> I suppose you can do this because you don't need some means for the
> client to register a notification mechanism for the interrupt; you
> can already send notifications via the socket. But this is now a
> divergence from the kernel vfio uAPI, and it eliminates what is
> intended to be a device-agnostic interrupt interface.
>
>> If we switch to the SET_IRQS approach, we could use the SET_IRQS
>> message for masking/unmasking but still pass the PBA access through
>> to the backend PCI device.
>
> Yes.
>
>> So I think the question is whether we should switch vfio-user to
>> SET_IRQS now, or whenever QEMU does it in the future. The PBA access
>> would remain the same as it is now: via the device BAR.
>
> I apologize that I'm constantly overwhelmed with requests and haven't
> reviewed it, but it seems like the client side would be far more
> compliant with the vfio kernel interface if vfio-user did implement
> an interpretation of the SET_IRQS ioctl.

Thanks for confirming! We'll explore using SET_IRQS for masking and
unmasking.

> Potentially couldn't you also make use of eventfds, or define other
> data types to pass, that would give the client more flexibility in
> receiving interrupts?

I think the libvfio-user library already uses eventfds to pass
interrupts to the client.

> Thanks,
> Alex

--
Jag