On Wed, 1 Jun 2022 18:01:39 +0000 Jag Raman <jag.ra...@oracle.com> wrote:
> > On Jun 1, 2022, at 1:26 PM, Alex Williamson <alex.william...@redhat.com>
> > wrote:
> >
> > On Wed, 1 Jun 2022 17:00:54 +0000
> > Jag Raman <jag.ra...@oracle.com> wrote:
> >>
> >> Hi Alex,
> >>
> >> Just to add some more detail, the emulated PCI device in QEMU presently
> >> maintains a MSIx table (PCIDevice->msix_table) and Pending Bit Array. In
> >> the present VFIO PCI device implementation, QEMU leverages the same
> >> MSIx table for interrupt masking/unmasking. The backend PCI device (such as
> >> the passthru device) always thinks that the interrupt is unmasked and lets
> >> QEMU manage masking.
> >>
> >> Whereas in the vfio-user case, the client additionally pushes a copy of
> >> emulated PCI device’s table downstream to the remote device. We did this
> >> to allow a small set of devices (such as e1000e) to clear the
> >> PBA (msix_clr_pending()). Secondly, the remote device uses its copy of the
> >> MSIx table to determine if interrupt should be triggered - this would
> >> prevent an interrupt from being sent to the client unnecessarily if it's
> >> masked.
> >>
> >> We are wondering if pushing the MSIx table to the remote device and
> >> reading PBA from it would diverge from the VFIO protocol specification?
> >>
> >> From your comment, I understand it’s similar to VFIO protocol because VFIO
> >> clients could mask an interrupt using VFIO_DEVICE_SET_IRQS ioctl +
> >> VFIO_IRQ_SET_ACTION_MASK / _UNMASK flags. I observed that QEMU presently
> >> does not use this approach and the kernel does not support it for MSI.
> >
> > I believe the SET_IRQS ioctl definition is pre-enabled to support
> > masking and unmasking, we've just lacked kernel support to mask at the
> > device which leads to the hybrid approach we have today. Our intention
> > would be to use the current uAPI, to provide that masking support, at
> > which point we'd leave the PBA mapped to the device.
>
> Thank you for clarifying!
>
> >
> > So whether your proposal diverges from the VFIO uAPI depends on what
> > you mean by "pushing the MSIx table to the remote device". If that's
> > done by implementing the existing SET_IRQS masking support, then you're
> > spot on. OTOH, if you're actually pushing a copy of the MSIx table
> > from the client, that's certainly not how I had envisioned the kernel
>
> In the current implementation - when the guest accesses the MSIx table and
> PBA, the client passes these accesses through to the remote device.

I suppose you can do this because you don't need some means for the
client to register a notification mechanism for the interrupt; you can
already send notifications via the socket. But this is now a
divergence from the kernel vfio uapi and eliminates what is intended to
be a device agnostic interrupt interface.

> If we switch to using SET_IRQS approach, we could use SET_IRQS
> message for masking/unmasking, but still pass through the PBA
> access to the backend PCI device.

Yes.

> So I think the question is, if we should switch vfio-user to SET_IRQS
> now for masking or unmasking, or whenever QEMU does it in the future?
> The PBA access would remain the same as it’s now - via device BAR.

I apologize that I'm so constantly overwhelmed with requests that I
haven't reviewed it, but it seems like the client side would be far more
compliant to the vfio kernel interface if vfio-user did implement an
interpretation of the SET_IRQS ioctl. Potentially couldn't you also
make use of eventfds or define other data types to pass that would give
the client more flexibility in receiving interrupts? Thanks,

Alex
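
For reference, here is a rough sketch of the request shape that the
SET_IRQS masking path discussed above would use for a single MSI-X
vector. It relies only on definitions from <linux/vfio.h>
(struct vfio_irq_set, VFIO_DEVICE_SET_IRQS, VFIO_IRQ_SET_ACTION_MASK /
_UNMASK, VFIO_PCI_MSIX_IRQ_INDEX). The helper names msix_mask_request()
and msix_set_masked() are made up for illustration and are not taken from
QEMU or vfio-user. As noted in the thread, the kernel does not currently
support these actions for MSI, so this shows the message layout only, not
behaviour you can depend on today.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/vfio.h>

/*
 * Hypothetical helper: fill in a SET_IRQS request that masks (mask != 0)
 * or unmasks (mask == 0) one MSI-X vector. No payload follows the header,
 * so the data type is VFIO_IRQ_SET_DATA_NONE and argsz covers only the
 * fixed part of the structure.
 */
static void msix_mask_request(struct vfio_irq_set *req, uint32_t vector,
                              int mask)
{
    assert(req != NULL);

    memset(req, 0, sizeof(*req));
    req->argsz = sizeof(*req);
    req->flags = VFIO_IRQ_SET_DATA_NONE |
                 (mask ? VFIO_IRQ_SET_ACTION_MASK
                       : VFIO_IRQ_SET_ACTION_UNMASK);
    req->index = VFIO_PCI_MSIX_IRQ_INDEX;
    req->start = vector;   /* first vector affected */
    req->count = 1;        /* exactly one vector */
}

/*
 * Hypothetical helper: send the request to a VFIO device fd. Returns the
 * ioctl() result, i.e. -1 with errno set on failure (including the case
 * where the device or kernel does not implement the action).
 */
static int msix_set_masked(int device_fd, uint32_t vector, int mask)
{
    struct vfio_irq_set req;

    msix_mask_request(&req, vector, mask);
    return ioctl(device_fd, VFIO_DEVICE_SET_IRQS, &req);
}
```

A vfio-user client would presumably carry the same header fields in its
own SET_IRQS message over the socket rather than issuing an ioctl; that
mapping is an assumption here, not something the thread spells out.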