RE: [PATCH v6 15/19] vfio-user: handle device interrupts

Thanos Makatos Wed, 30 Mar 2022 02:41:17 -0700

> -----Original Message-----
> From: Jag Raman <jag.ra...@oracle.com>
> Sent: 29 March 2022 20:07
> To: Stefan Hajnoczi <stefa...@redhat.com>
> Cc: Alex Williamson <alex.william...@redhat.com>; qemu-devel <qemu-
> de...@nongnu.org>; Michael S. Tsirkin <m...@redhat.com>; Philippe Mathieu-
> Daudé <f4...@amsat.org>; Paolo Bonzini <pbonz...@redhat.com>; Beraldo
> Leal <bl...@redhat.com>; Daniel P. Berrangé <berra...@redhat.com>;
> edua...@habkost.net; Marcel Apfelbaum <marcel.apfelb...@gmail.com>;
> Eric Blake <ebl...@redhat.com>; Markus Armbruster <arm...@redhat.com>;
> Juan Quintela <quint...@redhat.com>; dgilb...@redhat.com; John Levon
> <john.le...@nutanix.com>; Thanos Makatos <thanos.maka...@nutanix.com>;
> Elena Ufimtseva <elena.ufimts...@oracle.com>; John Johnson
> <john.g.john...@oracle.com>; Kanth Ghatraju <kanth.ghatr...@oracle.com>
> Subject: Re: [PATCH v6 15/19] vfio-user: handle device interrupts
> 
> 
> 
> > On Mar 29, 2022, at 10:24 AM, Stefan Hajnoczi <stefa...@redhat.com>
> wrote:
> >
> > On Sat, Mar 26, 2022 at 11:47:36PM +0000, Jag Raman wrote:
> >>
> >>
> >>> On Mar 7, 2022, at 5:24 AM, Stefan Hajnoczi <stefa...@redhat.com>
> wrote:
> >>>
> >>> On Thu, Feb 17, 2022 at 02:49:02AM -0500, Jagannathan Raman wrote:
> >>>> Forward remote device's interrupts to the guest
> >>>>
> >>>> Signed-off-by: Elena Ufimtseva <elena.ufimts...@oracle.com>
> >>>> Signed-off-by: John G Johnson <john.g.john...@oracle.com>
> >>>> Signed-off-by: Jagannathan Raman <jag.ra...@oracle.com>
> >>>> ---
> >>>> include/hw/pci/pci.h              |   6 ++
> >>>> include/hw/remote/vfio-user-obj.h |   6 ++
> >>>> hw/pci/msi.c                      |  13 +++-
> >>>> hw/pci/msix.c                     |  12 +++-
> >>>> hw/remote/machine.c               |  11 +--
> >>>> hw/remote/vfio-user-obj.c         | 107 ++++++++++++++++++++++++++++++
> >>>> stubs/vfio-user-obj.c             |   6 ++
> >>>> MAINTAINERS                       |   1 +
> >>>> hw/remote/trace-events            |   1 +
> >>>> stubs/meson.build                 |   1 +
> >>>> 10 files changed, 158 insertions(+), 6 deletions(-)
> >>>> create mode 100644 include/hw/remote/vfio-user-obj.h
> >>>> create mode 100644 stubs/vfio-user-obj.c
> >>>>
> >>>> diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
> >>>> index c3f3c90473..d42d526a48 100644
> >>>> --- a/include/hw/pci/pci.h
> >>>> +++ b/include/hw/pci/pci.h
> >>>> @@ -129,6 +129,8 @@ typedef uint32_t PCIConfigReadFunc(PCIDevice
> *pci_dev,
> >>>> typedef void PCIMapIORegionFunc(PCIDevice *pci_dev, int region_num,
> >>>>                                pcibus_t addr, pcibus_t size, int type);
> >>>> typedef void PCIUnregisterFunc(PCIDevice *pci_dev);
> >>>> +typedef void PCIMSINotify(PCIDevice *pci_dev, unsigned vector);
> >>>> +typedef void PCIMSIxNotify(PCIDevice *pci_dev, unsigned vector);
> >>>>
> >>>> typedef struct PCIIORegion {
> >>>>    pcibus_t addr; /* current PCI mapping address. -1 means not mapped */
> >>>> @@ -323,6 +325,10 @@ struct PCIDevice {
> >>>>    /* Space to store MSIX table & pending bit array */
> >>>>    uint8_t *msix_table;
> >>>>    uint8_t *msix_pba;
> >>>> +
> >>>> +    PCIMSINotify *msi_notify;
> >>>> +    PCIMSIxNotify *msix_notify;
> >>>> +
> >>>>    /* MemoryRegion container for msix exclusive BAR setup */
> >>>>    MemoryRegion msix_exclusive_bar;
> >>>>    /* Memory Regions for MSIX table and pending bit entries. */
> >>>> diff --git a/include/hw/remote/vfio-user-obj.h b/include/hw/remote/vfio-
> user-obj.h
> >>>> new file mode 100644
> >>>> index 0000000000..87ab78b875
> >>>> --- /dev/null
> >>>> +++ b/include/hw/remote/vfio-user-obj.h
> >>>> @@ -0,0 +1,6 @@
> >>>> +#ifndef VFIO_USER_OBJ_H
> >>>> +#define VFIO_USER_OBJ_H
> >>>> +
> >>>> +void vfu_object_set_bus_irq(PCIBus *pci_bus);
> >>>> +
> >>>> +#endif
> >>>> diff --git a/hw/pci/msi.c b/hw/pci/msi.c
> >>>> index 47d2b0f33c..93f5e400cc 100644
> >>>> --- a/hw/pci/msi.c
> >>>> +++ b/hw/pci/msi.c
> >>>> @@ -51,6 +51,8 @@
> >>>> */
> >>>> bool msi_nonbroken;
> >>>>
> >>>> +static void pci_msi_notify(PCIDevice *dev, unsigned int vector);
> >>>> +
> >>>> /* If we get rid of cap allocator, we won't need this. */
> >>>> static inline uint8_t msi_cap_sizeof(uint16_t flags)
> >>>> {
> >>>> @@ -225,6 +227,8 @@ int msi_init(struct PCIDevice *dev, uint8_t offset,
> >>>>    dev->msi_cap = config_offset;
> >>>>    dev->cap_present |= QEMU_PCI_CAP_MSI;
> >>>>
> >>>> +    dev->msi_notify = pci_msi_notify;
> >>>
> >>> Are you sure it's correct to skip the msi_is_masked() logic? I think the
> >>> callback function should only override the behavior of
> >>> msi_send_message(), not the entire msi_notify() function.
> >>>
> >>> The same applies to MSI-X.
> >>
> >> Hi Stefan,
> >>
> >> We noticed that the client is handling the masking and unmasking of MSIx
> >> interrupts.
> >>
> >> Concerning MSIx, vfio_msix_vector_use() handles unmasking and
> >> vfio_msix_vector_release() handles masking operations. The server triggers
> >> an MSIx interrupt by signaling the eventfd associated with the vector. If 
> >> the
> vector
> >> is unmasked, the interrupt bypasses the client/QEMU and takes this
> >> path: “server -> KVM -> guest”. Whereas, if the vector is masked, it lands 
> >> on
> the
> >> client via: “server -> vfio_msi_interrupt()”. vfio_msi_interrupt() 
> >> suppresses
> the
> >> interrupt if the vector is masked. The use and release functions switch the
> server’s
> >> eventfd between VFIOPCIDevice->VFIOMSIVector[i]->kvm_interrupt and
> >> VFIOPCIDevice->VFIOMSIVector[i]->interrupt using the
> >> VFIO_DEVICE_SET_IRQS message.
> >>
> >> Concerning MSI, the server should check if the vector is unmasked before
> >> triggering. The server is not doing it presently, will update it. For some
> reason,
> >> I had assumed that MSI handling is similar to MSIx in terms of masking - 
> >> sorry
> >> about that. The masking and unmasking information for MSI is in the config
> space
> >> registers, so the server should have this information.
> >>
> >> You had previously suggested using callbacks for msi_get_message &
> >> msi_send_message, considering the masking issue. Given MSIx masking
> >> (including the MSIx table BAR) is handled at the client, the masking
> information
> >> doesn’t reach the server - so msix_notify will never invoke the
> >> msi_send_message callback - all the vectors remain masked at the server
> >> end (msix_init() -> msix_mask_all()).
> >
> > I was expecting vfio-user devices to be involved in MSI-X masking so


libvfio-user can't be involved in the first place since QEMU emulates MSI/X:
https://lore.kernel.org/all/20200121101911.64701...@w520.home/T/

> > they can implement the Pending bit semantics described in the spec:
> >
> >  If a masked vector has its Pending bit Set, and the associated
> >  underlying interrupt events are somehow satisfied (usually by software
> >  though the exact manner is Function-specific), the Function must Clear
> >  the Pending bit, to avoid sending a spurious interrupt message later
> >  when software unmasks the vector.
> >
> > Does QEMU VFIO support this?
> 
> QEMU VFIO doesn’t seem to support it - I couldn’t find a place where
> an assigned/passthru PCI device clears the pending bits in QEMU.
> 
> >
> > What happens when a hw/net/e1000e_core.c vfio-user device uses
> > msix_clr_pending() and related APIs?
> >
> > Also, having the vfio-user daemon write to the eventfd while the vector
> > is masked is a waste of CPU cycles. The PCIe spec describes using MSI-X
> > masking for poll mode operation and going via eventfd is suboptimal:
> >
> >  Software is permitted to mask one or more vectors indefinitely, and
> >  service their associated interrupt events strictly based on polling
> >  their Pending bits. A Function must Set and Clear its Pending bits as
> >  necessary to support this “pure polling” mode of operation.
> >
> > Maybe the answer is these issues don't matter in practice because MSI-X
> > masking is not used much?
> 
> From what I can tell, “pure polling” is used by ivshmem and virtio-pci 
> devices in
> QEMU.
> 
> e1000e doesn’t use “pure polling”, but it does clear pending interrupts.
> 
> John Johnson, John Levon & Thanos,
> 
>     Any thoughts?

If QEMU stops emulating MSI/X then we libvfio-user would have to do it.

RE: [PATCH v6 15/19] vfio-user: handle device interrupts

Reply via email to