Re: [Qemu-devel] [PATCH] vfio-pci: Make host MSI-X enable track guest

Alex Williamson Fri, 21 Dec 2012 07:38:34 -0800

On Fri, 2012-12-21 at 14:21 +0200, Michael S. Tsirkin wrote:
> On Thu, Dec 20, 2012 at 03:12:46PM -0700, Alex Williamson wrote:
> > On Thu, 2012-12-20 at 18:36 +0200, Michael S. Tsirkin wrote:
> > > On Thu, Dec 20, 2012 at 09:06:41AM -0700, Alex Williamson wrote:
> > > > Guests typically enable MSI-X with all of the vectors in the MSI-X
> > > > vector table masked.  Only when the vector is enabled does the vector
> > > > get unmasked, resulting in a vector_use callback.  These two points,
> > > > enable and unmask, correspond to pci_enable_msix() and request_irq()
> > > > for Linux guests.  Some drivers rely on VF/PF or PF/fw communication
> > > > channels that expect the physical state of the device to match the
> > > > guest visible state of the device.  They don't appreciate lazily
> > > > enabling MSI-X on the physical device.
> > > > 
> > > > To solve this, enable MSI-X with a single vector when the MSI-X
> > > > capability is enabled and immediate disable the vector.  This leaves
> > > > the physical device in exactly the same state between host and guest.
> > > > Furthermore, the brief gap where we enable vector 0, it fires into
> > > > userspace, not KVM, so the guest doesn't get spurious interrupts.
> > > > Ideally we could call VFIO_DEVICE_SET_IRQS with the right parameters
> > > > to enable MSI-X with zero vectors, but this will currently return an
> > > > error as the Linux MSI-X interfaces do not allow it.
> > > > 
> > > > Cc: qemu-sta...@nongnu.org
> > > > Signed-off-by: Alex Williamson <alex.william...@redhat.com>
> > > 
> > > Do you need an interface for this?  Can you do low-level pci config
> > > access instead?  I imagine you would then enable MSIX and mask all
> > > vectors at the same time.
> > > 
> > > No?
> > 
> > I really don't like the idea of enabling MSI-X directly through config
> > space.  We're just asking for ownership conflicts doing that.  In fact,
> > vfio prevents MSI/X from being enabled through config space since those
> > are controlled through ioctl.
> 
> For vfio the natural thing to do would be to add interfaces
> to do this in a controlled manner.


The ioctl is the controlled manner.  The ioctl will actually accept
enabling zero vectors, but you'll get an -ERANGE generated from
pci_enable_msix, so I think the interface there is fine.

> >  It also prevents access to the MSI-X
> > vector table since userspace has no business reading or modifying it.
> > Thanks,
> > 
> > Alex
> 
> I'm not sure what the point of this is. If a device can do DMA writes
> there is no way to distinguish between them and MSI on the bus.
> So this seems to buy us no additional security.

IOMMUs supporting interrupt remapping can.  Do you propose we have one
interface when interrupt remapping is present and another when it's not?
I can't think of anything useful that userspace can do with direct
access to MSIX setup, including this MSI-X enable stub.

Do you have any actual objections to the patch below?  I agree that
kernel interfaces can be improved and I'd prefer the ability to enable
MSI-X with zero vectors and incrementally add and remove vectors, but
creating side channels so userspace can poke around here is not an
acceptable alternative imho.  We need a solution for users hitting this
now and I'm pretty happy with how this solution works.  I only wish we
could make legacy assignment behave as cleanly, but I'm not willing to
poke MSI-X enable bits myself to make it happen.  Thanks,

Alex

> > > > ---
> > > >  hw/vfio_pci.c |   31 +++++++++++++++++++++++++++----
> > > >  1 file changed, 27 insertions(+), 4 deletions(-)
> > > > 
> > > > VFIO makes this a bit cleaner, so I think this is both the stable and
> > > > final fix here.
> > > > 
> > > > diff --git a/hw/vfio_pci.c b/hw/vfio_pci.c
> > > > index 7c27834..5178ccc 100644
> > > > --- a/hw/vfio_pci.c
> > > > +++ b/hw/vfio_pci.c
> > > > @@ -561,8 +561,9 @@ static int vfio_enable_vectors(VFIODevice *vdev, 
> > > > bool msix)
> > > >      return ret;
> > > >  }
> > > >  
> > > > -static int vfio_msix_vector_use(PCIDevice *pdev,
> > > > -                                unsigned int nr, MSIMessage msg)
> > > > +static int vfio_msix_vector_do_use(PCIDevice *pdev, unsigned int nr,
> > > > +                                   MSIMessage msg, bool try_kvm,
> > > > +                                   IOHandler *handler)
> > > >  {
> > > >      VFIODevice *vdev = DO_UPCAST(VFIODevice, pdev, pdev);
> > > >      VFIOMSIVector *vector;
> > > > @@ -586,7 +587,7 @@ static int vfio_msix_vector_use(PCIDevice *pdev,
> > > >       * Attempt to enable route through KVM irqchip,
> > > >       * default to userspace handling if unavailable.
> > > >       */
> > > > -    vector->virq = kvm_irqchip_add_msi_route(kvm_state, msg);
> > > > +    vector->virq = try_kvm ? kvm_irqchip_add_msi_route(kvm_state, msg) 
> > > > : -1;
> > > >      if (vector->virq < 0 ||
> > > >          kvm_irqchip_add_irqfd_notifier(kvm_state, &vector->interrupt,
> > > >                                         vector->virq) < 0) {
> > > > @@ -595,7 +596,7 @@ static int vfio_msix_vector_use(PCIDevice *pdev,
> > > >              vector->virq = -1;
> > > >          }
> > > >          qemu_set_fd_handler(event_notifier_get_fd(&vector->interrupt),
> > > > -                            vfio_msi_interrupt, NULL, vector);
> > > > +                            handler, NULL, vector);
> > > >      }
> > > >  
> > > >      /*
> > > > @@ -638,6 +639,12 @@ static int vfio_msix_vector_use(PCIDevice *pdev,
> > > >      return 0;
> > > >  }
> > > >  
> > > > +static int vfio_msix_vector_use(PCIDevice *pdev,
> > > > +                                unsigned int nr, MSIMessage msg)
> > > > +{
> > > > +    return vfio_msix_vector_do_use(pdev, nr, msg, true, 
> > > > vfio_msi_interrupt);
> > > > +}
> > > > +
> > > >  static void vfio_msix_vector_release(PCIDevice *pdev, unsigned int nr)
> > > >  {
> > > >      VFIODevice *vdev = DO_UPCAST(VFIODevice, pdev, pdev);
> > > > @@ -696,6 +703,22 @@ static void vfio_enable_msix(VFIODevice *vdev)
> > > >  
> > > >      vdev->interrupt = VFIO_INT_MSIX;
> > > >  
> > > > +    /*
> > > > +     * Some communication channels between VF & PF or PF & fw rely on 
> > > > the
> > > > +     * physical state of the device and expect that enabling MSI-X 
> > > > from the
> > > > +     * guest enables the same on the host.  When our guest is Linux, 
> > > > the
> > > > +     * guest driver call to pci_enable_msix() sets the enabling bit in 
> > > > the
> > > > +     * MSI-X capability, but leaves the vector table masked.  We 
> > > > therefore
> > > > +     * can't rely on a vector_use callback (from request_irq() in the 
> > > > guest)
> > > > +     * to switch the physical device into MSI-X mode because that may 
> > > > come a
> > > > +     * long time after pci_enable_msix().  This code enables vector 0 
> > > > with
> > > > +     * triggering to userspace, then immediately release the vector, 
> > > > leaving
> > > > +     * the physical device with no vectors enabled, but MSI-X enabled, 
> > > > just
> > > > +     * like the guest view.
> > > > +     */
> > > > +    vfio_msix_vector_do_use(&vdev->pdev, 0, (MSIMessage) { 0, 0 }, 
> > > > false, NULL);
> > > > +    vfio_msix_vector_release(&vdev->pdev, 0);
> > > > +
> > > >      if (msix_set_vector_notifiers(&vdev->pdev, vfio_msix_vector_use,
> > > >                                    vfio_msix_vector_release)) {
> > > >          error_report("vfio: msix_set_vector_notifiers failed\n");
> > 
> >

Re: [Qemu-devel] [PATCH] vfio-pci: Make host MSI-X enable track guest

Reply via email to