On Fri, 2012-12-21 at 14:21 +0200, Michael S. Tsirkin wrote: > On Thu, Dec 20, 2012 at 03:12:46PM -0700, Alex Williamson wrote: > > On Thu, 2012-12-20 at 18:36 +0200, Michael S. Tsirkin wrote: > > > On Thu, Dec 20, 2012 at 09:06:41AM -0700, Alex Williamson wrote: > > > > Guests typically enable MSI-X with all of the vectors in the MSI-X > > > > vector table masked. Only when the vector is enabled does the vector > > > > get unmasked, resulting in a vector_use callback. These two points, > > > > enable and unmask, correspond to pci_enable_msix() and request_irq() > > > > for Linux guests. Some drivers rely on VF/PF or PF/fw communication > > > > channels that expect the physical state of the device to match the > > > > guest visible state of the device. They don't appreciate lazily > > > > enabling MSI-X on the physical device. > > > > > > > > To solve this, enable MSI-X with a single vector when the MSI-X > > > > capability is enabled and immediate disable the vector. This leaves > > > > the physical device in exactly the same state between host and guest. > > > > Furthermore, the brief gap where we enable vector 0, it fires into > > > > userspace, not KVM, so the guest doesn't get spurious interrupts. > > > > Ideally we could call VFIO_DEVICE_SET_IRQS with the right parameters > > > > to enable MSI-X with zero vectors, but this will currently return an > > > > error as the Linux MSI-X interfaces do not allow it. > > > > > > > > Cc: qemu-sta...@nongnu.org > > > > Signed-off-by: Alex Williamson <alex.william...@redhat.com> > > > > > > Do you need an interface for this? Can you do low-level pci config > > > access instead? I imagine you would then enable MSIX and mask all > > > vectors at the same time. > > > > > > No? > > > > I really don't like the idea of enabling MSI-X directly through config > > space. We're just asking for ownership conflicts doing that. In fact, > > vfio prevents MSI/X from being enabled through config space since those > > are controlled through ioctl. > > For vfio the natural thing to do would be to add interfaces > to do this in a controlled manner.
The ioctl is the controlled manner. The ioctl will actually accept enabling zero vectors, but you'll get an -ERANGE generated from pci_enable_msix, so I think the interface there is fine. > > It also prevents access to the MSI-X > > vector table since userspace has no business reading or modifying it. > > Thanks, > > > > Alex > > I'm not sure what the point of this is. If a device can do DMA writes > there is no way to distinguish between them and MSI on the bus. > So this seems to buy us no additional security. IOMMUs supporting interrupt remapping can. Do you propose we have one interface when interrupt remapping is present and another when it's not? I can't think of anything useful that userspace can do with direct access to MSIX setup, including this MSI-X enable stub. Do you have any actual objections to the patch below? I agree that kernel interfaces can be improved and I'd prefer the ability to enable MSI-X with zero vectors and incrementally add and remove vectors, but creating side channels so userspace can poke around here is not an acceptable alternative imho. We need a solution for users hitting this now and I'm pretty happy with how this solution works. I only wish we could make legacy assignment behave as cleanly, but I'm not willing to poke MSI-X enable bits myself to make it happen. Thanks, Alex > > > > --- > > > > hw/vfio_pci.c | 31 +++++++++++++++++++++++++++---- > > > > 1 file changed, 27 insertions(+), 4 deletions(-) > > > > > > > > VFIO makes this a bit cleaner, so I think this is both the stable and > > > > final fix here. > > > > > > > > diff --git a/hw/vfio_pci.c b/hw/vfio_pci.c > > > > index 7c27834..5178ccc 100644 > > > > --- a/hw/vfio_pci.c > > > > +++ b/hw/vfio_pci.c > > > > @@ -561,8 +561,9 @@ static int vfio_enable_vectors(VFIODevice *vdev, > > > > bool msix) > > > > return ret; > > > > } > > > > > > > > -static int vfio_msix_vector_use(PCIDevice *pdev, > > > > - unsigned int nr, MSIMessage msg) > > > > +static int vfio_msix_vector_do_use(PCIDevice *pdev, unsigned int nr, > > > > + MSIMessage msg, bool try_kvm, > > > > + IOHandler *handler) > > > > { > > > > VFIODevice *vdev = DO_UPCAST(VFIODevice, pdev, pdev); > > > > VFIOMSIVector *vector; > > > > @@ -586,7 +587,7 @@ static int vfio_msix_vector_use(PCIDevice *pdev, > > > > * Attempt to enable route through KVM irqchip, > > > > * default to userspace handling if unavailable. > > > > */ > > > > - vector->virq = kvm_irqchip_add_msi_route(kvm_state, msg); > > > > + vector->virq = try_kvm ? kvm_irqchip_add_msi_route(kvm_state, msg) > > > > : -1; > > > > if (vector->virq < 0 || > > > > kvm_irqchip_add_irqfd_notifier(kvm_state, &vector->interrupt, > > > > vector->virq) < 0) { > > > > @@ -595,7 +596,7 @@ static int vfio_msix_vector_use(PCIDevice *pdev, > > > > vector->virq = -1; > > > > } > > > > qemu_set_fd_handler(event_notifier_get_fd(&vector->interrupt), > > > > - vfio_msi_interrupt, NULL, vector); > > > > + handler, NULL, vector); > > > > } > > > > > > > > /* > > > > @@ -638,6 +639,12 @@ static int vfio_msix_vector_use(PCIDevice *pdev, > > > > return 0; > > > > } > > > > > > > > +static int vfio_msix_vector_use(PCIDevice *pdev, > > > > + unsigned int nr, MSIMessage msg) > > > > +{ > > > > + return vfio_msix_vector_do_use(pdev, nr, msg, true, > > > > vfio_msi_interrupt); > > > > +} > > > > + > > > > static void vfio_msix_vector_release(PCIDevice *pdev, unsigned int nr) > > > > { > > > > VFIODevice *vdev = DO_UPCAST(VFIODevice, pdev, pdev); > > > > @@ -696,6 +703,22 @@ static void vfio_enable_msix(VFIODevice *vdev) > > > > > > > > vdev->interrupt = VFIO_INT_MSIX; > > > > > > > > + /* > > > > + * Some communication channels between VF & PF or PF & fw rely on > > > > the > > > > + * physical state of the device and expect that enabling MSI-X > > > > from the > > > > + * guest enables the same on the host. When our guest is Linux, > > > > the > > > > + * guest driver call to pci_enable_msix() sets the enabling bit in > > > > the > > > > + * MSI-X capability, but leaves the vector table masked. We > > > > therefore > > > > + * can't rely on a vector_use callback (from request_irq() in the > > > > guest) > > > > + * to switch the physical device into MSI-X mode because that may > > > > come a > > > > + * long time after pci_enable_msix(). This code enables vector 0 > > > > with > > > > + * triggering to userspace, then immediately release the vector, > > > > leaving > > > > + * the physical device with no vectors enabled, but MSI-X enabled, > > > > just > > > > + * like the guest view. > > > > + */ > > > > + vfio_msix_vector_do_use(&vdev->pdev, 0, (MSIMessage) { 0, 0 }, > > > > false, NULL); > > > > + vfio_msix_vector_release(&vdev->pdev, 0); > > > > + > > > > if (msix_set_vector_notifiers(&vdev->pdev, vfio_msix_vector_use, > > > > vfio_msix_vector_release)) { > > > > error_report("vfio: msix_set_vector_notifiers failed\n"); > > > >