On Tue, 21 Jun 2016 02:15:23 +0200 Laszlo Ersek <ler...@redhat.com> wrote:
> On 06/21/16 00:04, Alex Williamson wrote: > > The kernel currently exposes the SR-IOV capability as read-only > > through vfio-pci. This is sufficient to protect the host kernel, but > > has the potential to confuse guests without further virtualization. > > In particular, OVMF tries to size the VF BARs and comes up with absurd > > results, ending with an assert. There's not much point in adding > > virtualization to a read-only capability, so we simply hide it for > > now. If the kernel ever enables SR-IOV virtualization, we should > > easily be able to test it through VF BAR sizing or explicit flags. > > > > Testing whether we should parse extended capabilities is also pulled > > into the function to keep these assumptions in one place. > > > > Signed-off-by: Alex Williamson <alex.william...@redhat.com> > > --- > > > > This depends on Chen Fan's patch "vfio: add pcie extended capability > > support", which I'll pull from Zhou Jie's latest series unless there > > are comments to the contrary. Otherwise based on Stefan's tracing > > pull request so as not to conflict. > > > > hw/vfio/pci.c | 49 > > +++++++++++++++++++++++++++++++++++++++---------- > > hw/vfio/trace-events | 1 + > > 2 files changed, 40 insertions(+), 10 deletions(-) > > > > diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c > > index a171056b..36d5e00 100644 > > --- a/hw/vfio/pci.c > > +++ b/hw/vfio/pci.c > > @@ -1772,6 +1772,12 @@ static int vfio_add_ext_cap(VFIOPCIDevice *vdev) > > uint8_t cap_ver; > > uint8_t *config; > > > > + /* Only add extended caps if we have them and the guest can see them */ > > + if (!pci_is_express(pdev) || !pci_bus_is_express(pdev->bus) || > > + !pci_get_long(pdev->config + PCI_CONFIG_SPACE_SIZE)) { > > + return 0; > > + } > > + > > /* > > * pcie_add_capability always inserts the new capability at the tail > > * of the chain. Therefore to end up with a chain that matches the > > @@ -1780,6 +1786,25 @@ static int vfio_add_ext_cap(VFIOPCIDevice *vdev) > > */ > > config = g_memdup(pdev->config, vdev->config_size); > > > > + /* > > + * Extended capabilities are chained with each pointing to the next, > > so we > > + * can drop anything other than the head of the chain simply by > > modifying > > + * the previous next pointer. For the head of the chain, we can > > modify the > > + * capability ID to something that cannot match a valid capability. ID > > + * 0 is reserved for this since absence of capabilities is indicated by > > + * 0 for the ID, version, AND next pointer. However, > > pcie_add_capability() > > + * uses ID 0 as reserved for list management and will incorrectly > > match and > > + * assert if we attempt to pre-load the head of the chain with with > > this > > + * ID. Use ID 0xFFFF temporarily since it is also seems to be > > reserved in > > + * part for identifying abscense of capabilities in a root complex > > register > > + * block. If the ID still exists after adding capabilities, switch > > back to > > + * zero. We'll mark this entire first dword as emulated for this > > purpose. > > + */ > > + pci_set_long(pdev->config + PCI_CONFIG_SPACE_SIZE, > > + PCI_EXT_CAP(0xFFFF, 0, 0)); > > + pci_set_long(pdev->wmask + PCI_CONFIG_SPACE_SIZE, 0); > > + pci_set_long(vdev->emulated_config_bits + PCI_CONFIG_SPACE_SIZE, ~0); > > + > > for (next = PCI_CONFIG_SPACE_SIZE; next; > > next = PCI_EXT_CAP_NEXT(pci_get_long(config + next))) { > > header = pci_get_long(config + next); > > @@ -1794,12 +1819,23 @@ static int vfio_add_ext_cap(VFIOPCIDevice *vdev) > > */ > > size = vfio_ext_cap_max_size(config, next); > > > > - pcie_add_capability(pdev, cap_id, cap_ver, next, size); > > - pci_set_long(pdev->config + next, PCI_EXT_CAP(cap_id, cap_ver, 0)); > > - > > /* Use emulated next pointer to allow dropping extended caps */ > > pci_long_test_and_set_mask(vdev->emulated_config_bits + next, > > PCI_EXT_CAP_NEXT_MASK); > > + > > + switch (cap_id) { > > + case PCI_EXT_CAP_ID_SRIOV: /* Read-only VF BARs confuses OVMF */ > > I think s/confuses/confuse/. Thanks, fixed. > Other than that, this is mostly black magic to me, so I can't even ACK > it with a straight face :) > > I would like to test it, and report back, but then again, I don't have a > NIC with virtual functions. :/ Time to justify one for OVMF testing? ;) Thanks, Alex