On Thu, 16 Feb 2017 10:28:39 +0800 Peter Xu <pet...@redhat.com> wrote:
> On Wed, Feb 15, 2017 at 11:15:52AM -0700, Alex Williamson wrote: > > [...] > > > > Alex, do you like something like below to fix above issue that Jintack > > > has encountered? > > > > > > (note: this code is not for compile, only trying show what I mean...) > > > > > > ------8<------- > > > diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c > > > index 332f41d..4dca631 100644 > > > --- a/hw/vfio/pci.c > > > +++ b/hw/vfio/pci.c > > > @@ -1877,25 +1877,6 @@ static void vfio_add_ext_cap(VFIOPCIDevice *vdev) > > > */ > > > config = g_memdup(pdev->config, vdev->config_size); > > > > > > - /* > > > - * Extended capabilities are chained with each pointing to the next, > > > so we > > > - * can drop anything other than the head of the chain simply by > > > modifying > > > - * the previous next pointer. For the head of the chain, we can > > > modify the > > > - * capability ID to something that cannot match a valid capability. > > > ID > > > - * 0 is reserved for this since absence of capabilities is indicated > > > by > > > - * 0 for the ID, version, AND next pointer. However, > > > pcie_add_capability() > > > - * uses ID 0 as reserved for list management and will incorrectly > > > match and > > > - * assert if we attempt to pre-load the head of the chain with this > > > ID. > > > - * Use ID 0xFFFF temporarily since it is also seems to be reserved in > > > - * part for identifying absence of capabilities in a root complex > > > register > > > - * block. If the ID still exists after adding capabilities, switch > > > back to > > > - * zero. We'll mark this entire first dword as emulated for this > > > purpose. > > > - */ > > > - pci_set_long(pdev->config + PCI_CONFIG_SPACE_SIZE, > > > - PCI_EXT_CAP(0xFFFF, 0, 0)); > > > - pci_set_long(pdev->wmask + PCI_CONFIG_SPACE_SIZE, 0); > > > - pci_set_long(vdev->emulated_config_bits + PCI_CONFIG_SPACE_SIZE, ~0); > > > - > > > for (next = PCI_CONFIG_SPACE_SIZE; next; > > > next = PCI_EXT_CAP_NEXT(pci_get_long(config + next))) { > > > header = pci_get_long(config + next); > > > @@ -1917,6 +1898,8 @@ static void vfio_add_ext_cap(VFIOPCIDevice *vdev) > > > switch (cap_id) { > > > case PCI_EXT_CAP_ID_SRIOV: /* Read-only VF BARs confuse OVMF */ > > > case PCI_EXT_CAP_ID_ARI: /* XXX Needs next function > > > virtualization */ > > > + /* keep this ecap header (4 bytes), but mask cap_id to > > > 0xffff */ > > > + ... > > > trace_vfio_add_ext_cap_dropped(vdev->vbasedev.name, cap_id, > > > next); > > > break; > > > default: > > > @@ -1925,11 +1908,6 @@ static void vfio_add_ext_cap(VFIOPCIDevice *vdev) > > > > > > } > > > > > > - /* Cleanup chain head ID if necessary */ > > > - if (pci_get_word(pdev->config + PCI_CONFIG_SPACE_SIZE) == 0xFFFF) { > > > - pci_set_word(pdev->config + PCI_CONFIG_SPACE_SIZE, 0); > > > - } > > > - > > > g_free(config); > > > return; > > > } > > > ----->8----- > > > > > > Since after all we need the assumption that 0xffff is reserved for > > > cap_id. Then, we can just remove the "first 0xffff then 0x0" hack, > > > which is imho error-prone and hacky. > > > > This doesn't fix the bug, which is that pcie_add_capability() uses a > > valid capability ID for it's own internal tracking. It's only doing > > this to find the end of the capability chain, which we could do in a > > spec complaint way by looking for a zero next pointer. Fix that and > > then vfio doesn't need to do this set to 0xffff then back to zero > > nonsense at all. Capability ID zero is valid. Thanks, > > Yeah I see Michael's fix on the capability list stuff. However, imho > these are two different issues? Or say, even if with that patch, we > should still need this hack (first 0x0, then 0xffff) right? Since > looks like that patch didn't solve the problem if the first pcie ecap > is masked at 0x100. I thought the problem was that QEMU in the host exposes a device with a capability ID of 0 to the L1 guest. QEMU in the L1 guest balks at a capability ID of 0 because that's how it finds the end of the chain. Therefore if we make QEMU not use capability ID 0 for internal purposes, things work. vfio using 0xffff and swapping back to 0x0 becomes unnecessary, but doesn't hurt anything. Thanks, Alex