Hi Alex, On 3/26/19 11:55 PM, Alex Williamson wrote: > Conventional PCI buses pre-date requester IDs. An IOMMU cannot > distinguish by devfn & bus between devices in a conventional PCI > topology and therefore we cannot assign them separate AddressSpaces. > By taking this requester ID aliasing into account, QEMU better matches > the bare metal behavior and restrictions, and enables shared > AddressSpace configurations that are otherwise not possible with > guest IOMMU support. > > For the latter case, given any example where an IOMMU group on the > host includes multiple devices: > > $ ls /sys/kernel/iommu_groups/1/devices/ > 0000:00:01.0 0000:01:00.0 0000:01:00.1 > > If we incorporate a vIOMMU into the VM configuration, we're restricted > that we can only assign one of the endpoints to the guest because a > second endpoint will attempt to use a different AddressSpace. VFIO > only supports IOMMU group level granularity at the container level, > preventing this second endpoint from being assigned: > > qemu-system-x86_64 -machine q35... \ > -device intel-iommu,intremap=on \ > -device pcie-root-port,addr=1e.0,id=pcie.1 \ > -device vfio-pci,host=1:00.0,bus=pcie.1,addr=0.0,multifunction=on \ > -device vfio-pci,host=1:00.1,bus=pcie.1,addr=0.1 > > qemu-system-x86_64: -device vfio-pci,host=1:00.1,bus=pcie.1,addr=0.1: vfio \ > 0000:01:00.1: group 1 used in multiple address spaces > > However, when QEMU incorporates proper aliasing, we can make use of a > PCIe-to-PCI bridge to mask the requester ID, resulting in a hack that > provides the downstream devices with the same AddressSpace, ex: > > qemu-system-x86_64 -machine q35... \ > -device intel-iommu,intremap=on \ > -device pcie-pci-bridge,addr=1e.0,id=pci.1 \ > -device vfio-pci,host=1:00.0,bus=pci.1,addr=1.0,multifunction=on \ > -device vfio-pci,host=1:00.1,bus=pci.1,addr=1.1 > > While the utility of this hack may be limited, this AddressSpace > aliasing is the correct behavior for QEMU to emulate bare metal. > > Signed-off-by: Alex Williamson <alex.william...@redhat.com> > --- > hw/pci/pci.c | 33 +++++++++++++++++++++++++++++++-- > 1 file changed, 31 insertions(+), 2 deletions(-) > > diff --git a/hw/pci/pci.c b/hw/pci/pci.c > index 35451c1e9987..38467e676f1f 100644 > --- a/hw/pci/pci.c > +++ b/hw/pci/pci.c > @@ -2594,12 +2594,41 @@ AddressSpace > *pci_device_iommu_address_space(PCIDevice *dev) > { > PCIBus *bus = pci_get_bus(dev); > PCIBus *iommu_bus = bus; > + uint8_t devfn = dev->devfn; > > while(iommu_bus && !iommu_bus->iommu_fn && iommu_bus->parent_dev) { > - iommu_bus = pci_get_bus(iommu_bus->parent_dev); > + PCIBus *parent_bus = pci_get_bus(iommu_bus->parent_dev); > + > + /* > + * Determine which requester ID alias should be used for the device > + * based on the PCI topology. There are no requester IDs on > convetional conventional > + * PCI buses, therefore we push the alias up to the parent on each > non- > + * express bus. Which alias we use depends on whether this is a > legacy > + * PCI bridge or PCIe-to-PCI/X bridge as in chapter 2.3 of the > PCIe-to- > + * PCI bridge spec. Note that we cannot use pci_requester_id() here > + * because the resulting BDF depends on the secondary bridge register Didn't you mean secondary bus number register? > + * programming. We also cannot lookup the PCIBus from the bus number > + * at this point for the iommu_fn. Also, requester_id_cache is the > + * alias to the root bus, which is usually, but not necessarily > always > + * where we'll find our iommu_fn. > + */ > + if (!pci_bus_is_express(iommu_bus)) { > + PCIDevice *parent = iommu_bus->parent_dev; > + > + if (pci_is_express(parent) && > + pcie_cap_get_type(parent) == PCI_EXP_TYPE_PCI_BRIDGE) { > + devfn = PCI_DEVFN(0, 0); > + bus = iommu_bus; > + } else { > + devfn = parent->devfn; > + bus = parent_bus; > + } > + } > + > + iommu_bus = parent_bus; > } > if (iommu_bus && iommu_bus->iommu_fn) { > - return iommu_bus->iommu_fn(bus, iommu_bus->iommu_opaque, dev->devfn); > + return iommu_bus->iommu_fn(bus, iommu_bus->iommu_opaque, devfn); I think it would make sense to comment this iommu_fn() callback's role somewhere. > } > return &address_space_memory; > } > >
Thanks Eric