On Tue, Mar 26, 2019 at 04:55:19PM -0600, Alex Williamson wrote:
> Conventional PCI buses pre-date requester IDs.  An IOMMU cannot
> distinguish by devfn & bus between devices in a conventional PCI
> topology and therefore we cannot assign them separate AddressSpaces.
> By taking this requester ID aliasing into account, QEMU better matches
> the bare metal behavior and restrictions, and enables shared
> AddressSpace configurations that are otherwise not possible with
> guest IOMMU support.
>
> For the latter case, given any example where an IOMMU group on the
> host includes multiple devices:
>
>   $ ls /sys/kernel/iommu_groups/1/devices/
>   0000:00:01.0  0000:01:00.0  0000:01:00.1
[1]

> If we incorporate a vIOMMU into the VM configuration, we're restricted
> that we can only assign one of the endpoints to the guest because a
> second endpoint will attempt to use a different AddressSpace.  VFIO
> only supports IOMMU group level granularity at the container level,
> preventing this second endpoint from being assigned:
>
>   qemu-system-x86_64 -machine q35... \
>     -device intel-iommu,intremap=on \
>     -device pcie-root-port,addr=1e.0,id=pcie.1 \
>     -device vfio-pci,host=1:00.0,bus=pcie.1,addr=0.0,multifunction=on \
>     -device vfio-pci,host=1:00.1,bus=pcie.1,addr=0.1
>
>   qemu-system-x86_64: -device vfio-pci,host=1:00.1,bus=pcie.1,addr=0.1: vfio \
>   0000:01:00.1: group 1 used in multiple address spaces
>
> However, when QEMU incorporates proper aliasing, we can make use of a
> PCIe-to-PCI bridge to mask the requester ID, resulting in a hack that
> provides the downstream devices with the same AddressSpace, ex:
>
>   qemu-system-x86_64 -machine q35... \
>     -device intel-iommu,intremap=on \
>     -device pcie-pci-bridge,addr=1e.0,id=pci.1 \
>     -device vfio-pci,host=1:00.0,bus=pci.1,addr=1.0,multifunction=on \
>     -device vfio-pci,host=1:00.1,bus=pci.1,addr=1.1
>
> While the utility of this hack may be limited, this AddressSpace
> aliasing is the correct behavior for QEMU to emulate bare metal.
>
> Signed-off-by: Alex Williamson <alex.william...@redhat.com>

The patch looks sane to me even as a bug fix, since otherwise the DMA
address spaces used under various kinds of PCI bridges can be wrong, so:

Reviewed-by: Peter Xu <pet...@redhat.com>

Though I have a question that confused me even before: Alex, do you
know why the context entries of all the devices in the IOMMU root table
will be programmed even if the devices are under a pcie-to-pci bridge?
I'm giving an example with [1] above to be clear: in that case IIUC
we'll program context entries for all three devices (00:01.0,
01:00.0, 01:00.1) but they'll point to the same IOMMU table.
DMAs of devices 01:00.0 and 01:00.1 should always be tagged with
01:00.0 on bare metal, so why do we bother to program the context
entry of 01:00.1?  It seems never to be used.

(It should be needed for current QEMU to work with pcie-to-pci bridges
without this patch, but I feel like I don't know the real answer
behind it.)

Thanks,

--
Peter Xu
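
To make the aliasing in the question above concrete, here is a minimal
sketch of how a requester ID is packed from bus/device/function and what
the IOMMU would see for the two functions in example [1] if they sat
behind a PCIe-to-PCI bridge.  It is not QEMU code: the helper names are
hypothetical, and it assumes the bridge tags all downstream DMA with
(secondary bus, devfn 00.0), which matches the "always tagged with
01:00.0" statement in the message.

```python
# Illustrative sketch only; not QEMU code. All helper names are hypothetical.

def requester_id(bus: int, dev: int, fn: int) -> int:
    """Pack bus/device/function into a 16-bit ID: bus[15:8], dev[7:3], fn[2:0]."""
    if not (0 <= bus <= 0xFF and 0 <= dev <= 0x1F and 0 <= fn <= 0x7):
        raise ValueError("bus/device/function out of range")
    return (bus << 8) | (dev << 3) | fn


def bridge_alias(secondary_bus: int) -> int:
    """Assumption: a PCIe-to-PCI bridge tags downstream DMA as (secondary bus, 00.0)."""
    return requester_id(secondary_bus, 0, 0)


def fmt(rid: int) -> str:
    """Render a requester ID in the usual bus:dev.fn notation."""
    return f"{rid >> 8:02x}:{(rid >> 3) & 0x1F:02x}.{rid & 0x7}"


if __name__ == "__main__":
    # The two endpoint functions from example [1], assumed to be on bus 01.
    for dev, fn in [(0, 0), (0, 1)]:
        own = requester_id(1, dev, fn)
        print(f"{fmt(own)} is seen by the IOMMU as {fmt(bridge_alias(1))}")
```

Under that assumption both functions resolve to the same requester ID,
so only the context entry indexed by 01:00.0 would ever be looked up,
which is the case the question is asking about.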