Currently the pseries machine type uses two types of PCI Host Bridge (PHB) devices: "spapr-pci-host-bridge", the 'normal' variant intended for emulated PCI devices, and "spapr-pci-vfio-host-bridge", intended for VFIO devices.
When using VFIO with pseries, a separate spapr-pci-vfio-host-bridge device is needed for every host IOMMU group from which you're using VFIO devices. This is quite awkward for the user and/or management tools. It's especially awkward since the current code makes essentially no attempt to detect and warn the user if the wrong sorts of devices are connected to the wrong PHB.

It turns out that the VFIO core code is actually general enough that VFIO devices almost work on the normal spapr-pci-host-bridge device. In fact with the right combination of circumstances they *can* work right now.

spapr-pci-vfio-host-bridge does 3 additional things:

    1) It disables KVM acceleration of the guest IOMMU. That
       acceleration breaks VFIO because it means guest IOMMU updates
       bypass the VFIO infrastructure which keeps the host IOMMU in
       sync.

    2) It automatically configures the guest PHB's DMA window to match
       the capabilities of the host IOMMU, and advertises that to the
       guest.

    3) It provides additional handling of EEH (Enhanced Error
       Handling) functions.

This patch series:

    * Allows VFIO devices to be used on the spapr-pci-host-bridge by
      auto-switching the KVM TCE acceleration

    * Adds verification that the host IOMMU can handle the DMA windows
      used by guest PHBs

    * Allows the DMA window on the guest PHB to be configured with
      device properties. This can be used to make sure it matches a
      host window, but in practice the default setting will already
      work with the host IOMMU on all current systems.

    * Adds support to the VFIO core to allow a VFIO device to be
      hotplugged onto a bus which doesn't yet have VFIO devices. This
      already worked for systems without a guest-visible IOMMU
      (i.e. x86), this series makes it work even with a guest visible
      IOMMU.

    * Makes a few related cleanups along the way

This series does NOT allow EEH operations on VFIO devices on the spapr-pci-host-bridge device, so the spapr-pci-vfio-host-bridge device is left in for now.
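
As a rough illustration of the DMA window verification listed above, the check amounts to confirming that a guest PHB's window lies inside the range the host IOMMU can map and uses an IO page size the host supports. The sketch below is not the code from this series: the HostWindow structure, its field names and guest_window_fits() are hypothetical, and it assumes the host exposes a single contiguous window plus a bitmask of supported page sizes.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical description of what the host IOMMU can map. */
typedef struct {
    uint64_t min_iova;  /* lowest usable IOVA (inclusive) */
    uint64_t max_iova;  /* highest usable IOVA (inclusive) */
    uint64_t pgsizes;   /* bitmask of supported IO page sizes */
} HostWindow;

/*
 * Return true if a guest DMA window [start, start + size) with the given
 * IO page size can be backed by the host window.
 */
static bool guest_window_fits(const HostWindow *host, uint64_t start,
                              uint64_t size, uint64_t page_size)
{
    uint64_t end;

    if (size == 0) {
        return false;
    }

    end = start + size - 1;
    if (end < start) {
        return false;   /* start + size wrapped around */
    }

    if (start < host->min_iova || end > host->max_iova) {
        return false;   /* guest window extends outside the host window */
    }

    /* The page size must be a single power of two that the host supports. */
    if (page_size == 0 || (page_size & (page_size - 1)) != 0) {
        return false;
    }
    return (host->pgsizes & page_size) != 0;
}
```

A caller would run such a check when a guest IOMMU region is first attached to a VFIO container and fail the attach, rather than silently continuing, when it returns false.
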
It turns out there are some serious existing problems in both the qemu EEH implementation and (worse) in the EEH/VFIO kernel interface. Fixing those is a problem for another day. Maybe tomorrow.

I've tested basic assignment of an xHCI to a pseries guest, both at startup and with hotplug. I haven't (yet) tested VFIO on x86 with this series.

This series probably needs to be merged via several different trees. I'm intending to split up as necessary once it's had some review.

David Gibson (10):
  vfio: Remove unneeded union from VFIOContainer
  vfio: Generalize vfio_listener_region_add failure path
  vfio: Check guest IOVA ranges against host IOMMU capabilities
  vfio: Record host IOMMU's available IO page sizes
  memory: Allow replay of IOMMU mapping notifications
  vfio: Allow hotplug of containers onto existing guest IOMMU mappings
  spapr_pci: Allow PCI host bridge DMA window to be configured
  spapr_iommu: Rename vfio_accel parameter
  spapr_iommu: Provide a function to switch a TCE table to allowing VFIO
  spapr_pci: Allow VFIO devices to work on the normal PCI host bridge

 hw/ppc/spapr_iommu.c          |  25 ++++++-
 hw/ppc/spapr_pci.c            |  13 +++-
 hw/vfio/common.c              | 152 +++++++++++++++++++++++++++---------------
 include/exec/memory.h         |  16 +++++
 include/hw/pci-host/spapr.h   |   3 +-
 include/hw/ppc/spapr.h        |   6 +-
 include/hw/vfio/vfio-common.h |  21 +++---
 memory.c                      |  18 +++++
 target-ppc/kvm.c              |   4 +-
 target-ppc/kvm_ppc.h          |   2 +-
 10 files changed, 184 insertions(+), 76 deletions(-)

-- 
2.4.3