Each Partitionable Endpoint (IOMMU group) has an address range on a PCI bus where devices are allowed to do DMA. These ranges are called DMA windows. By default, there is a single DMA window, 1 or 2GB in size, mapped at address zero on the PCI bus.
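
To make the notion of a DMA window concrete, here is a minimal sketch of a window as an address range on the bus, with a bounds check and the number of TCEs (one per IOMMU page) needed to cover it. The struct and function names are hypothetical and are not taken from QEMU or the kernel; the 4K page size used in the usage note is an assumption for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical description of one DMA window; not a QEMU or kernel type. */
struct dma_window {
    uint64_t bus_offset;  /* where the window starts on the PCI bus */
    uint64_t size;        /* window size in bytes */
    uint32_t page_shift;  /* log2 of the IOMMU page size, e.g. 12 for 4K */
};

/* True if the DMA range [addr, addr + len) lies entirely inside the window. */
bool dma_window_contains(const struct dma_window *w,
                         uint64_t addr, uint64_t len)
{
    uint64_t off;

    if (addr < w->bus_offset) {
        return false;
    }
    off = addr - w->bus_offset;
    /* Written this way to avoid overflow in addr + len. */
    return off <= w->size && len <= w->size - off;
}

/* Number of TCEs needed to cover the whole window. */
uint64_t dma_window_nb_tces(const struct dma_window *w)
{
    assert(w->page_shift < 64);
    return w->size >> w->page_shift;
}
```

For example, a 2GB default window at bus address zero with an assumed 4K IOMMU page size (`{ 0, 2ULL << 30, 12 }`) needs 524288 TCEs, and a DMA access that runs past the 2GB boundary is rejected by `dma_window_contains()`.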

PAPR defines a Dynamic DMA Windows (DDW) RTAS API which allows pseries guests to query the hypervisor about DDW support and capabilities (page size mask for now). A pseries guest may request additional DMA windows, on top of the default one, using this RTAS API. The existing pseries Linux guests request an additional window as big as the guest RAM and map the entire window, which effectively creates a direct mapping of the guest memory to a PCI bus.

This patchset reworks PPC64 IOMMU code and adds necessary structures to support big windows.

Once a Linux guest discovers the presence of DDW, it does:
1. query the hypervisor about the number of available windows and page size masks;
2. create a window with the biggest possible page size (today 4K/64K/16M);
3. map the entire guest RAM via H_PUT_TCE* hypercalls;
4. switch dma_ops to direct_dma_ops on the selected PE.

Once this is done, H_PUT_TCE is not called anymore for 64bit devices and the guest does not waste time on DMA map/unmap operations.

Note that 32bit devices won't use DDW and will keep using the default DMA window, so KVM optimizations will be required (to be posted later).

This patchset adds DDW support for pseries. Host kernel changes are required, posted as:

[PATCH kernel v7 00/31] powerpc/iommu/vfio: Enable Dynamic DMA windows

This patchset is based on git://github.com/dgibson/qemu.git spapr-next branch.

Please comment. Thanks!

Changes:
v5:
* TCE tables got "enabled" state and are persistent, i.e. not recreated every reboot
* added v2 of SPAPR_TCE_IOMMU
* fixed migration for emulated PHB with enabled DDW
* huge pile of other changes

v4:
* reimplemented the whole thing
* machine reset and ddw-reset RTAS call both remove all TCE tables and create the default one
* IOMMU group id is not needed to use VFIO PHB anymore, multiple groups are supported on the same VFIO container and virtual PHB

v3:
* removed "reset" from API now
* reworked machine versions
* applied multiple comments
* includes David's machine QOM rework as this patchset adds a new machine type

v2:
* tested on emulated PHB
* removed "ddw" machine property, now it is PHB property
* disabled by default
* defined "pseries-2.2" machine which enables DDW by default
* fixed reset() and reference counting

Alexey Kardashevskiy (12):
  linux headers update for DDW on SPAPR
  vmstate: Define VARRAY with VMS_ALLOC
  spapr_pci: Make find_phb()/find_dev() public
  spapr_pci_vfio: Enable multiple groups per container
  vfio: spapr: Move SPAPR-related code to a separate file
  vfio: spapr: Add SPAPR IOMMU v2 support (DMA memory preregistering)
  spapr_iommu: Rework TCE table initialization
  spapr_pci: Rework reset to reset DMA configuration
  spapr_iommu: Add root memory region
  spapr_pci: Rework finish_realize()
  spapr_pci: Disable all DMA windows on reset
  spapr_pci/spapr_pci_vfio: Support Dynamic DMA Windows (DDW)

 hw/ppc/Makefile.objs          |   3 +
 hw/ppc/spapr.c                |   5 +
 hw/ppc/spapr_iommu.c          | 140 +++++++++++++------
 hw/ppc/spapr_pci.c            | 118 ++++++++++++----
 hw/ppc/spapr_pci_vfio.c       | 149 ++++++++++++++------
 hw/ppc/spapr_rtas_ddw.c       | 314 ++++++++++++++++++++++++++++++++++++++++++
 hw/ppc/spapr_vio.c            |  10 +-
 hw/vfio/Makefile.objs         |   1 +
 hw/vfio/common.c              | 186 +++++--------------------
 hw/vfio/spapr.c               | 301 ++++++++++++++++++++++++++++++++++++++++
 include/hw/pci-host/spapr.h   |  21 ++-
 include/hw/ppc/spapr.h        |  31 ++++-
 include/hw/vfio/vfio-common.h |  16 +++
 include/hw/vfio/vfio.h        |   2 +-
 include/migration/vmstate.h   |  10 ++
 linux-headers/linux/vfio.h    |  88 +++++++++++-
 trace-events                  |   5 +
 17 files changed, 1130 insertions(+), 270 deletions(-)
 create mode 100644 hw/ppc/spapr_rtas_ddw.c
 create mode 100644 hw/vfio/spapr.c

--
2.0.0