This series adds support for guests using the AMD vIOMMU to enable DMA
remapping for VFIO devices. In addition to the currently supported
passthrough (PT) mode, guest kernels are now able to provide DMA
address translation and access permission checking to VFs attached to
paging domains, using the AMD v1 I/O page table format.
These changes provide the essential emulation required to boot and
support regular operation for a Linux guest that enables DMA remapping,
e.g. via the kernel parameters "iommu=nopt" or "iommu.passthrough=0".
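One way to confirm from inside the guest that DMA remapping is actually in effect is to look at the default domain type of each IOMMU group in sysfs. The following is a minimal sketch, not part of this series: the function names are hypothetical, the optional directory argument exists only to make the helpers easy to exercise, and it assumes a guest kernel that exposes /sys/kernel/iommu_groups/<N>/type.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical and not part of the series.

# Print "<group> <type>" for every IOMMU group.
# "DMA" or "DMA-FQ" indicates a paging domain (DMA remapping),
# "identity" indicates passthrough.
# $1: optional override of the sysfs directory (defaults to the real one).
iommu_group_types() {
    root="${1:-/sys/kernel/iommu_groups}"
    for g in "$root"/*; do
        # Skip entries without a readable "type" attribute.
        [ -r "$g/type" ] || continue
        printf '%s %s\n' "$(basename "$g")" "$(cat "$g/type")"
    done
}

# Succeed only if at least one group exists and none of them
# is using an identity (passthrough) domain.
dma_remap_active() {
    types=$(iommu_group_types "$1")
    [ -n "$types" ] || return 1
    ! printf '%s\n' "$types" | grep -q ' identity$'
}
```

In a guest booted with "iommu.passthrough=0", running `dma_remap_active && echo "DMA remapping in use"` would be expected to print the message, assuming no device was forced into an identity domain for other reasons.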
A new amd-iommu device property "dma-remap" (default: off) is introduced
to control whether the feature is available. See below for a full
example of QEMU cmdline parameters used in testing.
The patchset has been tested on an AMD EPYC Genoa host, with Linux 6.14
host and guest kernels, launching guests with up to 256 vCPUs, 512G of
memory, and 16 CX6 VFs. Testing with IOMMU x2APIC support enabled (i.e.
xtsup=on) requires this fix:
https://lore.kernel.org/all/20250410064447.29583-3-sarun...@amd.com/
Although there is more work to do, I am sending this series as a patch
and not an RFC since it provides a working implementation of the
feature. With this basic infrastructure in place it becomes easier to
add/verify enhancements and new functionality. Here are some items I am
working to address in follow-up patches:
- Page Fault and error reporting
- Add QEMU tracing and tests
- Provide control over VA Size advertised to guests
- Support hotplug/unplug of devices and other advanced features
  (suggestions welcome)
Thank you,
Alejandro
---
Example QEMU command line:
$QEMU \
-nodefaults \
-snapshot \
-no-user-config \
-display none \
-serial mon:stdio -nographic \
-machine q35,accel=kvm,kernel_irqchip=split \
-cpu host,+topoext,+x2apic,-svm,-vmx,-kvm-msi-ext-dest-id \
-smp 32 \
-m 128G \
-kernel $KERNEL \
-initrd $INITRD \
-append "console=tty0 console=ttyS0 root=/dev/mapper/ol-root ro rd.lvm.lv=ol/root rd.lvm.lv=ol/swap iommu.passthrough=0" \
-device amd-iommu,intremap=on,xtsup=on,dma-remap=on \
-blockdev node-name=drive0,driver=qcow2,file.driver=file,file.filename=./OracleLinux-uefi-x86_64.qcow2 \
-device virtio-blk-pci,drive=drive0,id=virtio-disk0 \
-drive if=pflash,format=raw,unit=0,file=/usr/share/edk2/ovmf/OVMF_CODE.fd,readonly=on \
-drive if=pflash,format=raw,unit=1,file=./OVMF_VARS.fd \
-device vfio-pci,host=0000:a1:00.1,id=net0
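
The vfio-pci device above must be bound to the vfio-pci driver on the host before QEMU starts. The cover letter does not say how this was done in testing; the sketch below shows one common approach using the standard sysfs driver_override mechanism. The function name and the optional sysfs root argument are hypothetical, and it assumes the vfio-pci module is already loaded and the script runs as root.

```shell
#!/bin/sh
# Sketch only: bind_to_vfio is a hypothetical helper, not part of the series.

# Rebind a PCI function to vfio-pci via driver_override.
# $1: PCI address, e.g. 0000:a1:00.1
# $2: optional sysfs root (defaults to /sys)
bind_to_vfio() {
    bdf="$1"
    sys="${2:-/sys}"
    dev="$sys/bus/pci/devices/$bdf"

    [ -d "$dev" ] || { echo "no such device: $bdf" >&2; return 1; }

    # Tell the PCI core that only vfio-pci may claim this device.
    echo vfio-pci > "$dev/driver_override" || return 1

    # Detach the current driver, if one is bound.
    if [ -e "$dev/driver/unbind" ]; then
        echo "$bdf" > "$dev/driver/unbind" || return 1
    fi

    # Ask the PCI core to re-probe, which now selects vfio-pci.
    echo "$bdf" > "$sys/bus/pci/drivers_probe"
}
```

For the example command line this would be invoked as `bind_to_vfio 0000:a1:00.1` after `modprobe vfio-pci`.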
---
Alejandro Jimenez (18):
  memory: Adjust event ranges to fit within notifier boundaries
  amd_iommu: Add helper function to extract the DTE
  amd_iommu: Add support for IOMMU notifier
  amd_iommu: Unmap all address spaces under the AMD IOMMU on reset
  amd_iommu: Toggle memory regions based on address translation mode
  amd_iommu: Set all address spaces to default translation mode on reset
  amd_iommu: Return an error when unable to read PTE from guest memory
  amd_iommu: Helper to decode size of page invalidation command
  amd_iommu: Add helpers to walk AMD v1 Page Table format
  amd_iommu: Add a page walker to sync shadow page tables on
    invalidation
  amd_iommu: Sync shadow page tables on page invalidation
  amd_iommu: Add replay callback
  amd_iommu: Invalidate address translations on INVALIDATE_IOMMU_ALL
  amd_iommu: Toggle address translation on device table entry
    invalidation
  amd_iommu: Use iova_tree records to determine large page size on UNMAP
  amd_iommu: Do not assume passthrough translation when DTE[TV]=0
  amd_iommu: Refactor amdvi_page_walk() to use common code for page walk
  amd_iommu: Do not emit I/O page fault events during replay()

 hw/i386/amd_iommu.c | 856 ++++++++++++++++++++++++++++++++++++++++----
 hw/i386/amd_iommu.h |  52 +++
 system/memory.c     |  10 +-
 3 files changed, 843 insertions(+), 75 deletions(-)

base-commit: 56c6e249b6988c1b6edc2dd34ebb0f1e570a1365