Hi,
On 11/3/25 15:10, Shameer Kolothum wrote:
Hi All,
This patch series introduces initial support for a user-creatable
accelerated SMMUv3 device (-device arm-smmuv3-accel) in QEMU.
I'm a bit confused by the design here. Why are we introducing this as
a device while it is a core component of the bus topology (here, PCI)?
Is it because this device is inspired by how x86 IOMMUs are wired?
Why this is needed:
Currently, QEMU’s ARM SMMUv3 emulation (iommu=smmuv3) is tied to the
machine and does not support configuring the host SMMUv3 in nested
mode. This limitation prevents its use with vfio-pci passthrough
devices.
The new pluggable smmuv3-accel device enables host SMMUv3 configuration
with nested stage support (Stage 1 owned by the Guest and Stage 2 by the
host) via the new IOMMUFD APIs. Additionally, it allows multiple
accelerated vSMMUv3 instances for guests running on hosts with multiple
physical SMMUv3s.
This brings the following benefits:
-Reduced invalidation broadcasts and lookups for devices behind multiple
physical SMMUv3s.
-Simplified handling of host SMMUv3s with differing feature sets.
-Groundwork for additional capabilities such as vCMDQ support.
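For reference, below is a rough sketch of how the two stages map onto the
IOMMUFD uAPI. This is only an illustration, not code from the series: the
struct and enum names are assumed to match recent <linux/iommufd.h>,
dev_id/ioas_id are placeholders for handles obtained from earlier VFIO
cdev + iommufd setup, and the vSTE contents are left empty.

#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/iommufd.h>

/*
 * Sketch only: allocate the host-owned stage-2 nesting parent and a
 * guest-owned nested stage-1 for one device. Error handling is omitted
 * and the vSTE is left zeroed purely for illustration.
 */
static int alloc_nested_hwpts(int iommufd, uint32_t dev_id, uint32_t ioas_id,
                              uint32_t *s2_hwpt_id, uint32_t *s1_hwpt_id)
{
    /* Stage 2: owned by the host, maps GPA -> PA out of the IOAS. */
    struct iommu_hwpt_alloc s2 = {
        .size = sizeof(s2),
        .flags = IOMMU_HWPT_ALLOC_NEST_PARENT,
        .dev_id = dev_id,
        .pt_id = ioas_id,
    };
    if (ioctl(iommufd, IOMMU_HWPT_ALLOC, &s2))
        return -1;
    *s2_hwpt_id = s2.out_hwpt_id;

    /* Stage 1: owned by the guest, described by the vSTE from the vSMMUv3. */
    struct iommu_hwpt_arm_smmuv3 vste = { .ste = { 0, 0 } };
    struct iommu_hwpt_alloc s1 = {
        .size = sizeof(s1),
        .dev_id = dev_id,
        .pt_id = *s2_hwpt_id,   /* or a vIOMMU object, depending on the kernel */
        .data_type = IOMMU_HWPT_DATA_ARM_SMMUV3,
        .data_len = sizeof(vste),
        .data_uptr = (uintptr_t)&vste,
    };
    if (ioctl(iommufd, IOMMU_HWPT_ALLOC, &s1))
        return -1;
    *s1_hwpt_id = s1.out_hwpt_id;
    return 0;
}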
Changes from RFCv1[0]:
Thanks to everyone who provided feedback on RFCv1!
-The device is now called arm-smmuv3-accel instead of arm-smmuv3-nested
to better reflect its role in using the host's physical SMMUv3 for page
table setup and cache invalidations.
-Includes patches for the VIOMMU and VDEVICE IOMMUFD APIs (patches 1,2);
see the uAPI sketch after this list.
-Merges patches from Nicolin’s GitHub repository that add accelerated
functionality for page table setup and cache invalidations[1]. I have
modified these a bit, but hopefully haven't broken anything.
-Incorporates various fixes and improvements based on RFCv1 feedback.
-Adds support for vfio-pci hotplug with smmuv3-accel.
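For the VIOMMU and VDEVICE APIs mentioned above, here is a minimal sketch
of the underlying ioctls, again assuming the layouts in recent
<linux/iommufd.h>. The dev_id, nesting-parent HWPT and guest StreamID
(vSID) values are placeholders from earlier setup.

#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/iommufd.h>

/*
 * Sketch only: create a vIOMMU object backed by the device's host SMMUv3
 * and a vDEVICE that ties the guest StreamID (vSID) to the host device.
 */
static int alloc_viommu_and_vdev(int iommufd, uint32_t dev_id,
                                 uint32_t s2_hwpt_id, uint64_t vsid,
                                 uint32_t *viommu_id, uint32_t *vdevice_id)
{
    struct iommu_viommu_alloc viommu = {
        .size = sizeof(viommu),
        .type = IOMMU_VIOMMU_TYPE_ARM_SMMUV3, /* back the vSMMUv3 by the host one */
        .dev_id = dev_id,
        .hwpt_id = s2_hwpt_id,                /* nesting-parent stage-2 HWPT */
    };
    struct iommu_vdevice_alloc vdev = {
        .size = sizeof(vdev),
        .dev_id = dev_id,
        .virt_id = vsid,                      /* guest StreamID of this device */
    };

    if (ioctl(iommufd, IOMMU_VIOMMU_ALLOC, &viommu))
        return -1;
    vdev.viommu_id = viommu.out_viommu_id;
    if (ioctl(iommufd, IOMMU_VDEVICE_ALLOC, &vdev))
        return -1;

    *viommu_id = viommu.out_viommu_id;
    *vdevice_id = vdev.out_vdevice_id;
    return 0;
}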
Note: IORT RMR patches for MSI setup are currently excluded as we may
adopt a different approach for MSI handling in the future [2].
This series also depends on the common iommufd/vfio patches from
Zhenzhong's series here[3].
ToDos:
-At least one vfio-pci device must currently be cold-plugged to a
pxb-pcie bus associated with the arm-smmuv3-accel. This is required both
to associate a vSMMUv3 with a host SMMUv3 and to retrieve the host
SMMUv3 IDR registers for export to the guest (see the IDR sketch after
the ToDos). Future updates will remove this restriction by adding the
necessary kernel support.
Please find the discussion here[4]
-This version does not yet support host SMMUv3 fault handling or
other event notifications. These will be addressed in a
future patch series.
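For the IDR retrieval mentioned in the first ToDo, the host SMMUv3 ID
registers can be read via the IOMMU_GET_HW_INFO ioctl; a rough sketch,
assuming the struct layout in recent <linux/iommufd.h>:

#include <stdint.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <linux/iommufd.h>

/* Sketch only: read the host SMMUv3 IDR/IIDR/AIDR values for a bound device. */
static int get_host_smmu_idr(int iommufd, uint32_t dev_id)
{
    struct iommu_hw_info_arm_smmuv3 smmu = { 0 };
    struct iommu_hw_info info = {
        .size = sizeof(info),
        .dev_id = dev_id,
        .data_len = sizeof(smmu),
        .data_uptr = (uintptr_t)&smmu,
    };

    if (ioctl(iommufd, IOMMU_GET_HW_INFO, &info))
        return -1;
    if (info.out_data_type != IOMMU_HW_INFO_TYPE_ARM_SMMUV3)
        return -1;  /* device is not behind an SMMUv3 */

    for (int i = 0; i < 6; i++)
        printf("IDR%d: 0x%08x\n", i, smmu.idr[i]);
    printf("IIDR: 0x%08x AIDR: 0x%08x\n", smmu.iidr, smmu.aidr);
    return 0;
}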
The complete branch can be found here:
https://github.com/hisilicon/qemu/tree/master-smmuv3-accel-rfcv2-ext
I have done basic sanity testing on a Hisilicon Platform using the kernel
branch here:
https://github.com/nicolinc/iommufd/tree/iommufd_msi-rfcv2
Usage example:
On a HiSilicon platform that has multiple host SMMUv3s, the ACC ZIP VF
devices and HNS VF devices are behind different host SMMUv3s. So for a
guest, specify two arm-smmuv3-accel devices, each behind a pxb-pcie, as below:
./qemu-system-aarch64 -machine virt,accel=kvm,gic-version=3 \
-cpu host -smp cpus=4 -m size=4G,slots=4,maxmem=256G \
-bios QEMU_EFI.fd \
-object iommufd,id=iommufd0 \
-device virtio-blk-device,drive=fs \
-drive if=none,file=rootfs.qcow2,id=fs \
-device pxb-pcie,id=pcie.1,bus_nr=1,bus=pcie.0 \
-device arm-smmuv3-accel,bus=pcie.1 \
-device pcie-root-port,id=pcie.port1,bus=pcie.1,chassis=1,pref64-reserve=2M,io-reserve=1K \
-device vfio-pci,host=0000:7d:02.1,bus=pcie.port1,iommufd=iommufd0 \
-device pcie-root-port,id=pcie.port2,bus=pcie.1,chassis=2,pref64-reserve=2M,io-reserve=1K \
-device vfio-pci,host=0000:7d:02.2,bus=pcie.port2,iommufd=iommufd0 \
-device pxb-pcie,id=pcie.2,bus_nr=8,bus=pcie.0 \
-device arm-smmuv3-accel,bus=pcie.2 \
-device pcie-root-port,id=pcie.port3,bus=pcie.2,chassis=3,pref64-reserve=2M,io-reserve=1K \
-device vfio-pci,host=0000:75:00.1,bus=pcie.port3,iommufd=iommufd0 \
-kernel Image \
-append "rdinit=init console=ttyAMA0 root=/dev/vda2 rw earlycon=pl011,0x9000000" \
-device virtio-9p-pci,fsdev=p9fs,mount_tag=p9,bus=pcie.0 \
-fsdev local,id=p9fs,path=p9root,security_model=mapped \
-net none \
-nographic
The guest will boot with two SMMUv3s:
...
arm-smmu-v3 arm-smmu-v3.0.auto: option mask 0x0
arm-smmu-v3 arm-smmu-v3.0.auto: ias 44-bit, oas 44-bit (features 0x00008325)
arm-smmu-v3 arm-smmu-v3.0.auto: allocated 65536 entries for cmdq
arm-smmu-v3 arm-smmu-v3.0.auto: allocated 32768 entries for evtq
arm-smmu-v3 arm-smmu-v3.1.auto: option mask 0x0
arm-smmu-v3 arm-smmu-v3.1.auto: ias 44-bit, oas 44-bit (features 0x00008325)
arm-smmu-v3 arm-smmu-v3.1.auto: allocated 65536 entries for cmdq
arm-smmu-v3 arm-smmu-v3.1.auto: allocated 32768 entries for evtq
With a PCI topology like the one below:
[root@localhost ~]# lspci -tv
-+-[0000:00]-+-00.0 Red Hat, Inc. QEMU PCIe Host bridge
| +-01.0 Red Hat, Inc. QEMU PCIe Expander bridge
| +-02.0 Red Hat, Inc. QEMU PCIe Expander bridge
| \-03.0 Virtio: Virtio filesystem
+-[0000:01]-+-00.0-[02]----00.0 Huawei Technologies Co., Ltd. HNS Network Controller (Virtual Function)
| \-01.0-[03]----00.0 Huawei Technologies Co., Ltd. HNS Network Controller (Virtual Function)
\-[0000:08]---00.0-[09]----00.0 Huawei Technologies Co., Ltd. HiSilicon ZIP Engine (Virtual Function)
Further tests are always welcome.
Please take a look and let me know your feedback!
Thanks,
Shameer
[0] https://lore.kernel.org/qemu-devel/20241108125242.60136-1-shameerali.kolothum.th...@huawei.com/
[1] https://github.com/nicolinc/qemu/commit/3acbb7f3d114d6bb70f4895aa66a9ec28e6561d6
[2] https://lore.kernel.org/linux-iommu/cover.1740014950.git.nicol...@nvidia.com/
[3] https://lore.kernel.org/qemu-devel/20250219082228.3303163-1-zhenzhong.d...@intel.com/
[4] https://lore.kernel.org/qemu-devel/z6tlsdwgajmhv...@redhat.com/