Hi,

On 3/11/25 3:10 PM, Shameer Kolothum wrote:
> Hi All,
>
> This patch series introduces initial support for a user-creatable
> accelerated SMMUv3 device (-device arm-smmuv3-accel) in QEMU.
>
> Why this is needed:
>
> Currently, QEMU’s ARM SMMUv3 emulation (iommu=smmuv3) is tied to the
> machine and does not support configuring the host SMMUv3 in nested
> mode. This limitation prevents its use with vfio-pci passthrough
> devices.
>
> The new pluggable smmuv3-accel device enables host SMMUv3 configuration
> with nested stage support (Stage 1 owned by the Guest and Stage 2 by the
> host) via the new IOMMUFD APIs. Additionally, it allows multiple
> accelerated vSMMUv3 instances for guests running on hosts with multiple
> physical SMMUv3s.
>
> This brings the following benefits:
> - Reduced invalidation broadcasts and lookups for devices behind multiple
>   physical SMMUv3s.
> - Simplified handling of host SMMUv3s with differing feature sets.
> - Groundwork for additional capabilities like vCMDQ support.
>
>
> Changes from RFCv1[0]:
>
> Thanks to everyone who provided feedback on RFCv1!
>
> - The device is now called arm-smmuv3-accel instead of arm-smmuv3-nested
>   to better reflect its role in using the host's physical SMMUv3 for page
>   table setup and cache invalidations.
> - Includes patches for VIOMMU and VDEVICE IOMMUFD APIs (patches 1,2).
> - Merges patches from Nicolin’s GitHub repository that add accelerated
>   functionality for page table setup and cache invalidations[1]. I have
>   modified these a bit, but hopefully have not broken anything.
> - Incorporates various fixes and improvements based on RFCv1 feedback.
> - Adds support for vfio-pci hotplug with smmuv3-accel.
>
> Note: IORT RMR patches for MSI setup are currently excluded as we may
> adopt a different approach for MSI handling in the future [2].
>
> Also, this has a dependency on the common iommufd/vfio patches from
> Zhenzhong's series here[3].
>
> ToDos:
>
> - At least one vfio-pci device must currently be cold-plugged to a
>   pxb-pcie bus associated with the arm-smmuv3-accel. This is required both
>   to associate a vSMMUv3 with a host SMMUv3 and to retrieve the host
>   SMMUv3 IDR registers for guest export. Future updates will remove this
>   restriction by adding the necessary kernel support.
>   Please find the discussion here[4].
> - This version does not yet support host SMMUv3 fault handling or
>   other event notifications. These will be addressed in a
>   future patch series.
>
>
> The complete branch can be found here:
> https://github.com/hisilicon/qemu/tree/master-smmuv3-accel-rfcv2-ext
>
> I have done basic sanity testing on a HiSilicon platform using the kernel
> branch here:
> https://github.com/nicolinc/iommufd/tree/iommufd_msi-rfcv2
>
> Usage example:
>
> On a HiSilicon platform that has multiple host SMMUv3s, the ACC ZIP VF
> devices and HNS VF devices are behind different host SMMUv3s.
> So for a Guest, specify two arm-smmuv3-accel devices each behind a
> pxb-pcie as below,
>
> ./qemu-system-aarch64 -machine virt,accel=kvm,gic-version=3 \
> -cpu host -smp cpus=4 -m size=4G,slots=4,maxmem=256G \
> -bios QEMU_EFI.fd \
> -object iommufd,id=iommufd0 \
> -device virtio-blk-device,drive=fs \
> -drive if=none,file=rootfs.qcow2,id=fs \
> -device pxb-pcie,id=pcie.1,bus_nr=1,bus=pcie.0 \
> -device arm-smmuv3-accel,bus=pcie.1 \
> -device pcie-root-port,id=pcie.port1,bus=pcie.1,chassis=1,pref64-reserve=2M,io-reserve=1K \
> -device vfio-pci,host=0000:7d:02.1,bus=pcie.port1,iommufd=iommufd0 \
> -device pcie-root-port,id=pcie.port2,bus=pcie.1,chassis=2,pref64-reserve=2M,io-reserve=1K \
> -device vfio-pci,host=0000:7d:02.2,bus=pcie.port2,iommufd=iommufd0 \
> -device pxb-pcie,id=pcie.2,bus_nr=8,bus=pcie.0 \
> -device arm-smmuv3-accel,bus=pcie.2 \
> -device pcie-root-port,id=pcie.port3,bus=pcie.2,chassis=3,pref64-reserve=2M,io-reserve=1K \
> -device vfio-pci,host=0000:75:00.1,bus=pcie.port3,iommufd=iommufd0 \
> -kernel Image \
> -append "rdinit=init console=ttyAMA0 root=/dev/vda2 rw earlycon=pl011,0x9000000" \
> -device virtio-9p-pci,fsdev=p9fs,mount_tag=p9,bus=pcie.0 \
> -fsdev local,id=p9fs,path=p9root,security_model=mapped \
> -net none \
> -nographic
>
> Guest will boot with two SMMUv3s,
> ...
> arm-smmu-v3 arm-smmu-v3.0.auto: option mask 0x0
> arm-smmu-v3 arm-smmu-v3.0.auto: ias 44-bit, oas 44-bit (features 0x00008325)
> arm-smmu-v3 arm-smmu-v3.0.auto: allocated 65536 entries for cmdq
> arm-smmu-v3 arm-smmu-v3.0.auto: allocated 32768 entries for evtq
> arm-smmu-v3 arm-smmu-v3.1.auto: option mask 0x0
> arm-smmu-v3 arm-smmu-v3.1.auto: ias 44-bit, oas 44-bit (features 0x00008325)
> arm-smmu-v3 arm-smmu-v3.1.auto: allocated 65536 entries for cmdq
> arm-smmu-v3 arm-smmu-v3.1.auto: allocated 32768 entries for evtq
>
> With a pci topology like below,
>
> [root@localhost ~]# lspci -tv
> -+-[0000:00]-+-00.0  Red Hat, Inc. QEMU PCIe Host bridge
>  |           +-01.0  Red Hat, Inc. QEMU PCIe Expander bridge
>  |           +-02.0  Red Hat, Inc. QEMU PCIe Expander bridge
>  |           \-03.0  Virtio: Virtio filesystem
>  +-[0000:01]-+-00.0-[02]----00.0  Huawei Technologies Co., Ltd. HNS Network Controller (Virtual Function)
>  |           \-01.0-[03]----00.0  Huawei Technologies Co., Ltd. HNS Network Controller (Virtual Function)
>  \-[0000:08]---00.0-[09]----00.0  Huawei Technologies Co., Ltd. HiSilicon ZIP Engine(Virtual Function)

For the record, I tested the series with a host VFIO device and a
virtio-blk-pci device placed behind the same pxb-pcie/SMMU protection,
and it works just fine:

-+-[0000:0a]-+-01.0-[0b]----00.0  Mellanox Technologies ConnectX Family mlx5Gen Virtual Function
 |           \-01.1-[0c]----00.0  Red Hat, Inc. Virtio 1.0 block device
 \-[0000:00]-+-00.0  Red Hat, Inc. QEMU PCIe Host bridge
             +-01.0-[01]--
             +-01.1-[02]--
             \-02.0  Red Hat, Inc. QEMU PCIe Expander bridge

This shows that, without the vCMDQ feature, nothing prevents the same
SMMU device from protecting both accelerated and emulated devices.

Thanks

Eric

>
> Further tests are always welcome.
>
> Please take a look and let me know your feedback!
>
> Thanks,
> Shameer
>
> [0] https://lore.kernel.org/qemu-devel/20241108125242.60136-1-shameerali.kolothum.th...@huawei.com/
> [1] https://github.com/nicolinc/qemu/commit/3acbb7f3d114d6bb70f4895aa66a9ec28e6561d6
> [2] https://lore.kernel.org/linux-iommu/cover.1740014950.git.nicol...@nvidia.com/
> [3] https://lore.kernel.org/qemu-devel/20250219082228.3303163-1-zhenzhong.d...@intel.com/
> [4] https://lore.kernel.org/qemu-devel/z6tlsdwgajmhv...@redhat.com/
>
> Nicolin Chen (11):
>   backends/iommufd: Introduce iommufd_backend_alloc_viommu
>   backends/iommufd: Introduce iommufd_vdev_alloc
>   hw/arm/smmuv3-accel: Add set/unset_iommu_device callback
>   hw/arm/smmuv3-accel: Support nested STE install/uninstall support
>   hw/arm/smmuv3-accel: Allocate a vDEVICE object for device
>   hw/arm/smmuv3-accel: Return sysmem if stage-1 is bypassed
>   hw/arm/smmuv3-accel: Introduce helpers to batch and issue cache invalidations
>   hw/arm/smmuv3: Forward invalidation commands to hw
>   hw/arm/smmuv3-accel: Read host SMMUv3 device info
>   hw/arm/smmuv3: Check idr registers for STE_S1CDMAX and STE_S1STALLD
>   hw/arm/smmu-common: Bypass emulated IOTLB for a accel SMMUv3
>
> Shameer Kolothum (9):
>   hw/arm/smmuv3-accel: Add initial infrastructure for smmuv3-accel device
>   hw/arm/virt: Add support for smmuv3-accel
>   hw/arm/smmuv3-accel: Associate a pxb-pcie bus
>   hw/arm/smmu-common: Factor out common helper functions and export
>   hw/arm/smmu-common: Introduce callbacks for PCIIOMMUOps
>   hw/arm/smmuv3-accel: Provide get_address_space callback
>   hw/arm/smmuv3: Install nested ste for CFGI_STE
>   hw/arm/virt-acpi-build: Update IORT with multiple smmuv3-accel nodes
>   hw/arm/smmuv3-accel: Enable smmuv3-accel creation
>
>  backends/iommufd.c            |  51 +++
>  backends/trace-events         |   2 +
>  hw/arm/Kconfig                |   5 +
>  hw/arm/meson.build            |   1 +
>  hw/arm/smmu-common.c          |  95 +++++-
>  hw/arm/smmuv3-accel.c         | 616 ++++++++++++++++++++++++++++++++++
>  hw/arm/smmuv3-internal.h      |  54 +++
>  hw/arm/smmuv3.c               |  80 ++++-
>  hw/arm/trace-events           |   6 +
>  hw/arm/virt-acpi-build.c      | 113 ++++++-
>  hw/arm/virt.c                 |  12 +
>  hw/core/sysbus-fdt.c          |   1 +
>  include/hw/arm/smmu-common.h  |  14 +
>  include/hw/arm/smmuv3-accel.h |  75 +++++
>  include/hw/arm/virt.h         |   1 +
>  include/system/iommufd.h      |  14 +
>  16 files changed, 1101 insertions(+), 39 deletions(-)
>  create mode 100644 hw/arm/smmuv3-accel.c
>  create mode 100644 include/hw/arm/smmuv3-accel.h
>
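
For anyone wanting to repeat the mixed test reported in this thread (one
accelerated SMMUv3 protecting both a vfio-pci device and an emulated
virtio-blk-pci device), a minimal sketch of the relevant options might look
like the following. It reuses the option names from the usage example in the
cover letter plus the standard virtio-blk-pci/-drive pair. HOST_BDF, the ids,
bus_nr=10 and disk.qcow2 are placeholders chosen for illustration, not the
values used in the actual test. The script only prints the options; it does
not start QEMU.

```shell
#!/bin/sh
# Sketch only: print the options for one arm-smmuv3-accel instance that sits
# in front of both a passthrough (vfio-pci) and an emulated (virtio-blk-pci)
# device. All ids, HOST_BDF, bus_nr and the image name are hypothetical.
HOST_BDF="${HOST_BDF:-0000:7d:02.1}"

mixed_smmu_args() {
    cat <<EOF
-object iommufd,id=iommufd0
-device pxb-pcie,id=pcie.1,bus_nr=10,bus=pcie.0
-device arm-smmuv3-accel,bus=pcie.1
-device pcie-root-port,id=pcie.port1,bus=pcie.1,chassis=1
-device vfio-pci,host=${HOST_BDF},bus=pcie.port1,iommufd=iommufd0
-device pcie-root-port,id=pcie.port2,bus=pcie.1,chassis=2
-drive if=none,file=disk.qcow2,id=blk0
-device virtio-blk-pci,drive=blk0,bus=pcie.port2
EOF
}

# Print one option per line so it can be reviewed or appended to a command.
mixed_smmu_args
```

Appending the printed options to the remaining arguments from the cover
letter (machine, CPU, firmware, kernel) should give a guest topology similar
to the one shown in the reply, assuming the series is applied and the host
device is bound to vfio-pci.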