Hi Eric,

> -----Original Message-----
> From: Auger Eric [mailto:eric.au...@redhat.com]
> Sent: 03 April 2020 11:45
> To: Shameerali Kolothum Thodi <shameerali.kolothum.th...@huawei.com>;
> eric.auger....@gmail.com; qemu-devel@nongnu.org; qemu-...@nongnu.org;
> peter.mayd...@linaro.org; m...@redhat.com; alex.william...@redhat.com;
> jacob.jun....@linux.intel.com; yi.l....@intel.com
> Cc: pet...@redhat.com; jean-phili...@linaro.org; w...@kernel.org;
> tnowi...@marvell.com; zhangfei....@foxmail.com; zhangfei....@linaro.org;
> m...@kernel.org; bbhush...@marvell.com
> Subject: Re: [RFC v6 00/24] vSMMUv3/pSMMUv3 2 stage VFIO integration
>
> Hi Shameer,
>
> On 3/25/20 12:35 PM, Shameerali Kolothum Thodi wrote:
> > Hi Eric,
> >
> >> -----Original Message-----
> >> From: Eric Auger [mailto:eric.au...@redhat.com]
> >> Sent: 20 March 2020 16:58
> >> To: eric.auger....@gmail.com; eric.au...@redhat.com;
> >> qemu-devel@nongnu.org; qemu-...@nongnu.org; peter.mayd...@linaro.org;
> >> m...@redhat.com; alex.william...@redhat.com;
> >> jacob.jun....@linux.intel.com; yi.l....@intel.com
> >> Cc: pet...@redhat.com; jean-phili...@linaro.org; w...@kernel.org;
> >> tnowi...@marvell.com; Shameerali Kolothum Thodi
> >> <shameerali.kolothum.th...@huawei.com>; zhangfei....@foxmail.com;
> >> zhangfei....@linaro.org; m...@kernel.org; bbhush...@marvell.com
> >> Subject: [RFC v6 00/24] vSMMUv3/pSMMUv3 2 stage VFIO integration
> >>
> >> Up to now vSMMUv3 has not been integrated with VFIO. VFIO
> >> integration requires programming the physical IOMMU consistently
> >> with the guest mappings. However, as opposed to VTD, SMMUv3 has
> >> no "Caching Mode" which allows easy trapping of guest mappings.
> >> This means the vSMMUv3 cannot use the same VFIO integration as VTD.
> >>
> >> However SMMUv3 has 2 translation stages. This was devised with
> >> the virtualization use case in mind, where stage 1 is "owned" by the
> >> guest whereas the host uses stage 2 for VM isolation.
> >>
> >> This series sets up this nested translation stage. It only works
> >> if there is one physical SMMUv3 used along with QEMU vSMMUv3 (in
> >> other words, it does not work if there is a physical SMMUv2).
> >
> > I was testing this series on one of our hardware boards with SMMUv3. I did
> > observe an issue while trying to bring up the guest with and without the
> > vSMMUv3.
>
> I am currently investigating and so far I have failed to reproduce it on my end.
> >
> > The steps are as follows:
> >
> > 1. Start a guest with "iommu=smmuv3" and a n/w VF device.
> >
> > 2. Exit the VM.
> how do you exit the VM?
QMP system_powerdown

> >
> > 3. Start the guest again without "iommu=smmuv3"
> >
> > This time qemu crashes with,
> >
> > [ 0.447830] hns3 0000:00:01.0: enabling device (0000 -> 0002)
> > /home/shameer/qemu-eric/qemu/hw/vfio/pci.c:2851:vfio_dma_fault_notifier_handler:
> > Object 0xaaaaeeb47c00 is not an instance of type
> So I think I understand the qemu crash. At the moment the vfio_pci
> registers a fault handler even if we are not in nested mode. The smmuv3
> host driver calls any registered fault handler when it encounters an
> error in !nested mode. So the eventfd is triggered to userspace but qemu
> does not expect that. However the root cause is we got some physical
> faults on the second run.

True. And qemu works fine if I run it again with the iommu=smmuv3 option.
That's why I suspect the mapping for the device in the physical SMMU is not
cleared, so the vfio-pci enable-device path encounters an error?

> > qemu:iommu-memory-region
> > ./qemu_run-vsmmu-hns: line 9: 13609 Aborted (core
> > dumped) ./qemu-system-aarch64-vsmmuv3v10 -machine
> > virt,kernel_irqchip=on,gic-version=3 -cpu host -smp cpus=1 -kernel
> > Image-ericv10-uacce -initrd rootfs-iperf.cpio -bios
> Just to double check with you,
> host: will-arm-smmu-updates-2stage-v10
> qemu: v4.2.0-2stage-rfcv6
> guest version?

Yes. And guest = host image.

> > QEMU_EFI_Dec2018.fd -device vfio-pci,host=0000:7d:02.1 -net none -m
> Do you assign exactly the same VF as during the 1st run?

Yes, the same. The only change is the omission of "iommu=smmuv3".

> > 4096 -nographic -D -d -enable-kvm -append "console=ttyAMA0
> > root=/dev/vda -m 4096 rw earlycon=pl011,0x9000000"
> >
> > And you can see that the host kernel receives an smmuv3 C_BAD_STE event,
> >
> > [10499.379288] vfio-pci 0000:7d:02.1: enabling device (0000 -> 0002)
> > [10501.943881] arm-smmu-v3 arm-smmu-v3.2.auto: event 0x04 received:
> > [10501.943884] arm-smmu-v3 arm-smmu-v3.2.auto: 0x00007d1100000004
> > [10501.943886] arm-smmu-v3 arm-smmu-v3.2.auto: 0x0000100800000080
> > [10501.943887] arm-smmu-v3 arm-smmu-v3.2.auto: 0x00000000fe040000
> > [10501.943889] arm-smmu-v3 arm-smmu-v3.2.auto: 0x000000007e04c440
> I will try to prepare a kernel branch with additional traces.

Ok. You can find the qemu traces below (vfio*/smmu*) for the runs with and
without iommu=smmuv3 (maybe not that useful).

https://github.com/hisilicon/qemu/tree/v4.2.0-2stage-rfcv6-eric/traces

Thanks,
Shameer

> Thanks
>
> Eric
> >
> > So I suspect we didn't clear the nested stage configuration and that affects
> > the translation in the second run. I tried to issue (force) a
> > vfio_detach_pasid_table() but that didn't solve the problem.
> >
> > Maybe I am missing something. Could you please take a look and let me know.
> >
> > Thanks,
> > Shameer
> >
> >> - We force the host to use stage 2 instead of stage 1, when we
> >>   detect a vSMMUv3 is behind a VFIO device. For a VFIO device
> >>   without any virtual IOMMU, we still use stage 1 as many existing
> >>   SMMUs expect this behavior.
> >> - We use PCIPASIDOps to propagate guest stage 1 config changes on
> >>   STE (Stream Table Entry) changes.
> >> - We implement a specific UNMAP notifier that conveys guest
> >>   IOTLB invalidations to the host
> >> - We register MSI IOVA/GPA bindings to the host so that the latter
> >>   can build a nested stage translation
> >> - As the legacy MAP notifier is not called anymore, we must make
> >>   sure stage 2 mappings are set. This is achieved through another
> >>   prereg memory listener.
> >> - Physical SMMU stage 1 related faults are reported to the guest
> >>   via an eventfd mechanism and exposed through a dedicated VFIO-PCI
> >>   region. Then they are reinjected into the guest.
> >>
> >> Best Regards
> >>
> >> Eric
> >>
> >> This series can be found at:
> >> https://github.com/eauger/qemu/tree/v4.2.0-2stage-rfcv6
> >>
> >> Kernel Dependencies:
> >> [1] [PATCH v10 00/11] SMMUv3 Nested Stage Setup (VFIO part)
> >> [2] [PATCH v10 00/13] SMMUv3 Nested Stage Setup (IOMMU part)
> >> branch at:
> >> https://github.com/eauger/linux/tree/will-arm-smmu-updates-2stage-v10
> >>
> >> History:
> >>
> >> v5 -> v6:
> >> - just rebase work
> >>
> >> v4 -> v5:
> >> - Use PCIPASIDOps for config update notifications
> >> - removal of notification for MSI binding which is not needed
> >>   anymore
> >> - Use a single fault region
> >> - use the specific interrupt index
> >>
> >> v3 -> v4:
> >> - adapt to changes in uapi (asid cache invalidation)
> >> - check VFIO_PCI_DMA_FAULT_IRQ_INDEX is supported at kernel level
> >>   before attempting to set signaling for it.
> >> - sync on 5.2-rc1 kernel headers + Drew's patch that imports sve_context.h
> >> - fix MSI binding for MSI (not MSIX)
> >> - fix mingw compilation
> >>
> >> v2 -> v3:
> >> - rework fault handling
> >> - MSI binding registration done in vfio-pci.
> >>   MSI binding tear down called on container cleanup path
> >> - leaf parameter propagated
> >>
> >> v1 -> v2:
> >> - Fixed dual assignment (asid now correctly propagated on TLB
> >>   invalidations)
> >> - Integrated fault reporting
> >>
> >>
> >> Eric Auger (23):
> >>   update-linux-headers: Import iommu.h
> >>   header update against 5.6.0-rc3 and IOMMU/VFIO nested stage APIs
> >>   memory: Add IOMMU_ATTR_VFIO_NESTED IOMMU memory region attribute
> >>   memory: Add IOMMU_ATTR_MSI_TRANSLATE IOMMU memory region attribute
> >>   memory: Introduce IOMMU Memory Region inject_faults API
> >>   memory: Add arch_id and leaf fields in IOTLBEntry
> >>   iommu: Introduce generic header
> >>   vfio: Force nested if iommu requires it
> >>   vfio: Introduce hostwin_from_range helper
> >>   vfio: Introduce helpers to DMA map/unmap a RAM section
> >>   vfio: Set up nested stage mappings
> >>   vfio: Pass stage 1 MSI bindings to the host
> >>   vfio: Helper to get IRQ info including capabilities
> >>   vfio/pci: Register handler for iommu fault
> >>   vfio/pci: Set up the DMA FAULT region
> >>   vfio/pci: Implement the DMA fault handler
> >>   hw/arm/smmuv3: Advertise MSI_TRANSLATE attribute
> >>   hw/arm/smmuv3: Store the PASID table GPA in the translation config
> >>   hw/arm/smmuv3: Fill the IOTLBEntry arch_id on NH_VA invalidation
> >>   hw/arm/smmuv3: Fill the IOTLBEntry leaf field on NH_VA invalidation
> >>   hw/arm/smmuv3: Pass stage 1 configurations to the host
> >>   hw/arm/smmuv3: Implement fault injection
> >>   hw/arm/smmuv3: Allow MAP notifiers
> >>
> >> Liu Yi L (1):
> >>   pci: introduce PCIPASIDOps to PCIDevice
> >>
> >>  hw/arm/smmuv3.c                 | 189 ++++++++++--
> >>  hw/arm/trace-events             |   3 +-
> >>  hw/pci/pci.c                    |  34 +++
> >>  hw/vfio/common.c                | 506 +++++++++++++++++++++++++-------
> >>  hw/vfio/pci.c                   | 267 ++++++++++++++++-
> >>  hw/vfio/pci.h                   |   9 +
> >>  hw/vfio/trace-events            |   9 +-
> >>  include/exec/memory.h           |  49 +++-
> >>  include/hw/arm/smmu-common.h    |   1 +
> >>  include/hw/iommu/iommu.h        |  28 ++
> >>  include/hw/pci/pci.h            |  11 +
> >>  include/hw/vfio/vfio-common.h   |  16 +
> >>  linux-headers/COPYING           |   2 +
> >>  linux-headers/asm-x86/kvm.h     |   1 +
> >>  linux-headers/linux/iommu.h     | 375 +++++++++++++++++++++++
> >>  linux-headers/linux/vfio.h      | 109 ++++++-
> >>  memory.c                        |  10 +
> >>  scripts/update-linux-headers.sh |   2 +-
> >>  18 files changed, 1478 insertions(+), 143 deletions(-)
> >>  create mode 100644 include/hw/iommu/iommu.h
> >>  create mode 100644 linux-headers/linux/iommu.h
> >>
> >> --
> >> 2.20.1
> >
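
For reference, the first word of the C_BAD_STE event record in the host log above can be checked against the assigned VF. This is only a rough sketch in Python, not code from the series: it assumes the SMMUv3 event record layout with the event ID in bits [7:0] and the StreamID in bits [63:32] of the first 64-bit word, and it assumes that on this platform the StreamID equals the PCI requester ID (the firmware mapping may add an offset on other systems). The helper names are made up for illustration.

```python
# Illustrative helpers only; names are hypothetical and not part of QEMU or Linux.

# Event name taken from the log discussion in this thread (event 0x04).
EVENT_NAMES = {0x04: "C_BAD_STE"}


def decode_event_word0(word0):
    """Split the first 64-bit word of an SMMUv3 event record.

    Assumed layout: event ID in bits [7:0], StreamID in bits [63:32].
    """
    return {
        "event_id": word0 & 0xFF,
        "stream_id": (word0 >> 32) & 0xFFFFFFFF,
    }


def pci_requester_id(bdf):
    """Convert 'dddd:bb:dd.f' into the 16-bit requester ID (bus:dev.fn)."""
    parts = bdf.split(":")
    bus = int(parts[-2], 16)
    dev_str, fn_str = parts[-1].split(".")
    dev = int(dev_str, 16)
    fn = int(fn_str, 16)
    return (bus << 8) | (dev << 3) | fn


if __name__ == "__main__":
    evt = decode_event_word0(0x00007D1100000004)  # first word from the log
    rid = pci_requester_id("0000:7d:02.1")        # VF passed to vfio-pci
    name = EVENT_NAMES.get(evt["event_id"], "unknown")
    print(f"event {evt['event_id']:#04x} ({name}), StreamID {evt['stream_id']:#x}")
    # Assumption: StreamID == requester ID on this platform.
    print("matches assigned VF:", evt["stream_id"] == rid)
```

Under those assumptions the StreamID in the record is 0x7d11, which is the requester ID of 0000:7d:02.1, so the bad STE reported by the host belongs to the VF that was assigned in both runs.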