Hi Zhenzhong, On 7/16/25 12:31 PM, Duan, Zhenzhong wrote: > Hi Eric, > >> -----Original Message----- >> From: Eric Auger <eric.au...@redhat.com> >> Subject: Re: [PATCH v3 07/20] intel_iommu: Check for compatibility with >> IOMMUFD backed device when x-flts=on >> >> Hi Zhenzhong, >> >> On 7/8/25 1:05 PM, Zhenzhong Duan wrote: >>> When vIOMMU is configured x-flts=on in scalable mode, stage-1 page table >>> is passed to host to construct nested page table. We need to check >>> compatibility of some critical IOMMU capabilities between vIOMMU and >>> host IOMMU to ensure guest stage-1 page table could be used by host. >>> >>> For instance, vIOMMU supports stage-1 1GB huge page mapping, but host >>> does not, then this IOMMUFD backed device should fail. >>> >>> Even of the checks pass, for now we willingly reject the association >>> because all the bits are not there yet. >>> >>> Signed-off-by: Yi Liu <yi.l....@intel.com> >>> Signed-off-by: Zhenzhong Duan <zhenzhong.d...@intel.com> >>> --- >>> hw/i386/intel_iommu.c | 30 >> +++++++++++++++++++++++++++++- >>> hw/i386/intel_iommu_internal.h | 1 + >>> 2 files changed, 30 insertions(+), 1 deletion(-) >>> >>> diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c >>> index e90fd2f28f..c57ca02cdd 100644 >>> --- a/hw/i386/intel_iommu.c >>> +++ b/hw/i386/intel_iommu.c >>> @@ -40,6 +40,7 @@ >>> #include "kvm/kvm_i386.h" >>> #include "migration/vmstate.h" >>> #include "trace.h" >>> +#include "system/iommufd.h" >>> >>> /* context entry operations */ >>> #define VTD_CE_GET_RID2PASID(ce) \ >>> @@ -4355,7 +4356,34 @@ static bool vtd_check_hiod(IntelIOMMUState *s, >> HostIOMMUDevice *hiod, >>> return true; >>> } >>> >>> - error_setg(errp, "host device is uncompatible with stage-1 >> translation"); >>> +#ifdef CONFIG_IOMMUFD >>> + struct HostIOMMUDeviceCaps *caps = &hiod->caps; >>> + struct iommu_hw_info_vtd *vtd = &caps->vendor_caps.vtd; >> I am now confused about how this relates to vtd_get_viommu_cap(). >> PCIIOMMUOps.set_iommu_device = vtd_dev_set_iommu_device calls >> vtd_check_hiod() >> viommu might return HW_NESTED_CAP through >> PCIIOMMUOps.get_viommu_cap >> without making sure the underlying HW IOMMU does support it. Is that a >> correct understanding? Maybe we should clarify the calling order between >> set_iommu_device vs get_viommu_cap? Could we check HW IOMMU >> prerequisites in vtd_get_viommu_cap() by enforcing this is called after >> set_iommu_device. I think we should clarify the exact semantic of >> get_viommu_cap().Thanks Eric > My understanding get_viommu_cap() returns pure vIOMMU's capabilities > with no host IOMMU's capabilities involved. > > For example, returned HW_NESTED_CAP means this vIOMMU has code > to support creating nested hwpt and attaching, no matter if host IOMMU > supports nesting or not.
Then I think you need to refine the description in 2/20 to make this clear. stating explicitly get_viommu_cap returns theoretical capabilities which are independent on the actual host capabilities they may depend on. > > The compatibility check between host IOMMU vs vIOMMU is done in > set_iommu_device(), see vtd_check_hiod(). > > It's too late for VFIO to call get_viommu_cap() after set_iommu_device() > because we need get_viommu_cap() to determine if creating nested parent > hwpt or not at attaching stage. isn't it possible to rework the call sequence? Eric > > If host doesn't support nesting, then creating nested parent hwpt will fail > early in vfio realize before vtd_check_hiod() is called and we fail the > hotplug. > > Thanks > Zhenzhong