On Mon, Jul 14, 2025 at 04:59:32PM +0100, Shameer Kolothum wrote:
> Accelerated SMMUv3 is only useful when the device can take advantage of
> the host's SMMUv3 in nested mode. To keep things simple and correct, we
> only allow this feature for vfio-pci endpoint devices that use the iommufd
> backend. We also allow non-endpoint emulated devices like PCI bridges and
> root ports, so that users can plug in these vfio-pci devices.
> 
> Another reason for this limit is to avoid problems with IOTLB
> invalidations. Some commands (e.g., CMD_TLBI_NH_ASID) lack an associated
> SID, making it difficult to trace the originating device. If we allowed
> emulated endpoint devices, QEMU would have to invalidate both its own
> software IOTLB and the host's hardware IOTLB, which could slow things
> down.

Change looks good to me,

Reviewed-by: Nicolin Chen <nicol...@nvidia.com>

Some nits:

> Since vfio-pci devices in nested mode rely on the host SMMUv3's nested
> translation (S1+S2), their get_address_space() callback must return the
> system address space to enable correct S2 mappings of guest RAM.
>
> So in short:
>  - vfio-pci devices return the system address space
>  - bridges and root ports return the IOMMU address space

I think we can spare a few more words here and in the code too:
(a) A vfio-pci device in an accelerated mode doesn't need any of
    the features from the iommu address space, since translation
    and IOTLB maintenance will be handled by the real hardware.
(b) In the HW accelerated mode, the VFIO core will allocate the
    S2 nesting parent HWPT on top of a core-managed IOAS for S2
    mappings. So, returning the system address space allows to
    take advantage of that.
(c) The reason why bridges and root ports can't use the system
    address space.

Feel free to organize these in your preferred words.

> Note: On ARM, MSI doorbell addresses are also translated via SMMUv3.
> Hence, if a vfio-pci device is behind the SMMuv3 with translation enabled,

s/SMMuv3/SMMUv3
  
> +static bool smmuv3_accel_pdev_allowed(PCIDevice *pdev, bool *vfio_pci)
> +{
> +
> +    if (object_dynamic_cast(OBJECT(pdev), TYPE_PCI_BRIDGE) ||
> +        object_dynamic_cast(OBJECT(pdev), "pxb-pcie") ||

"TYPE_PXB_PCIE_DEV" since we moved it to the header in this patch.

Reply via email to