Jason Gunthorpe <j...@nvidia.com> writes: > On Tue, Jun 03, 2025 at 02:20:51PM +0800, Xu Yilun wrote: >> > Wouldn’t it be simpler to skip the reference count increment altogether >> > and just call tsm_unbind in the virtual device’s destroy callback? >> > (iommufd_vdevice_destroy()) >> >> The vdevice refcount is the main concern, there is also an IOMMU_DESTROY >> ioctl. User could just free the vdevice instance if no refcount, while VFIO >> is still in bound state. That seems not the correct free order. > > Freeing the vdevice should automatically unbind it.. >
One challenge I ran into during implementation was the dependency of vfio on iommufd_device. When vfio needs to perform a tsm_unbind, it only has access to an iommufd_device. However, TSM operations like binding and unbinding are handled at the iommufd_vdevice level. The issue? There’s no direct link from iommufd_device back to iommufd_vdevice. To address this, I modified the following structures: modified drivers/iommu/iommufd/iommufd_private.h @@ -428,6 +428,7 @@ struct iommufd_device { /* protect iopf_enabled counter */ struct mutex iopf_lock; unsigned int iopf_enabled; + struct iommufd_vdevice *vdev; }; static inline struct iommufd_device * @@ -613,6 +614,7 @@ struct iommufd_vdevice { struct iommufd_object obj; struct iommufd_ctx *ictx; struct iommufd_viommu *viommu; + struct iommufd_device *idev; struct device *dev; struct mutex mutex; /* mutex to synchronize updates to tsm_bound */ u64 id; /* per-vIOMMU virtual ID */ These fields are updated during tsm_bind and tsm_unbind, so they must be protected by the appropriate locks: Updating vdevice->idev requires holding vdev->mutex (vdev_lock). Updating device->vdev requires idev->igroup->lock (idev_lock). tsm_unbind in vdevice_destroy: vdevice_destroy() ends up calling tsm_unbind() while holding only the vdev_lock. At first glance, this seems unsafe. But in practice, it's fine because the corresponding iommufd_device has already been destroyed when the VFIO device file descriptor was closed—triggering vfio_df_iommufd_unbind(). I’ve added an in-code comment to explain why tsm_unbind() is safe here without acquiring the idev_lock. Hope that is ok. -aneesh