On Wed, Jan 29, 2025 at 10:58:00AM -0400, Jason Gunthorpe wrote:
> On Wed, Jan 29, 2025 at 02:44:12PM +0100, Eric Auger wrote:
> > On 1/11/25 4:32 AM, Nicolin Chen wrote:
> > > For systems that require MSI pages to be mapped into the IOMMU translation
> > > the IOMMU driver provides an IOMMU_RESV_SW_MSI range, which is the default
> > > recommended IOVA window to place these mappings. However, there is nothing
> > > special about this address. And to support the RMR trick in VMM for nested
> > well at least it shall not overlap VMM's RAM. So it was not random either.
> > > translation, the VMM needs to know what sw_msi window the kernel is using.
> > > As there is no particular reason to force VMM to adopt the kernel default,
> > > provide a simple IOMMU_OPTION_SW_MSI_START/SIZE ioctl that the VMM can use
> > > to directly specify the sw_msi window that it wants to use, which replaces
> > > and disables the default IOMMU_RESV_SW_MSI from the driver to avoid having
> > > to build an API to discover the default IOMMU_RESV_SW_MSI.
> > IIUC the MSI window will then be different when using legacy VFIO
> > assignment and iommufd backend.
>
> ? They use the same, iommufd can have userspace override it. Then it
> will ignore the reserved region.
>
> > MSI reserved regions are exposed in
> > /sys/kernel/iommu_groups/<n>/reserved_regions
> > 0x0000000008000000 0x00000000080fffff msi
> >
> > Is that configurability reflected accordingly?
>
> ?
>
> Nothing using iommufd should parse that sysfs file.
>
> > How do you make sure it does not collide with other resv regions? I
> > don't see any check here.
>
> Yes this does need to be checked, it does look missing. It still needs
> to create a reserved region in the ioas when attaching to keep the
> areas safe and it has to intersect with the incoming reserved
> regions from the driver.

Yea, I found iopt_reserve_iova() is actually missing entirely...

While fixing this, I see a way to turn the OPTIONs back to per-idev,
if you still prefer them to be per-idev(?). Then, we can check a
given input in the set_option() against the device's reserved region
list from the driver, prior to the device attaching to any HWPT.

Otherwise, we just rely on iopt_enforce_device_reserve_region()
during an attach, keeping the option global to simplify VMMs.

Thanks
Nicolin
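
As a rough illustration of the per-idev check mentioned above (not code
from the series), the validation in set_option() would boil down to an
interval-overlap test between the requested sw_msi window and each
reserved region reported for the device. The sketch below is plain
userspace C; struct resv_region and sw_msi_window_conflicts() are
hypothetical names made up for this example, and it assumes regions are
stored as inclusive [start, last] ranges, as in the sysfs output quoted
above. A real implementation would presumably also skip the driver's
default IOMMU_RESV_SW_MSI entry, since the option is meant to replace it.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one entry of a device's reserved region list. */
struct resv_region {
	uint64_t start;
	uint64_t last;	/* inclusive, as in the sysfs reserved_regions output */
};

/*
 * Return true if the window [start, start + size - 1] is empty, wraps
 * around the 64-bit address space, or overlaps any region in the list.
 */
static bool sw_msi_window_conflicts(uint64_t start, uint64_t size,
				    const struct resv_region *resv, size_t n)
{
	uint64_t last;
	size_t i;

	/* Reject a zero-sized window and one whose end would overflow. */
	if (!size || start > UINT64_MAX - (size - 1))
		return true;
	last = start + size - 1;

	for (i = 0; i < n; i++) {
		/* Inclusive ranges overlap iff each starts before the other ends. */
		if (start <= resv[i].last && resv[i].start <= last)
			return true;
	}
	return false;
}
```

With a single reserved region of 0x8000000-0x80fffff, a 1 MiB window
starting at 0x8000000 would be rejected, while the same-sized window at
0x8100000 would pass.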