On Wed, Apr 23, 2025 at 08:55:51AM -0300, Jason Gunthorpe wrote:
> On Wed, Apr 23, 2025 at 08:05:49AM +0000, Tian, Kevin wrote:
>
> > It's not a good idea having the kernel trust the VMM.
>
> It certainly shouldn't trust it, but it can validate the VMM's choice
> and generate a failure if it isn't good.
>
> > Also I'm not
> > sure the contiguity is guaranteed all the time with huge page
> > (e.g. if just using THP).
>
> If things are aligned then the contiguity will work out. Ie a 64K
> aligned allocation on a 2M GPA is fine. I don't think there are
> edge cases where a GPA will be fragmented. It does rely on the VMM
> always getting some kind of huge page and then pinning it in iommufd.

QEMU does ensure the alignment when using system huge pages, and
I haven't seen any edge-case problem yet.

> IMHO this is bad HW design, but it is what it is..
>
> > btw does smmu only read the cmdq or also update some fields
> > in the queue? If the latter, then it also brings a security hole
> > as a malicious VMM could violate the contiguity requirement
> > to instruct the smmu to touch pages which don't belong to
> > it...
>
> This really must be prevented. I haven't looked closely here, but the
> GPA -> PA mapping should go through the IOAS and that should generate
> a page list and that should be validated for contiguity.
>
> It also needs to act like a mdev and lock down the part of the IOAS
> that provides that memory so the pin can't be released and UAF things.

If I capture this correctly, the GPA->PA mapping is already done
at the IOAS level for the S2 HWPT/domain, i.e. pages are already
pinned. So we just need a pair of for-driver APIs to validate
the contiguity and refcount the pages by calling
iopt_area_add_access().

Thanks
Nicolin
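
For illustration only, here is a rough user-space sketch of the two checks
discussed in this thread: that a page list is physically contiguous, and
that a queue placed at a given GPA stays inside one huge page. None of
these names exist in iommufd or the SMMU driver; sketch_pages_contiguous(),
sketch_fits_in_huge_page() and SKETCH_PAGE_SIZE are hypothetical, and a
4K base page size is assumed. A real implementation would work on the
pinned page list from the IOAS rather than on a plain array.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096ULL /* assumption: 4K base pages */

/*
 * Hypothetical helper: return true if the n page-sized physical
 * addresses in pa[] form a single physically contiguous range.
 */
static bool sketch_pages_contiguous(const uint64_t *pa, size_t n)
{
	size_t i;

	if (n == 0)
		return false;

	for (i = 1; i < n; i++) {
		/* Each page must start exactly where the previous one ends */
		if (pa[i] != pa[i - 1] + SKETCH_PAGE_SIZE)
			return false;
	}
	return true;
}

/*
 * Hypothetical helper: return true if [gpa, gpa + size) lies entirely
 * within one huge page of hp_size bytes, e.g. a 64K queue inside a 2M
 * huge page.
 */
static bool sketch_fits_in_huge_page(uint64_t gpa, uint64_t size,
				     uint64_t hp_size)
{
	if (size == 0 || hp_size == 0 || size > hp_size)
		return false;

	/* Reject ranges whose end address would wrap around */
	if (gpa > UINT64_MAX - (size - 1))
		return false;

	/* First and last byte must fall in the same huge page */
	return (gpa / hp_size) == ((gpa + size - 1) / hp_size);
}
```

A 64K-aligned 64K range always passes the second check for a 2M huge
page, which matches the alignment argument quoted above; the first check
is what the kernel would still have to run itself instead of trusting
the VMM.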