On Thu, Apr 01, 2021 at 04:38:44AM +0000, Liu, Yi L wrote:
> > From: Jason Gunthorpe <j...@nvidia.com>
> > Sent: Wednesday, March 31, 2021 8:41 PM
> >
> > On Wed, Mar 31, 2021 at 07:38:36AM +0000, Liu, Yi L wrote:
> >
> > > The reason is /dev/ioasid FD is per-VM since the ioasid allocated to
> > > the VM should be able to be shared by all assigned device for the VM.
> > > But the SVA operations (bind/unbind page table, cache_invalidate) should
> > > be per-device.
> >
> > It is not *per-device* it is *per-ioasid*
> >
> > And as /dev/ioasid is an interface for controlling multiple ioasid's
> > there is no issue to also multiplex the page table manipulation for
> > multiple ioasids as well.
> >
> > What you should do next is sketch out in some RFC the exact ioctls
> > each FD would have and show how the parts I outlined would work and
> > point out any remaining gaps.
> >
> > The device FD is something like the vfio_device FD from VFIO, it has
> > *nothing* to do with PASID beyond having a single ioctl to authorize
> > the device to use the PASID. All control of the PASID is in
> > /dev/ioasid.
>
> good to see this reply. Your idea is much clearer to me now. If I'm getting
> you correctly. I think the skeleton is something like below:
>
> 1) userspace opens a /dev/ioasid, meanwhile there will be an ioasid
>    allocated and a per-ioasid context which can be used to do bind page
>    table and cache invalidate, an ioasid FD returned to userspace.
> 2) userspace passes the ioasid FD to VFIO, let it associated with a device
>    FD (like vfio_device FD).
> 3) userspace binds page table on the ioasid FD with the page table info.
> 4) userspace unbinds the page table on the ioasid FD
> 5) userspace de-associates the ioasid FD and device FD
>
> Does above suit your outline?

Seems so

> If yes, I still have below concern and wish to see your opinion.
> - the ioasid FD and device association will happen at runtime instead of
>   just happen in the setup phase.

Of course, this is required for security. The vIOMMU must perform the
device association when the guest requires it. Otherwise a guest
cannot isolate a PASID to a single process/device pair.

I'm worried Intel views ENQCMD as the only use of PASID in a guest,
but that is not consistent with the industry. We need to see normal
nested PASID support with assigned PCI VFs.

> - how about AMD and ARM's vSVA support? Their PASID allocation and page table
>   happens within guest. They only need to bind the guest PASID table to host.
>   Above model seems unable to fit them. (Jean, Eric, Jacob please feel free
>   to correct me)

No, everything needs the device association step or it is not
secure.

You can give a PASID to a guest and allow it to manipulate its memory
map directly, nested under the guest's CPU page tables.

However, the guest cannot authorize a PCI BDF to utilize that PASID
without going through some kind of step in the hypervisor. A guest
should not be able to authorize a PASID for a BDF it doesn't have
access to - only the hypervisor can enforce this.

This all must also fit into the mdev model, where only the
device-specific mdev driver can do the device-specific PASID
authorization. A hypercall is essential, or we need to stop pretending
mdev is a good idea.

I'm sure there will be some small differences, and you should clearly
explain the entire uAPI surface so that someone from AMD and ARM can
review it.

> - this per-ioasid SVA operations is not aligned with the native SVA usage
>   model. Native SVA bind is per-device.

Seems like that is an error in native SVA.

SVA is a particular mode of the PASID's memory mapping table, it has
nothing to do with a device.
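
To make the split between per-ioasid page table control and device
authorization concrete, here is a toy user-space model in Python. It is
only a sketch of the rule argued above, not kernel code and not a
proposed uAPI. Every name in it (IoasidContext, Host, authorize,
deauthorize, dma_allowed) is hypothetical, and the page table is just a
string placeholder.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Optional, Set


class NotAuthorized(Exception):
    """The BDF is not assigned to this guest, so it cannot be given the PASID."""


@dataclass
class IoasidContext:
    """Hypothetical per-ioasid state: the page table lives here, not on a device."""
    ioasid: int
    page_table: Optional[str] = None  # placeholder for real page table info
    authorized_bdfs: Set[str] = field(default_factory=set)

    def bind_page_table(self, info: str) -> None:
        # Per-ioasid operation: no device is named anywhere.
        self.page_table = info

    def unbind_page_table(self) -> None:
        self.page_table = None


class Host:
    """Toy stand-in for the hypervisor side, the only place that can authorize."""

    def __init__(self, assigned_bdfs: Iterable[str]) -> None:
        self.assigned_bdfs = set(assigned_bdfs)  # devices this guest really owns
        self.contexts: Dict[int, IoasidContext] = {}
        self._next_ioasid = 1

    def alloc_ioasid(self) -> IoasidContext:
        ctx = IoasidContext(self._next_ioasid)
        self.contexts[ctx.ioasid] = ctx
        self._next_ioasid += 1
        return ctx

    def authorize(self, ctx: IoasidContext, bdf: str) -> None:
        # The association step: refused for a BDF the guest has no access to.
        if bdf not in self.assigned_bdfs:
            raise NotAuthorized(bdf)
        ctx.authorized_bdfs.add(bdf)

    def deauthorize(self, ctx: IoasidContext, bdf: str) -> None:
        ctx.authorized_bdfs.discard(bdf)

    def dma_allowed(self, ctx: IoasidContext, bdf: str) -> bool:
        # A device may use the PASID only if it was authorized and a table is bound.
        return ctx.page_table is not None and bdf in ctx.authorized_bdfs
```

In this model, binding a page table never mentions a device, and a
device gains access to the ioasid only through authorize(), which can
happen at any point at runtime and fails for a BDF that is not assigned
to the guest.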

Jason
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu