On Thu, Apr 01, 2021 at 07:04:01AM +0000, Liu, Yi L wrote:
> > - how about AMD and ARM's vSVA support? Their PASID allocation and page
> > table management happen within the guest. They only need to bind the
> > guest PASID table to the host.

In this case each VM has its own IOASID space, and the host IOASID
allocator doesn't participate. Plus, this only makes sense when assigning a
whole VF to a guest, and VFIO is the tool for this. So I wouldn't shoehorn
those ops into /dev/ioasid, though we do need a transport for invalidate
commands.

> > Above model seems unable to fit them. (Jean, Eric, Jacob please feel free
> > to correct me)
> > - these per-ioasid SVA operations are not aligned with the native SVA
> > usage model. Native SVA bind is per-device.

Bare-metal SVA doesn't need /dev/ioasid either. A program uses a device
handle to either ask whether SVA is enabled, or to enable it explicitly.
With or without /dev/ioasid, that step is required. OpenCL uses the first
method: it automatically enables "fine-grain system SVM" if available, and
provides a flag to userspace. So userspace does not need to know about
PASID. It's only one method for doing SVA (some GPUs are context-switching
page tables instead).

> After reading your reply in
> https://lore.kernel.org/linux-iommu/20210331123801.gd1463...@nvidia.com/#t
> you mean the /dev/ioasid FD is per-VM instead of per-ioasid, so the above
> skeleton doesn't suit your idea. I drafted the skeleton below to see if we
> are on the same page. But I still believe there is an open question on how
> to fit ARM and AMD's vSVA support into this per-ioasid SVA operation model.
> Thoughts?
>
> +-----------------------------+-----------------------------------------------+
> | userspace                   | kernel space                                  |
> +-----------------------------+-----------------------------------------------+
> | ioasid_fd =                 | /dev/ioasid does below:                       |
> | open("/dev/ioasid", O_RDWR);| struct ioasid_fd_ctx {                        |
> |                             |     struct list_head ioasid_list;             |
> |                             |     ...                                       |
> |                             | } ifd_ctx; // ifd_ctx is per ioasid_fd        |
> +-----------------------------+-----------------------------------------------+
> | ioctl(ioasid_fd,            | /dev/ioasid does below:                       |
> |     ALLOC, &ioasid);        | struct ioasid_data {                          |
> |                             |     ioasid_t ioasid;                          |
> |                             |     struct list_head device_list;             |
> |                             |     struct list_head next;                    |
> |                             |     ...                                       |
> |                             | } id_data; // id_data is per ioasid           |
> |                             |                                               |
> |                             | list_add(&id_data.next,                       |
> |                             |          &ifd_ctx.ioasid_list);               |
> +-----------------------------+-----------------------------------------------+
> | ioctl(device_fd,            | VFIO does below:                              |
> |     DEVICE_ALLOW_IOASID,    | 1) get ioasid_fd, check if ioasid_fd is valid |
> |     ioasid_fd,              | 2) check if ioasid is allocated from ioasid_fd|
> |     ioasid);                | 3) register device/domain info to /dev/ioasid |
> |                             |    tracked in id_data.device_list             |
> |                             | 4) record the ioasid in VFIO's per-device     |
> |                             |    ioasid list for future security check      |
> +-----------------------------+-----------------------------------------------+
> | ioctl(ioasid_fd,            | /dev/ioasid does below:                       |
> |     BIND_PGTBL,             | 1) find ioasid's id_data                      |
> |     pgtbl_data,             | 2) loop the id_data.device_list and tell iommu|
> |     ioasid);                |    to give ioasid access to the devices       |
> +-----------------------------+-----------------------------------------------+
> | ioctl(ioasid_fd,            | /dev/ioasid does below:                       |
> |     UNBIND_PGTBL,           | 1) find ioasid's id_data                      |
> |     ioasid);                | 2) loop the id_data.device_list and tell iommu|
> |                             |    to clear ioasid access to the devices      |
> +-----------------------------+-----------------------------------------------+
> | ioctl(device_fd,            | VFIO does below:                              |
> |     DEVICE_DISALLOW_IOASID, | 1) check if ioasid is associated in VFIO's    |
> |     ioasid_fd,              |    device ioasid list.                        |
> |     ioasid);                | 2) unregister device/domain info from         |
> |                             |    /dev/ioasid, clear in id_data.device_list  |
> +-----------------------------+-----------------------------------------------+
> | ioctl(ioasid_fd,            | /dev/ioasid does below:                       |
> |     FREE, ioasid);          | list_del(&id_data.next);                      |
> +-----------------------------+-----------------------------------------------+

Also wondering about:

* Querying IOMMU nesting capabilities before binding page tables (which
  page table formats are supported?). We were planning to have a VFIO cap,
  but I'm guessing we need to go back to the sysfs solution?

* Invalidation, probably an ioasid_fd ioctl?

* Page faults and page responses. These go from and to devices, and don't
  necessarily have a PASID. But they're needed by vDPA as well, so is that
  also going through /dev/ioasid?

Thanks,
Jean
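The quoted skeleton's ownership check (step 2 of DEVICE_ALLOW_IOASID: "check if ioasid is allocated from ioasid_fd") can be modelled in a few lines. The sketch below is a minimal userspace model and has several simplifications. It assumes a single host-wide allocator, so IOASID values are unique across fds. Fixed-size arrays replace list_head, and a boolean replaces programming the IOMMU. The struct names follow the table. The sim_* functions, MAX_DEVS and next_ioasid are hypothetical and are not kernel or VFIO APIs. The model also simplifies step 1 of DEVICE_DISALLOW_IOASID: it checks id_data's device list rather than a separate VFIO per-device list.

```c
/* Hypothetical userspace model of the skeleton's per-ioasid_fd bookkeeping.
 * Not kernel code: arrays stand in for list_head, a bool for the IOMMU. */
#include <stdbool.h>
#include <stdlib.h>

#define MAX_DEVS 8                   /* hypothetical limit for the model */

struct ioasid_data {                 /* id_data: one per ioasid */
    unsigned int ioasid;
    int devices[MAX_DEVS];           /* stands in for id_data.device_list */
    int ndevs;
    bool bound;                      /* set by BIND_PGTBL, cleared by UNBIND */
    struct ioasid_data *next;        /* link in ifd_ctx.ioasid_list */
};

struct ioasid_fd_ctx {               /* ifd_ctx: one per ioasid_fd */
    struct ioasid_data *ioasid_list;
};

/* Assumption: one host-wide allocator, so values are unique across fds. */
static unsigned int next_ioasid = 1;

/* ALLOC: create an id_data and link it into this fd's list. */
static int sim_alloc(struct ioasid_fd_ctx *ctx, unsigned int *out)
{
    struct ioasid_data *d = calloc(1, sizeof(*d));
    if (!d)
        return -1;
    d->ioasid = next_ioasid++;
    d->next = ctx->ioasid_list;
    ctx->ioasid_list = d;
    *out = d->ioasid;
    return 0;
}

/* Searches only this fd's list, so an ioasid owned by another fd is not found. */
static struct ioasid_data *sim_find(struct ioasid_fd_ctx *ctx, unsigned int id)
{
    for (struct ioasid_data *d = ctx->ioasid_list; d; d = d->next)
        if (d->ioasid == id)
            return d;
    return NULL;
}

/* DEVICE_ALLOW_IOASID: ownership check, then register the device. */
static int sim_allow(struct ioasid_fd_ctx *ctx, unsigned int id, int dev)
{
    struct ioasid_data *d = sim_find(ctx, id);
    if (!d || d->ndevs == MAX_DEVS)
        return -1;
    d->devices[d->ndevs++] = dev;
    return 0;
}

/* BIND_PGTBL / UNBIND_PGTBL: returns the number of registered devices. */
static int sim_set_bound(struct ioasid_fd_ctx *ctx, unsigned int id, bool bind)
{
    struct ioasid_data *d = sim_find(ctx, id);
    if (!d)
        return -1;
    d->bound = bind;  /* a real implementation would walk device_list here */
    return d->ndevs;
}

/* DEVICE_DISALLOW_IOASID: fails if dev is not in id_data's device list. */
static int sim_disallow(struct ioasid_fd_ctx *ctx, unsigned int id, int dev)
{
    struct ioasid_data *d = sim_find(ctx, id);
    if (!d)
        return -1;
    for (int i = 0; i < d->ndevs; i++) {
        if (d->devices[i] == dev) {
            d->ndevs--;
            d->devices[i] = d->devices[d->ndevs];  /* swap-remove */
            return 0;
        }
    }
    return -1;
}

/* FREE: the list_del() step, plus releasing the id_data. */
static int sim_free(struct ioasid_fd_ctx *ctx, unsigned int id)
{
    for (struct ioasid_data **pp = &ctx->ioasid_list; *pp; pp = &(*pp)->next) {
        if ((*pp)->ioasid == id) {
            struct ioasid_data *d = *pp;
            *pp = d->next;
            free(d);
            return 0;
        }
    }
    return -1;
}
```

In this model, passing an ioasid allocated from one fd to DEVICE_ALLOW_IOASID on another fd fails at the lookup. The skeleton does not say whether FREE should refuse while devices are still registered, so the sketch does not enforce it either.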