RE: [RFC PATCH 00/30] Add PCIe SVM support to ARM SMMUv3
> -----Original Message-----
> From: iommu-boun...@lists.linux-foundation.org [mailto:iommu-boun...@lists.linux-foundation.org] On Behalf Of Jean-Philippe Brucker
> Sent: Tuesday, February 28, 2017 3:54 AM
> Cc: Shanker Donthineni; k...@vger.kernel.org; Catalin Marinas; Sinan Kaya; Will Deacon; iommu@lists.linux-foundation.org; Harv Abdulhamid; linux-...@vger.kernel.org; Bjorn Helgaas; David Woodhouse; linux-arm-ker...@lists.infradead.org; Nate Watterson
> Subject: [RFC PATCH 00/30] Add PCIe SVM support to ARM SMMUv3
>
> Hi,
>
> This series adds support for the PCI ATS, PRI and PASID extensions to the
> SMMUv3 driver. In systems that support it, it is now possible for some
> high-end devices to perform DMA into process address spaces. Page tables
> are shared between MMU and SMMU; page faults from devices are recoverable
> and handled by the mm subsystem.
>
> We propose an extension to the IOMMU API that unifies existing SVM
> implementations (AMD, Intel and ARM) in patches 22 and 24. Nothing is set
> in stone, the goal is to start discussions and find an intersection
> between implementations.
>
> We also propose a VFIO interface in patches 29 and 30, that allows
> userspace device drivers to make use of SVM. It would also serve as an
> example implementation for other device drivers.
>
> Overview of the patches:
>
> * 1 and 2 prepare the SMMUv3 structures for ATS,
> * 3 to 5 enable ATS for devices that support it,
> * 6 to 10 prepare the SMMUv3 structures for PASID and PRI. Patch 9,
>   in particular, provides details on the structure requirements,
> * 11 introduces an interface for sharing ASIDs on ARM64,
> * 12 to 17 add more infrastructure for sharing page tables,
> * 18 and 19 add minor helpers to PCI,
> * 20 enables PASID in devices that support it,

Jean, I suppose you will introduce a PASID management mechanism in the
SMMUv3 driver. Here I have a question about PASID management on ARM.
Will there be a system-wide PASID table?
Or is there an equivalent implementation? Thanks, Yi L

> * 21 enables PRI and adds a device fault handler,
> * 22 and 24 draft a possible interface for SVM in the IOMMU API,
> * 23 and 25-28 finalize support for SVM in SMMUv3,
> * 29 and 30 draft a possible interface for SVM in VFIO.
>
> The series is available on git://linux-arm.org/linux-jpb.git svm/rfc1
> Enable CONFIG_PCI_PASID, CONFIG_PCI_PRI and you should be good to go.
>
> So far, this has only been tested with a software model of an SMMUv3 and
> a PCIe DMA engine. We don't intend to get this merged until it has been
> tested on silicon, but at least the driver implementation should be
> mature enough. I might split next versions depending on what is ready
> and what needs more work, so we can merge it progressively.
>
> A lot of open questions remain:
>
> 1. Can we declare that PASID 0 is always invalid?
>
> 2. For this prototype, I kept the interface simple from an implementation
>    perspective. At the moment it is "bind this device to that address
>    space". For consistency with the rest of VFIO and IOMMU, I think "bind
>    this container to that address space" would be more in line with VFIO,
>    and "bind that group to that address space" more in line with IOMMU.
>    VFIO would tell the IOMMU "for all groups in this container, bind to
>    that address space".
>    This raises the question of inconsistency between device capabilities.
>    When adding a device that supports fewer PASID bits to a group, what
>    do we do? What if we already allocated a PASID that is out of range
>    for the new device?
>
> 3. How do we reconcile the IOMMU fault reporting infrastructure with the
>    SVM interface?
>
> 4. SVM is the product of two features: handling device faults, and
>    devices having multiple address spaces. What about one feature
>    without the other?
>    a. If we cannot afford to have a device fault, can we at least share
>       a pinned address space?
>       Pinning all current memory would be done by vfio, but there would
>       also need to be pinning of all future mappings. (mlock isn't
>       sufficient, it still allows minor faults.)
>    b. If the device has a single address space, can we still bind it to
>       a process? The main issue with unifying DMA and process page
>       tables is reserved regions on the device side. What do we do if,
>       for instance, an MSI frame address clashes with a process mapping?
>       Or if a process mapping exists outside of the device's DMA window?
>
> Please find more details in the IOMMU API and VFIO patches.
>
> Thanks,
> Jean-Philippe
>
> Cc: Harv Abdulhamid
> Cc: Will Deacon
> Cc: Shanker Donthineni
> Cc: Bjorn Helgaas
> Cc: Sinan Kaya
> Cc: Lorenzo Pieralisi
> Cc: Catalin Marinas
> Cc: Robin Murphy
> Cc: Joerg Roedel
> Cc: Nate Watterson
> Cc: Alex Williamson
> Cc: David Woodhouse
>
> Cc: linux-arm-ker...@lists.infradead.org
> Cc: linux-...@vger.kernel.org
> Cc: iommu@lists.linux-foundati
RE: [RFC PATCH 29/30] vfio: Add support for Shared Virtual Memory
Hi Jean,

I'm working on virtual SVM, and have some comments on the VFIO channel
definition.

> -----Original Message-----
> From: iommu-boun...@lists.linux-foundation.org [mailto:iommu-boun...@lists.linux-foundation.org] On Behalf Of Jean-Philippe Brucker
> Sent: Tuesday, February 28, 2017 3:55 AM
> Cc: Shanker Donthineni; k...@vger.kernel.org; Catalin Marinas; Sinan Kaya; Will Deacon; iommu@lists.linux-foundation.org; Harv Abdulhamid; linux-...@vger.kernel.org; Bjorn Helgaas; David Woodhouse; linux-arm-ker...@lists.infradead.org; Nate Watterson
> Subject: [RFC PATCH 29/30] vfio: Add support for Shared Virtual Memory
>
> Add two new ioctls for VFIO devices. VFIO_DEVICE_BIND_TASK creates a
> bond between a device and a process address space, identified by a
> device-specific ID named PASID. This allows the device to target DMA
> transactions at the process virtual addresses without a need for mapping
> and unmapping buffers explicitly in the IOMMU. The process page tables
> are shared with the IOMMU, and mechanisms such as PCI ATS/PRI may be
> used to handle faults. VFIO_DEVICE_UNBIND_TASK removes a bond identified
> by a PASID.
>
> Also add a capability flag in device info to detect whether the system
> and the device support SVM.
>
> Users need to specify the state of a PASID when unbinding, with flags
> VFIO_PASID_RELEASE_FLUSHED and VFIO_PASID_RELEASE_CLEAN. Even for PCI,
> PASID invalidation is specific to each device and only partially covered
> by the specification:
>
> * Device must have an implementation-defined mechanism for stopping the
>   use of a PASID. When this mechanism finishes, the device has stopped
>   issuing transactions for this PASID and all transactions for this
>   PASID have been flushed to the IOMMU.
> * Device may either wait for all outstanding PRI requests for this
>   PASID to finish, or issue a Stop Marker message, a barrier that
>   separates PRI requests affecting this instance of the PASID from PRI
>   requests affecting the next instance. In the first case, we say that
>   the PASID is "clean", in the second case it is "flushed" (and the
>   IOMMU has to wait for the Stop Marker before reassigning the PASID.)
>
> We expect similar distinctions for platform devices. Ideally there
> should be a callback for each PCI device, allowing the IOMMU to ask the
> device to stop using a PASID. When the callback returns, the PASID is
> either flushed or clean and the return value tells which.
>
> For the moment I don't know how to implement this callback for PCI, so
> if the user forgets to call unbind with either "clean" or "flushed",
> the PASID is never reused. For platform devices, it might be simpler to
> implement, since we could associate an invalidate_pasid callback with a
> DT compatible string, as is currently done for reset.
>
> Signed-off-by: Jean-Philippe Brucker
[...]
>  drivers/vfio/pci/vfio_pci.c |  24 ++
>  drivers/vfio/vfio.c         | 104 ++++++++++
>  include/uapi/linux/vfio.h   |  55 +++
>  3 files changed, 183 insertions(+)
> ...
> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> index 519eff362c1c..3fe4197a5ea0 100644
> --- a/include/uapi/linux/vfio.h
> +++ b/include/uapi/linux/vfio.h
> @@ -198,6 +198,7 @@ struct vfio_device_info {
>  #define VFIO_DEVICE_FLAGS_PCI       (1 << 1)  /* vfio-pci device */
>  #define VFIO_DEVICE_FLAGS_PLATFORM  (1 << 2)  /* vfio-platform device */
>  #define VFIO_DEVICE_FLAGS_AMBA      (1 << 3)  /* vfio-amba device */
> +#define VFIO_DEVICE_FLAGS_SVM       (1 << 4)  /* Device supports bind/unbind */
>      __u32   num_regions;  /* Max region index + 1 */
>      __u32   num_irqs;     /* Max IRQ index + 1 */
>  };
> @@ -409,6 +410,60 @@ struct vfio_irq_set {
>   */
>  #define VFIO_DEVICE_RESET  _IO(VFIO_TYPE, VFIO_BASE + 11)
>
> +struct vfio_device_svm {
> +    __u32   argsz;
> +    __u32   flags;
> +#define VFIO_SVM_PASID_RELEASE_FLUSHED  (1 << 0)
> +#define VFIO_SVM_PASID_RELEASE_CLEAN    (1 << 1)
> +    __u32   pasid;
> +};

For virtual SVM work, the VFIO channel would be used to pass down the
guest PASID table pointer and invalidation information. And it may have
further usage beyond the above.

Here is the virtual SVM design doc which illustrates the VFIO usage.
https://lists.gnu.org/archive/html/qemu-devel/2016-11/msg05311.html

For the guest PASID table ptr passdown, I have the following message in
pseudo code.

struct pasid_table_info {
    __u64 ptr;
    __u32 size;
};

For invalidation, I have the following info in pseudo code.

struct iommu_svm_tlb_invalidate_info {
    __u32 inv_type;
#define IOTLB_INV                  (1 << 0)
#define EXTENDED_IOTLB_INV         (1 << 1)
#define DEVICE_IOTLB_INV           (1 << 2)
#define EXTENDED_DEVICE_IOTLB_INV  (1 << 3)
#define PASID_CACHE_INV            (1 << 4)
    __u32 pasid;
    __u64 addr
RE: [RFC PATCH 29/30] vfio: Add support for Shared Virtual Memory
Hi Jean,

Thanks for the excellent ideas. Please refer to my comments inline.

[...]

> > Hi Jean,
> >
> > I'm working on virtual SVM, and have some comments on the VFIO channel
> > definition.
>
> Thanks a lot for the comments, this is quite interesting to me. I just
> have some concerns about portability, so I'm proposing a way to be
> slightly more generic below.

Yes, portability is what we need to consider.

[...]

> >> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> >> index 519eff362c1c..3fe4197a5ea0 100644
> >> --- a/include/uapi/linux/vfio.h
> >> +++ b/include/uapi/linux/vfio.h
> >> @@ -198,6 +198,7 @@ struct vfio_device_info {
> >>  #define VFIO_DEVICE_FLAGS_PCI       (1 << 1)  /* vfio-pci device */
> >>  #define VFIO_DEVICE_FLAGS_PLATFORM  (1 << 2)  /* vfio-platform device */
> >>  #define VFIO_DEVICE_FLAGS_AMBA      (1 << 3)  /* vfio-amba device */
> >> +#define VFIO_DEVICE_FLAGS_SVM       (1 << 4)  /* Device supports bind/unbind */
> >>      __u32   num_regions;  /* Max region index + 1 */
> >>      __u32   num_irqs;     /* Max IRQ index + 1 */
> >>  };
> >> @@ -409,6 +410,60 @@ struct vfio_irq_set {
> >>   */
> >>  #define VFIO_DEVICE_RESET  _IO(VFIO_TYPE, VFIO_BASE + 11)
> >>
> >> +struct vfio_device_svm {
> >> +    __u32   argsz;
> >> +    __u32   flags;
> >> +#define VFIO_SVM_PASID_RELEASE_FLUSHED  (1 << 0)
> >> +#define VFIO_SVM_PASID_RELEASE_CLEAN    (1 << 1)
> >> +    __u32   pasid;
> >> +};
> >
> > For virtual SVM work, the VFIO channel would be used to pass down the
> > guest PASID table pointer and invalidation information. And it may
> > have further usage beyond the above.
> >
> > Here is the virtual SVM design doc which illustrates the VFIO usage.
> > https://lists.gnu.org/archive/html/qemu-devel/2016-11/msg05311.html
> >
> > For the guest PASID table ptr passdown, I have the following message
> > in pseudo code.
> > struct pasid_table_info {
> >     __u64 ptr;
> >     __u32 size;
> > };
>
> There should probably be a way to specify the table format, so that the
> pIOMMU driver can check that it recognizes the format used by the
> vIOMMU before attaching it. This would allow the structure to be reused
> for other IOMMU architectures. If, for instance, the host has an Intel
> IOMMU and someone decides to emulate an ARM SMMU with Qemu (their loss
> :), it can certainly use VFIO for passing-through devices with
> MAP/UNMAP. But if Qemu then attempts to pass down a PASID table in SMMU
> format, the Intel driver should have a way to reject it, as the SMMU
> format isn't compatible.

Exactly, it would be great if we could have the API defined as generic
as MAP/UNMAP. The case you mentioned, emulating an ARM SMMU on an Intel
platform, is representative. For such cases, the problem is that
different vendors may have different PASID table formats and also
different page table formats. In my understanding, these incompatible
things may just result in failure if users try such emulation. What's
your opinion here? Anyhow, better to listen to different voices.

> I'm tackling a similar problem at the moment, but for passing a single
> page directory instead of a full PASID table to the IOMMU.

For the Intel IOMMU, passing the whole guest PASID table is enough, and
it also avoids too much pgd passing. However, I'm open to this idea. You
may just add a new flag in "struct vfio_device_svm" and pass the single
pgd down to the host.

> So we need some kind of high-level classification that the vIOMMU must
> communicate to the physical one. Each IOMMU flavor would get a unique,
> global identifier, simply to make sure that vIOMMU and pIOMMU speak the
> same language. For example:
>
> 0x65776886 "AMDV" AMD IOMMU
> 0x73788476 "INTL" Intel IOMMU
> 0x83515748 "S390" s390 IOMMU
> 0x8385     "SMMU" ARM SMMU
> etc.
>
> It needs to be a global magic number that everyone can recognize.
> Could be as simple as 32-bit numbers allocated from 0. Once we have a
> global magic number, we can use it to differentiate
> architecture-specific details.

I may need to think more on this part.

> struct pasid_table_info {
>     __u64 ptr;
>     __u64 size;     /* Is it number of entries or size in bytes? */

For the Intel platform, it's encoded. But I can make it in bytes. Here,
I'd like to check with you whether the whole guest PASID info is also
needed on ARM?

>     __u32 model;    /* magic number */
>     __u32 variant;  /* version of the IOMMU architecture, maybe?
>                        IOMMU-specific. */
>     __u8  opaque[]; /* IOMMU-specific details */
> };
>
> And then each IOMMU or page-table code can do low-level validation of
> the format, by reading the details in 'opaque'. I assume that for Intel
> this would be empty.

But yes, for Intel, if the PASID ptr is in the definition, opaque would
be empty.

> For instance on ARM SMMUv3, the PASID table can have either one or two
> levels, and
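The model/opaque split being discussed above can be illustrated with a
small sketch of the pIOMMU side. Note the model numbers, the function
name and the error code here are all made up for illustration; the
thread has not settled on actual values:

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical global model identifiers -- actual values TBD */
enum iommu_model {
    IOMMU_MODEL_AMDV = 0,
    IOMMU_MODEL_INTL = 1,
    IOMMU_MODEL_S390 = 2,
    IOMMU_MODEL_SMMU = 3,
};

struct pasid_table_info {
    uint64_t ptr;       /* guest PASID table pointer */
    uint64_t size;      /* table size in bytes */
    uint32_t model;     /* which vIOMMU produced this table */
    uint32_t variant;   /* model-specific version, maybe */
    uint8_t  opaque[];  /* IOMMU-specific details */
};

/*
 * A pIOMMU driver that only understands the SMMU PASID table format
 * rejects every other model before even looking at 'opaque'.
 */
static int smmu_bind_pasid_table(const struct pasid_table_info *info)
{
    if (info->model != IOMMU_MODEL_SMMU)
        return -ENODEV;  /* vIOMMU and pIOMMU don't speak the same language */
    /* low-level validation of info->opaque[] (levels, entry size...) here */
    return 0;
}
```

With a check like this, a Qemu that emulates an SMMU on an Intel host
would get an error back from the bind call instead of the Intel driver
misinterpreting the table.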
RE: [RFC PATCH 29/30] vfio: Add support for Shared Virtual Memory
> -----Original Message-----
> From: kvm-ow...@vger.kernel.org [mailto:kvm-ow...@vger.kernel.org] On Behalf Of Jean-Philippe Brucker
> Sent: Thursday, March 23, 2017 9:38 PM
> To: Liu, Yi L; Alex Williamson
> Cc: Shanker Donthineni; k...@vger.kernel.org; Catalin Marinas; Sinan Kaya; Will Deacon; iommu@lists.linux-foundation.org; Harv Abdulhamid; linux-...@vger.kernel.org; Bjorn Helgaas; David Woodhouse; linux-arm-ker...@lists.infradead.org; Nate Watterson; Tian, Kevin; Lan, Tianyu; Raj, Ashok; Pan, Jacob jun; Joerg Roedel; Robin Murphy
> Subject: Re: [RFC PATCH 29/30] vfio: Add support for Shared Virtual Memory
>
> On 23/03/17 08:39, Liu, Yi L wrote:
> > Hi Jean,
> >
> > Thanks for the excellent ideas. Please refer to my comments inline.
> >
> > [...]
> >
> >>> Hi Jean,
> >>>
> >>> I'm working on virtual SVM, and have some comments on the VFIO
> >>> channel definition.
> >>
> >> Thanks a lot for the comments, this is quite interesting to me. I
> >> just have some concerns about portability, so I'm proposing a way to
> >> be slightly more generic below.
> >
> > Yes, portability is what we need to consider.
> >
> > [...]
> >>>> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> >>>> index 519eff362c1c..3fe4197a5ea0 100644
> >>>> --- a/include/uapi/linux/vfio.h
> >>>> +++ b/include/uapi/linux/vfio.h
> >>>> @@ -198,6 +198,7 @@ struct vfio_device_info {
> >>>>  #define VFIO_DEVICE_FLAGS_PCI       (1 << 1)  /* vfio-pci device */
> >>>>  #define VFIO_DEVICE_FLAGS_PLATFORM  (1 << 2)  /* vfio-platform device */
> >>>>  #define VFIO_DEVICE_FLAGS_AMBA      (1 << 3)  /* vfio-amba device */
> >>>> +#define VFIO_DEVICE_FLAGS_SVM       (1 << 4)  /* Device supports bind/unbind */
> >>>>      __u32   num_regions;  /* Max region index + 1 */
> >>>>      __u32   num_irqs;     /* Max IRQ index + 1 */
> >>>>  };
> >>>> @@ -409,6 +410,60 @@ struct vfio_irq_set {
> >>>>   */
> >>>>  #define VFIO_DEVICE_RESET  _IO(VFIO_TYPE, VFIO_BASE + 11)
> >>>>
> >>>> +struct vfio_device_svm {
> >>>> +    __u32   argsz;
> >>>> +    __u32   flags;
> >>>> +#define VFIO_SVM_PASID_RELEASE_FLUSHED  (1 << 0)
> >>>> +#define VFIO_SVM_PASID_RELEASE_CLEAN    (1 << 1)
> >>>> +    __u32   pasid;
> >>>> +};
> >>>
> >>> For virtual SVM work, the VFIO channel would be used to pass down
> >>> the guest PASID table pointer and invalidation information. And it
> >>> may have further usage beyond the above.
> >>>
> >>> Here is the virtual SVM design doc which illustrates the VFIO usage.
> >>> https://lists.gnu.org/archive/html/qemu-devel/2016-11/msg05311.html
> >>>
> >>> For the guest PASID table ptr passdown, I have the following message
> >>> in pseudo code.
> >>> struct pasid_table_info {
> >>>     __u64 ptr;
> >>>     __u32 size;
> >>> };
> >>
> >> There should probably be a way to specify the table format, so that
> >> the pIOMMU driver can check that it recognizes the format used by
> >> the vIOMMU before attaching it. This would allow the structure to be
> >> reused for other IOMMU architectures. If, for instance, the host has
> >> an Intel IOMMU and someone decides to emulate an ARM SMMU with Qemu
> >> (their loss :), it can certainly use VFIO for passing-through
> >> devices with MAP/UNMAP.
> >> But if Qemu then attempts to pass down a PASID table in SMMU format,
> >> the Intel driver should have a way to reject it, as the SMMU format
> >> isn't compatible.
> >
> > Exactly, it would be great if we could have the API defined as
> > generic as MAP/UNMAP. The case you mentioned, emulating an ARM SMMU
> > on an Intel platform, is representative. For such cases, the problem
> > is that different vendors may have different PASID table formats and
> > also different page table formats. In my understanding, these
> > incompatible things may just result in failure if users try such
> > emulation. What's your opinion here? Anyhow, better to listen to
> > different voices.
>
> Yes, in case the vIOMMU and
RE: [RFC PATCH 29/30] vfio: Add support for Shared Virtual Memory
> -----Original Message-----
> From: Jean-Philippe Brucker [mailto:jean-philippe.bruc...@arm.com]
> Sent: Monday, March 27, 2017 6:14 PM
> To: Liu, Yi L; Alex Williamson
> Cc: Shanker Donthineni; k...@vger.kernel.org; Catalin Marinas; Sinan Kaya; Will Deacon; iommu@lists.linux-foundation.org; Harv Abdulhamid; linux-...@vger.kernel.org; Bjorn Helgaas; David Woodhouse; linux-arm-ker...@lists.infradead.org; Nate Watterson; Tian, Kevin; Lan, Tianyu; Raj, Ashok; Pan, Jacob jun; Joerg Roedel; Robin Murphy
> Subject: Re: [RFC PATCH 29/30] vfio: Add support for Shared Virtual Memory
>
> On 24/03/17 07:46, Liu, Yi L wrote:
> [...]
> >>>>
> >>>> So we need some kind of high-level classification that the vIOMMU
> >>>> must communicate to the physical one. Each IOMMU flavor would get
> >>>> a unique, global identifier, simply to make sure that vIOMMU and
> >>>> pIOMMU speak the same language. For example:
> >>>>
> >>>> 0x65776886 "AMDV" AMD IOMMU
> >>>> 0x73788476 "INTL" Intel IOMMU
> >>>> 0x83515748 "S390" s390 IOMMU
> >>>> 0x8385     "SMMU" ARM SMMU
> >>>> etc.
> >>>>
> >>>> It needs to be a global magic number that everyone can recognize.
> >>>> Could be as simple as 32-bit numbers allocated from 0. Once we
> >>>> have a global magic number, we can use it to differentiate
> >>>> architecture-specific details.
> >
> > I prefer simple numbers to stand for each vendor.
>
> Sure, I don't have any preference. Simple numbers could be easier to
> allocate.
>
> >>> I may need to think more on this part.
> >>>
> >>>> struct pasid_table_info {
> >>>>     __u64 ptr;
> >>>>     __u64 size;     /* Is it number of entries or size in bytes? */
> >>>
> >>> For the Intel platform, it's encoded. But I can make it in bytes.
> >>> Here, I'd like to check with you whether the whole guest PASID info
> >>> is also needed on ARM?
> >>
> >> It will be needed on ARM if someone ever emulates the SMMU with SVM.
> >> Though I'm not planning on doing that myself, it is unavoidable.
> >> And it would be a shame for the next SVM virtualization solution to
> >> have to introduce a new flag "VFIO_SVM_BIND_PASIDPT_2" if they could
> >> reuse most of the BIND_PASIDPT interface but simply needed to add
> >> one or two configuration fields specific to their IOMMU.
> >
> > So you are totally fine with putting the PASID table ptr and size in
> > the generic part? Maybe we have different usage for it. For me, it's
> > a guest PASID table ptr. For you, it may be different.
>
> It's the same for SMMU, with some added format specifiers that would go
> in 'opaque[]'. I think that table pointer and size (in bytes, or number
> of entries) are generic enough for a "bind table" call and can be
> reused by future implementations.
>
> >>>>     __u32 model;    /* magic number */
> >>>>     __u32 variant;  /* version of the IOMMU architecture, maybe?
> >>>>                        IOMMU-specific. */
> >
> > For variant, it will be combined with model to do a sanity check. Am
> > I right? Maybe it could be moved to opaque.
>
> Yes, I guess it could be moved to opaque. It would be a version of the
> model used, so we wouldn't have to allocate a new model number whenever
> an architecture updates the fields of its PASID descriptors, but we can
> let IOMMU drivers decide if they need it and what to put in there.
>
> >>>>     __u8  opaque[]; /* IOMMU-specific details */
> >>>> };
> >>>>
> [...]
> >>
> >> Yes, that seems sensible. I could add an explicit VFIO_BIND_PASID
> >> flag to make it explicit that data[] is "u32 pasid" and avoid having
> >> any default.
> >
> > Add it in the comment, I suppose. The length is 4 bytes; it could be
> > deduced from argsz.
> >
> >>
> >>>>
> >>>>> #define VFIO_SVM_PASSDOWN_INVALIDATE  (1 << 1)
> >>>>
> >>>> Using the vfio_device_svm structure for invalidate operations is a
> >>>> bit odd, it might be nicer to add a new VFIO_SVM_INVALIDATE
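The interface converging here — one variable-length argument whose
data[] payload is selected by an explicit bind flag, with the payload
length deduced from argsz — can be sketched as a dispatch on the kernel
side. The flag values and helper below are illustrative only, not the
actual patch:

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Illustrative flag values -- the RFC has not fixed these */
#define VFIO_SVM_BIND_PASID    (1 << 0)  /* data[] is a __u32 pasid */
#define VFIO_SVM_BIND_PASIDPT  (1 << 1)  /* data[] is a pasid_table_info */

struct vfio_device_svm {
    uint32_t argsz;
    uint32_t flags;
    uint8_t  data[];   /* interpretation depends on flags */
};

/* Dispatch according to the explicit bind type instead of a default */
static int handle_bind(const struct vfio_device_svm *svm, uint32_t *pasid_out)
{
    if (svm->flags & VFIO_SVM_BIND_PASID) {
        /* length of data[] (4 bytes) is deduced from argsz */
        if (svm->argsz < sizeof(*svm) + sizeof(uint32_t))
            return -EINVAL;
        memcpy(pasid_out, svm->data, sizeof(uint32_t));
        return 0;
    }
    if (svm->flags & VFIO_SVM_BIND_PASIDPT) {
        /* would hand the pasid_table_info in data[] to the IOMMU driver */
        return 0;
    }
    return -EINVAL;   /* no default interpretation of data[] */
}
```

This is one way to let a future "bind page directory" case reuse the
same structure by adding a flag rather than a new ioctl.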
[RFC PATCH 00/20] Qemu: Extend intel_iommu emulator to support Shared Virtual Memory
entry

Run-time:
(4) Forward guest cache invalidation requests for 1st level translation
    to the pIOMMU
(5) Fault reporting: report faults that happen on the host to the
    intel_iommu emulator, then to the guest
(6) Page Request and response

As the fault reporting framework is under discussion in another thread,
driven by Lan Tianyu, the vSVM enabling plan is to divide the work into
two phases. This patchset is for Phase 1.

Phase 1: includes items (1), (2) and (3).
Phase 2: includes items (4), (5) and (6).

[Overview of patch]
This patchset requires Passthru-Mode support in intel_iommu. Peter Xu
has sent a patch for it.
https://www.mail-archive.com/qemu-devel@nongnu.org/msg443627.html

* 1 ~ 2 enable Extend-Context Support in the intel_iommu emulator.
* 3 exposes SVM related capability to the guest with an option.
* 4 changes the VFIO notifier parameter for the newly added notifier.
* 5 ~ 6 add a new VFIO notifier for pasid table bind requests.
* 7 ~ 8 add notifier flag checks in memory_replay and region_del.
* 9 ~ 11 introduce a mechanism between VFIO and the intel_iommu emulator
  to record assigned device info, e.g. the host SID of the assigned
  device.
* 12 adds a fire function for the pasid table bind notifier.
* 13 adds a generic definition for pasid table info in iommu.h.
* 14 ~ 15 link the guest pasid table to the host for intel_iommu.
* 16 adds a VFIO notifier for propagating guest IOMMU TLB invalidations
  to the host.
* 17 adds a fire function for the IOMMU TLB invalidate notifier.
* 18 ~ 20 propagate first-level page table related cache invalidations
  to the host.

[Test Done]
The patchset is tested with IGD. Assign IGD to the guest, and the IGD
can write data into guest application address space.
An i915 SVM capable driver can be found at:
https://cgit.freedesktop.org/~miku/drm-intel/?h=svm
i915 svm test tool:
https://cgit.freedesktop.org/~miku/intel-gpu-tools/log/?h=svm

[Co-work with gIOVA enablement]
Currently Peter Xu is working on enabling gIOVA usage for the Intel
IOMMU emulator; this patchset is based on Peter's work (V7).
https://github.com/xzpeter/qemu/tree/vtd-vfio-enablement-v7

[Limitation]
* Due to a VT-d HW limitation, an assigned device cannot use gIOVA and
  vSVM at the same time. The Intel VT-d spec will introduce a new
  capability bit indicating this limitation, which the guest IOMMU
  driver can check to prevent IOVA and SVM being enabled together, as a
  short-term solution. In the long term it will be fixed in HW.

[Open]
* This patchset proposes passing raw data from guest to host when
  propagating guest IOMMU TLB invalidations. In fact, we have two
  choices here.

  a) as proposed in this patchset, pass raw data to the host. The host
     pIOMMU driver submits the invalidation request after replacing
     specific fields. Reject if the IOMMU model is not correct.
     * Pros: no need to parse and re-assemble, better performance
     * Cons: unable to support scenarios which emulate an Intel IOMMU
       on an ARM platform.

  b) parse the invalidation info into specific data, e.g. granularity,
     addr, size, invalidation type etc., then fill the data into a
     generic structure. On the host, the pIOMMU driver re-assembles the
     invalidation request and submits it to the pIOMMU.
     * Pros: may be able to support the scenario above. But this is
       still in question, since different vendors may have
       vendor-specific invalidation info, which would make it difficult
       to have a vendor-agnostic invalidation propagation API.
     * Cons: needs additional complexity to parse and re-assemble. The
       generic structure would be a superset of all possible invalidate
       info, which may be hard to maintain in future.

As the pros/cons show, I proposed a) as an initial version. But it is an
open question, and I would be glad to hear from you.

FYI, the following definition is a draft discussed with Jean in a
previous discussion. It has both a generic part and a vendor-specific
part.
struct tlb_invalidate_info {
    __u32 model;        /* Vendor number */
    __u8  granularity;
#define DEVICE_SELECTIVE_INV  (1 << 0)
#define PAGE_SELECTIVE_INV    (1 << 1)
#define PASID_SELECTIVE_INV   (1 << 2)
    __u32 pasid;
    __u64 addr;
    __u64 size;

    /* Since the IOMMU format has already been validated for this
     * table, the IOMMU driver knows that the following structure is in
     * a format it knows */
    __u8  opaque[];
};

struct tlb_invalidate_info_intel {
    __u32 inv_type;
    ...
    __u64 flags;
    ...
    __u8  mip;
    __u16 pfsid;
};

Additionally, Jean is proposing a para-vIOMMU solution. There is opaque
data in the proposed invalidate request VIRTIO_IOMMU_T_INVALIDATE, so it
may be preferred to have an opaque part when doing the IOMMU TLB
invalidate propagation in SVM virtualization.
http://www.spinics.net/lists/kvm/msg147993.html

Best Wishes,
Yi L

Liu, Yi L (20):
  intel_iommu: add "ecs" option
  intel_iommu: exposed extended-context mode to guest
  intel_iommu: add "svm" option
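Under option a), the host side of the draft above checks 'model' and
then reinterprets the opaque bytes directly, without parsing and
re-assembling. A rough self-contained sketch, where the model value,
the function name and the "replace specific fields" step are all
hypothetical:

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define IOMMU_MODEL_INTEL  1   /* hypothetical vendor number */

struct tlb_invalidate_info {
    uint32_t model;       /* vendor number */
    uint8_t  granularity;
    uint32_t pasid;
    uint64_t addr;
    uint64_t size;
    uint8_t  opaque[];    /* vendor part, e.g. tlb_invalidate_info_intel */
};

struct tlb_invalidate_info_intel {
    uint32_t inv_type;
    uint64_t flags;
    uint8_t  mip;
    uint16_t pfsid;
};

/* Host pIOMMU driver: reject foreign formats, otherwise use the opaque
 * bytes as-is (option a), replacing host-specific fields before submit. */
static int vtd_propagate_invalidate(const struct tlb_invalidate_info *info,
                                    struct tlb_invalidate_info_intel *out)
{
    if (info->model != IOMMU_MODEL_INTEL)
        return -EINVAL;   /* e.g. vIntel-IOMMU on an ARM host: rejected */

    memcpy(out, info->opaque, sizeof(*out));
    /* here the driver would replace e.g. out->pfsid with the host SID */
    return 0;
}
```

The cost of this approach, as noted in the [Open] section, is that the
opaque payload ties the vIOMMU and pIOMMU to the same vendor format.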
[RFC PATCH 05/20] VFIO: add new IOCTL for svm bind tasks
Add a new IOCTL cmd VFIO_IOMMU_SVM_BIND_TASK attached on container->fd.
On VT-d, this IOCTL cmd would be used to link the guest PASID page
table to the host. For other vendors, it may also be used to support
other kinds of SVM bind request. Previously, there was a discussion on
it with an ARM engineer; it can be found via the link below. This IOCTL
cmd may support SVM PASID bind requests from a userspace driver, or
page table (cr3) bind requests from a guest. These SVM bind requests
are supported by adding different flags, e.g. VFIO_SVM_BIND_PASID is
added to support PASID bind from a userspace driver, and
VFIO_SVM_BIND_PGTABLE is added to support page table bind from a guest.

https://patchwork.kernel.org/patch/9594231/

Signed-off-by: Liu, Yi L
---
 linux-headers/linux/vfio.h | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/linux-headers/linux/vfio.h b/linux-headers/linux/vfio.h
index 759b850..9848d63 100644
--- a/linux-headers/linux/vfio.h
+++ b/linux-headers/linux/vfio.h
@@ -537,6 +537,24 @@ struct vfio_iommu_type1_dma_unmap {
 #define VFIO_IOMMU_ENABLE   _IO(VFIO_TYPE, VFIO_BASE + 15)
 #define VFIO_IOMMU_DISABLE  _IO(VFIO_TYPE, VFIO_BASE + 16)
 
+/* IOCTL for Shared Virtual Memory Bind */
+struct vfio_device_svm {
+    __u32   argsz;
+#define VFIO_SVM_BIND_PASIDTBL  (1 << 0) /* Bind PASID Table */
+#define VFIO_SVM_BIND_PASID     (1 << 1) /* Bind PASID from userspace driver */
+#define VFIO_SVM_BIND_PGTABLE   (1 << 2) /* Bind guest mmu page table */
+    __u32   flags;
+    __u32   length;
+    __u8    data[];
+};
+
+#define VFIO_SVM_TYPE_MASK  (VFIO_SVM_BIND_PASIDTBL | \
+                             VFIO_SVM_BIND_PASID | \
+                             VFIO_SVM_BIND_PGTABLE)
+
+#define VFIO_IOMMU_SVM_BIND_TASK  _IO(VFIO_TYPE, VFIO_BASE + 22)
+
+
 /* Additional API for SPAPR TCE (Server POWERPC) IOMMU */
 /* --
-- 
1.9.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
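From userspace (e.g. the QEMU VFIO code), the variable-length ioctl
argument proposed above would be packed with argsz covering header plus
payload. This sketch shows the packing only; the helper name is made up,
and the commented-out ioctl() needs a real container fd and kernel
support:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Mirrors the proposed uapi in this patch */
struct vfio_device_svm {
    uint32_t argsz;
#define VFIO_SVM_BIND_PASIDTBL  (1 << 0)
#define VFIO_SVM_BIND_PASID     (1 << 1)
#define VFIO_SVM_BIND_PGTABLE   (1 << 2)
    uint32_t flags;
    uint32_t length;
    uint8_t  data[];
};

/* Pack a PASID table bind request; 'table_info'/'len' would be the
 * pasid_table_info bytes defined in patch 13. Caller frees the result. */
static struct vfio_device_svm *pack_bind_pasidtbl(const void *table_info,
                                                  uint32_t len)
{
    struct vfio_device_svm *svm = malloc(sizeof(*svm) + len);

    if (!svm)
        return NULL;
    svm->argsz  = sizeof(*svm) + len;
    svm->flags  = VFIO_SVM_BIND_PASIDTBL;
    svm->length = len;
    memcpy(svm->data, table_info, len);
    /* then: ioctl(container_fd, VFIO_IOMMU_SVM_BIND_TASK, svm); */
    return svm;
}
```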
[RFC PATCH 13/20] IOMMU: add pasid_table_info for guest pasid table
This patch adds iommu.h to define some generic definitions for the
IOMMU. It defines "struct pasid_table_info" for guest pasid table bind.

Signed-off-by: Liu, Yi L
---
 linux-headers/linux/iommu.h | 30 ++++++++++++++++++++++++++++++
 1 file changed, 30 insertions(+)
 create mode 100644 linux-headers/linux/iommu.h

diff --git a/linux-headers/linux/iommu.h b/linux-headers/linux/iommu.h
new file mode 100644
index 000..4519dcf
--- /dev/null
+++ b/linux-headers/linux/iommu.h
@@ -0,0 +1,30 @@
+/*
+ * Copyright (C) 2017 Intel Corporation.
+ * Author: Yi Liu
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+ * for more details.
+ *
+ */
+
+#ifndef __LINUX_IOMMU_H
+#define __LINUX_IOMMU_H
+
+#include <linux/types.h>
+
+struct pasid_table_info {
+    __u64   ptr;      /* PASID table ptr */
+    __u64   size;     /* PASID table size */
+    __u32   model;    /* magic number */
+#define INTEL_IOMMU  (1 << 0)
+#define ARM_SMMU     (1 << 1)
+    __u8    opaque[]; /* IOMMU-specific details */
+};
+
+#endif /* __LINUX_IOMMU_H */
-- 
1.9.1
[RFC PATCH 04/20] Memory: modify parameter in IOMMUNotifier func
This patch modifies the parameter of IOMMUNotifier to use "void *data"
instead of "IOMMUTLBEntry *". This is to extend it to support notifiers
other than MAP/UNMAP.

Signed-off-by: Liu, Yi L
---
 hw/vfio/common.c      | 3 ++-
 hw/virtio/vhost.c     | 3 ++-
 include/exec/memory.h | 2 +-
 3 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 6b33b9f..14473f1 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -332,10 +332,11 @@ static bool vfio_get_vaddr(IOMMUTLBEntry *iotlb, void **vaddr,
     return true;
 }
 
-static void vfio_iommu_map_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb)
+static void vfio_iommu_map_notify(IOMMUNotifier *n, void *data)
 {
     VFIOGuestIOMMU *giommu = container_of(n, VFIOGuestIOMMU, n);
     VFIOContainer *container = giommu->container;
+    IOMMUTLBEntry *iotlb = (IOMMUTLBEntry *)data;
     hwaddr iova = iotlb->iova + giommu->iommu_offset;
     bool read_only;
     void *vaddr;
diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
index ccf8b2e..fd20fd0 100644
--- a/hw/virtio/vhost.c
+++ b/hw/virtio/vhost.c
@@ -1161,9 +1161,10 @@ static void vhost_virtqueue_cleanup(struct vhost_virtqueue *vq)
     event_notifier_cleanup(&vq->masked_notifier);
 }
 
-static void vhost_iommu_unmap_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb)
+static void vhost_iommu_unmap_notify(IOMMUNotifier *n, void *data)
 {
     struct vhost_dev *hdev = container_of(n, struct vhost_dev, n);
+    IOMMUTLBEntry *iotlb = (IOMMUTLBEntry *)data;
 
     if (hdev->vhost_ops->vhost_invalidate_device_iotlb(hdev,
                                                        iotlb->iova,
diff --git a/include/exec/memory.h b/include/exec/memory.h
index 267f399..1faca3b 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -81,7 +81,7 @@ typedef enum {
 struct IOMMUNotifier;
 typedef void (*IOMMUNotify)(struct IOMMUNotifier *notifier,
-                            IOMMUTLBEntry *data);
+                            void *data);
 
 struct IOMMUNotifier {
     IOMMUNotify notify;
-- 
1.9.1
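The point of the void * change can be seen in a minimal standalone model
of the notifier callback (the type and function names mimic the QEMU
ones, but this is a self-contained sketch with invented payload types,
not QEMU code): the same IOMMUNotify signature can now carry either an
IOMMUTLBEntry or an SVM bind message.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-ins for the QEMU types */
typedef struct IOMMUTLBEntry { uint64_t iova; uint64_t translated_addr; } IOMMUTLBEntry;
typedef struct PasidTableMsg { uint64_t ptr; uint64_t size; } PasidTableMsg;

struct IOMMUNotifier;
typedef void (*IOMMUNotify)(struct IOMMUNotifier *notifier, void *data);

typedef struct IOMMUNotifier { IOMMUNotify notify; } IOMMUNotifier;

static uint64_t last_iova;
static uint64_t last_pasidt_ptr;

/* MAP/UNMAP notifier: casts data back to an IOMMUTLBEntry */
static void map_notify(IOMMUNotifier *n, void *data)
{
    IOMMUTLBEntry *iotlb = (IOMMUTLBEntry *)data;
    last_iova = iotlb->iova;
}

/* New-style notifier: the same signature carries a different payload */
static void pasidt_bind_notify(IOMMUNotifier *n, void *data)
{
    PasidTableMsg *msg = (PasidTableMsg *)data;
    last_pasidt_ptr = msg->ptr;
}
```

Each registered notifier knows which payload to expect from its notifier
flags, so the cast is safe as long as the firing side honors the flags.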
[RFC PATCH 12/20] Memory: Add func to fire pasidt_bind notifier
Add a separate function to fire the PASID table bind notifier. In the future there may be more PASID bind types with different granularities, e.g. binding a single PASID entry instead of a whole PASID table. That can be supported by adding a bind_type, checking it in the fire function, and triggering the corresponding notifier.

Signed-off-by: Liu, Yi L
---
 include/exec/memory.h | 11 +++++++++++
 memory.c              | 21 +++++++++++++++++++++
 2 files changed, 32 insertions(+)

diff --git a/include/exec/memory.h b/include/exec/memory.h
index 49087ef..3b8f487 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -695,6 +695,17 @@ uint64_t memory_region_iommu_get_min_page_size(MemoryRegion *mr);
 void memory_region_notify_iommu(MemoryRegion *mr,
                                 IOMMUTLBEntry entry);
 
+/*
+ * memory_region_notify_iommu_svm_bind: notify an SVM bind
+ * request from the vIOMMU emulator.
+ *
+ * @mr: the memory region of the IOMMU
+ * @data: IOMMU SVM data
+ */
+void memory_region_notify_iommu_svm_bind(MemoryRegion *mr,
+                                         void *data);
+
+
 /**
  * memory_region_notify_one: notify a change in an IOMMU translation
  * entry to a single notifier
diff --git a/memory.c b/memory.c
index 45ef069..ce0b0ff 100644
--- a/memory.c
+++ b/memory.c
@@ -1729,6 +1729,27 @@ void memory_region_notify_iommu(MemoryRegion *mr,
     }
 }
 
+void memory_region_notify_iommu_svm_bind(MemoryRegion *mr,
+                                         void *data)
+{
+    IOMMUNotifier *iommu_notifier;
+    IOMMUNotifierFlag request_flags;
+
+    assert(memory_region_is_iommu(mr));
+
+    /* TODO: support other bind requests with smaller granularity,
+     * e.g. binding a single pasid entry
+     */
+    request_flags = IOMMU_NOTIFIER_SVM_PASIDT_BIND;
+
+    QLIST_FOREACH(iommu_notifier, &mr->iommu_notify, node) {
+        if (iommu_notifier->notifier_flags & request_flags) {
+            iommu_notifier->notify(iommu_notifier, data);
+            break;
+        }
+    }
+}
+
 void memory_region_set_log(MemoryRegion *mr, bool log, unsigned client)
 {
     uint8_t mask = 1 << client;
-- 
1.9.1
[RFC PATCH 10/20] VFIO: notify vIOMMU emulator when device is assigned
With a vIOMMU exposed to the guest, notify the vIOMMU emulator to record information about this assigned device. This patch uses iommu_ops->record_device to record the host bus/slot/function for the device. In the future it can be extended to other information as needed.

Signed-off-by: Liu, Yi L
---
 hw/vfio/pci.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 9e13472..a1e6942 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -2881,6 +2881,10 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
                                        subregion,
                                        0,
                                        &n1);
+
+                memory_region_notify_device_record(subregion,
+                                                   &vdev->host);
+
             }
         }
-- 
1.9.1
[RFC PATCH 15/20] intel_iommu: link whole guest pasid table to host
VT-d has a nested mode which allows SVM virtualization. Link the whole guest PASID table to host context entry and enable nested mode, pIOMMU would do nested translation for DMA request. Thus achieve GVA->HPA translation. When extended-context-entry is modified in guest, intel_iommu emulator should capture it, then link the whole guest PASID table to host and enable nested mode for the assigned device. Signed-off-by: Liu, Yi L --- hw/i386/intel_iommu.c | 121 +++-- hw/i386/intel_iommu_internal.h | 11 2 files changed, 127 insertions(+), 5 deletions(-) diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c index f291995..cd6db65 100644 --- a/hw/i386/intel_iommu.c +++ b/hw/i386/intel_iommu.c @@ -36,6 +36,7 @@ #include "hw/i386/apic_internal.h" #include "kvm_i386.h" #include "trace.h" +#include /*#define DEBUG_INTEL_IOMMU*/ #ifdef DEBUG_INTEL_IOMMU @@ -55,6 +56,14 @@ static int vtd_dbgflags = VTD_DBGBIT(GENERAL) | VTD_DBGBIT(CSR); #define VTD_DPRINTF(what, fmt, ...) do {} while (0) #endif +typedef void (*vtd_device_hook)(VTDNotifierIterator *iter, +void *hook_info, +void *notify_info); + +static void vtd_context_inv_notify_hook(VTDNotifierIterator *iter, +void *hook_info, +void *notify_info); + #define FOR_EACH_ASSIGN_DEVICE(__notify_info_type, \ __opaque_type, \ __hook_info, \ @@ -1213,6 +1222,66 @@ static void vtd_iommu_replay_all(IntelIOMMUState *s) } } +void vtd_context_inv_notify_hook(VTDNotifierIterator *iter, + void *hook_info, + void *notify_info) +{ +struct pasid_table_info *pasidt_info; +IOMMUNotifierData iommu_data; +VTDContextHookInfo *context_hook_info; +uint16_t *host_sid; +pasidt_info = (struct pasid_table_info *) notify_info; +context_hook_info = (VTDContextHookInfo *) hook_info; +switch (context_hook_info->gran) { +case VTD_INV_DESC_CC_GLOBAL: +/* Fall through */ +case VTD_INV_DESC_CC_DOMAIN: +if (iter->did == *context_hook_info->did) { +break; +} +/* Fall through */ +case VTD_INV_DESC_CC_DEVICE: +if ((iter->did == *context_hook_info->did) && 
+(iter->sid == *context_hook_info->sid)) { +break; +} +/* Fall through */ +default: +return; +} + +pasidt_info->model = INTEL_IOMMU; +host_sid = (uint16_t *)&pasidt_info->opaque; + +pasidt_info->ptr = iter->ce[1].lo; +pasidt_info->size = iter->ce[1].lo & VTD_PASID_TABLE_SIZE_MASK; +*host_sid = iter->host_sid; +iommu_data.payload = (uint8_t *) pasidt_info; +iommu_data.payload_size = sizeof(*pasidt_info) + sizeof(*host_sid); +memory_region_notify_iommu_svm_bind(&iter->vtd_as->iommu, +&iommu_data); +return; +} + +static void vtd_context_cache_invalidate_notify(IntelIOMMUState *s, +uint16_t *did, +uint16_t *sid, +uint8_t gran, +vtd_device_hook hook_fn) +{ +VTDContextHookInfo context_hook_info = { +.did = did, +.sid = sid, +.gran = gran, +}; + +FOR_EACH_ASSIGN_DEVICE(struct pasid_table_info, + uint16_t, + &context_hook_info, + hook_fn); +return; +} + static void vtd_context_global_invalidate(IntelIOMMUState *s) { trace_vtd_inv_desc_cc_global(); @@ -1228,8 +1297,35 @@ static void vtd_context_global_invalidate(IntelIOMMUState *s) * VT-d emulation codes. */ vtd_iommu_replay_all(s); + +if (s->svm) { +vtd_context_cache_invalidate_notify(s, NULL, NULL, +VTD_INV_DESC_CC_GLOBAL, vtd_context_inv_notify_hook); +} } +static void vtd_context_domain_selective_invalidate(IntelIOMMUState *s, +uint16_t did) +{ +trace_vtd_inv_desc_cc_global(); +s->context_cache_gen++; +if (s->context_cache_gen == VTD_CONTEXT_CACHE_GEN_MAX) { +vtd_reset_context_cache(s); +} +/* + * From VT-d spec 6.5.2.1, a global context entry invalidation + * should be followed by a IOTLB global invalidation, so we should + * be safe even without this. Hoewever, let's replay the region as + * well to be safer, and go back here when we need finer tunes for + * VT-d emulation codes. + */ +vtd_iommu_replay_all(s); + +if (s->svm) { +
[RFC PATCH 08/20] Memory: add notifier flag check in memory_replay()
memory_region_iommu_replay() is meant to replay mappings to a MAP/UNMAP notifier. However, other kinds of notifier may be passed in, so add a check of the notifier flags to avoid potential errors. e.g. memory_region_iommu_replay_all() loops over all registered notifiers and may pass in a notifier of the wrong kind.

Signed-off-by: Liu, Yi L
---
 memory.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/memory.c b/memory.c
index 9c253cc..0728e62 100644
--- a/memory.c
+++ b/memory.c
@@ -1630,6 +1630,14 @@ void memory_region_iommu_replay(MemoryRegion *mr, IOMMUNotifier *n,
     hwaddr addr, granularity;
     IOMMUTLBEntry iotlb;
 
+    if (!(n->notifier_flags & IOMMU_NOTIFIER_MAP_UNMAP)) {
+        /* If the notifier flag is neither IOMMU_NOTIFIER_UNMAP nor
+         * IOMMU_NOTIFIER_MAP, return. This check is necessary as
+         * there are now notifiers other than MAP/UNMAP.
+         */
+        return;
+    }
+
     /* If the IOMMU has its own replay callback, override */
     if (mr->iommu_ops->replay) {
         mr->iommu_ops->replay(mr, n);
-- 
1.9.1
[RFC PATCH 17/20] Memory: Add func to fire TLB invalidate notifier
This patch adds a separate function to fire the IOMMU TLB invalidate notifier.

Signed-off-by: Liu, Yi L
---
 include/exec/memory.h |  9 +++++++++
 memory.c              | 18 ++++++++++++++++++
 2 files changed, 27 insertions(+)

diff --git a/include/exec/memory.h b/include/exec/memory.h
index af15351..0155bad 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -707,6 +707,15 @@ void memory_region_notify_iommu(MemoryRegion *mr,
 void memory_region_notify_iommu_svm_bind(MemoryRegion *mr,
                                          void *data);
 
+/*
+ * memory_region_notify_iommu_invalidate: notify an IOMMU
+ * TLB invalidation passdown.
+ *
+ * @mr: the memory region of the IOMMU
+ * @data: IOMMU SVM data
+ */
+void memory_region_notify_iommu_invalidate(MemoryRegion *mr,
+                                           void *data);
 
 /**
  * memory_region_notify_one: notify a change in an IOMMU translation
diff --git a/memory.c b/memory.c
index ce0b0ff..8c572d5 100644
--- a/memory.c
+++ b/memory.c
@@ -1750,6 +1750,24 @@ void memory_region_notify_iommu_svm_bind(MemoryRegion *mr,
     }
 }
 
+void memory_region_notify_iommu_invalidate(MemoryRegion *mr,
+                                           void *data)
+{
+    IOMMUNotifier *iommu_notifier;
+    IOMMUNotifierFlag request_flags;
+
+    assert(memory_region_is_iommu(mr));
+
+    request_flags = IOMMU_NOTIFIER_IOMMU_TLB_INV;
+
+    QLIST_FOREACH(iommu_notifier, &mr->iommu_notify, node) {
+        if (iommu_notifier->notifier_flags & request_flags) {
+            iommu_notifier->notify(iommu_notifier, data);
+            break;
+        }
+    }
+}
+
 void memory_region_set_log(MemoryRegion *mr, bool log, unsigned client)
 {
     uint8_t mask = 1 << client;
-- 
1.9.1
[RFC PATCH 06/20] VFIO: add new notifier for binding PASID table
This patch includes the following items: * add vfio_register_notifier() for vfio notifier initialization * add new notifier flag IOMMU_NOTIFIER_SVM_PASIDT_BIND = 0x4 * add vfio_iommu_bind_pasid_tbl_notify() to link guest pasid table to host This patch doesn't register new notifier in vfio memory region listener region_add callback. The reason is as below: On VT-d, when virtual intel_iommu is exposed to guest, the vfio memory listener listens to address_space_memory. When guest Intel IOMMU driver enables address translation, vfio memory listener may switch to listen to vtd_address_space. But there is special case. If virtual intel_iommu reports ecap.PT=1 to guest and meanwhile guest Intel IOMMU driver sets "pt" mode for the assigned, vfio memory listener would keep listen to address_space_memory to make sure there is GPA->HPA mapping in pIOMMU. Thus region_add would not be triggered. While for the newly added notifier, it requires to be registered once virtual intel_iommu is exposed to guest. Signed-off-by: Liu, Yi L --- hw/vfio/common.c | 37 +++--- hw/vfio/pci.c | 53 ++- include/exec/memory.h | 8 +++ include/hw/vfio/vfio-common.h | 5 4 files changed, 94 insertions(+), 9 deletions(-) diff --git a/hw/vfio/common.c b/hw/vfio/common.c index 14473f1..e270255 100644 --- a/hw/vfio/common.c +++ b/hw/vfio/common.c @@ -294,6 +294,25 @@ static bool vfio_listener_skipped_section(MemoryRegionSection *section) section->offset_within_address_space & (1ULL << 63); } +VFIOGuestIOMMU *vfio_register_notifier(VFIOContainer *container, + MemoryRegion *mr, + hwaddr offset, + IOMMUNotifier *n) +{ +VFIOGuestIOMMU *giommu; + +giommu = g_malloc0(sizeof(*giommu)); +giommu->iommu = mr; +giommu->iommu_offset = offset; +giommu->container = container; +giommu->n = *n; + +QLIST_INSERT_HEAD(&container->giommu_list, giommu, giommu_next); +memory_region_register_iommu_notifier(giommu->iommu, &giommu->n); + +return giommu; +} + /* Called with rcu_read_lock held. 
*/ static bool vfio_get_vaddr(IOMMUTLBEntry *iotlb, void **vaddr, bool *read_only) @@ -466,6 +485,8 @@ static void vfio_listener_region_add(MemoryListener *listener, if (memory_region_is_iommu(section->mr)) { VFIOGuestIOMMU *giommu; +IOMMUNotifier n; +hwaddr iommu_offset; trace_vfio_listener_region_add_iommu(iova, end); /* @@ -474,21 +495,21 @@ static void vfio_listener_region_add(MemoryListener *listener, * would be the right place to wire that up (tell the KVM * device emulation the VFIO iommu handles to use). */ -giommu = g_malloc0(sizeof(*giommu)); -giommu->iommu = section->mr; -giommu->iommu_offset = section->offset_within_address_space - - section->offset_within_region; -giommu->container = container; +iommu_offset = section->offset_within_address_space - + section->offset_within_region; llend = int128_add(int128_make64(section->offset_within_region), section->size); llend = int128_sub(llend, int128_one()); -iommu_notifier_init(&giommu->n, vfio_iommu_map_notify, +iommu_notifier_init(&n, vfio_iommu_map_notify, IOMMU_NOTIFIER_ALL, section->offset_within_region, int128_get64(llend)); -QLIST_INSERT_HEAD(&container->giommu_list, giommu, giommu_next); -memory_region_register_iommu_notifier(giommu->iommu, &giommu->n); +giommu = vfio_register_notifier(container, +section->mr, +iommu_offset, +&n); + memory_region_iommu_replay(giommu->iommu, &giommu->n, false); return; diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c index 332f41d..9e13472 100644 --- a/hw/vfio/pci.c +++ b/hw/vfio/pci.c @@ -2594,11 +2594,38 @@ static void vfio_unregister_req_notifier(VFIOPCIDevice *vdev) vdev->req_enabled = false; } +static void vfio_iommu_bind_pasid_tbl_notify(IOMMUNotifier *n, void *data) +{ +VFIOGuestIOMMU *giommu = container_of(n, VFIOGuestIOMMU, n); +VFIOContainer *container = giommu->container; +IOMMUNotifierData *iommu_data = (IOMMUNotifierData *) data; +struct vfio_device_svm *vfio_svm; +int argsz; + +argsz = sizeof(*vfio_svm) + iommu_data->payload_size; +vfio_svm = g_malloc0(argsz); 
+vfio_svm->argsz =
[RFC PATCH 07/20] VFIO: check notifier flag in region_del()
This patch adds a flag check when unregistering the MAP/UNMAP notifier in region_del(). The MAP/UNMAP notifier is unregistered when an IOMMU memory region is deleted; the check avoids unregistering other notifiers. Peter Xu's intel_iommu enhancement series introduced dynamic switching of the IOMMU region. If an assigned device switches to "pt" mode, the IOMMU region is deleted and thus the MAP/UNMAP notifier is unregistered, while in some cases the other notifiers are still wanted. e.g. if the user decides to use vSVM for the assigned device after the switch, the pasid table bind notifier is still needed. The newly added pasid table bind notifier is instead unregistered in vfio_disconnect_container(). The link below points to Peter's dynamic switch patch:
https://www.mail-archive.com/qemu-devel@nongnu.org/msg62.html

Signed-off-by: Liu, Yi L
---
 hw/vfio/common.c      | 5 +++--
 include/exec/memory.h | 2 +-
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index e270255..719de61 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -501,7 +501,7 @@ static void vfio_listener_region_add(MemoryListener *listener,
                            section->size);
         llend = int128_sub(llend, int128_one());
         iommu_notifier_init(&n, vfio_iommu_map_notify,
-                            IOMMU_NOTIFIER_ALL,
+                            IOMMU_NOTIFIER_MAP_UNMAP,
                             section->offset_within_region,
                             int128_get64(llend));
 
@@ -578,7 +578,8 @@ static void vfio_listener_region_del(MemoryListener *listener,
 
         QLIST_FOREACH(giommu, &container->giommu_list, giommu_next) {
             if (giommu->iommu == section->mr &&
-                giommu->n.start == section->offset_within_region) {
+                giommu->n.start == section->offset_within_region &&
+                giommu->n.notifier_flags & IOMMU_NOTIFIER_MAP_UNMAP) {
                 memory_region_unregister_iommu_notifier(giommu->iommu,
                                                         &giommu->n);
                 QLIST_REMOVE(giommu, giommu_next);
diff --git a/include/exec/memory.h b/include/exec/memory.h
index d2f24cc..7bd13ab 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -85,7 +85,7 @@ typedef enum {
     IOMMU_NOTIFIER_SVM_PASIDT_BIND = 0x4,
 } IOMMUNotifierFlag;
 
-#define IOMMU_NOTIFIER_ALL (IOMMU_NOTIFIER_MAP | IOMMU_NOTIFIER_UNMAP)
+#define IOMMU_NOTIFIER_MAP_UNMAP (IOMMU_NOTIFIER_MAP | IOMMU_NOTIFIER_UNMAP)
 
 struct IOMMUNotifier;
 typedef void (*IOMMUNotify)(struct IOMMUNotifier *notifier,
-- 
1.9.1
[RFC PATCH 02/20] intel_iommu: exposed extended-context mode to guest
VT-d implementations reporting PASID or PRS fields as "Set", must also report ecap.ECS as "Set". Extended-Context is required for SVM. When ECS is reported, intel iommu driver would initiate extended root entry and extended context entry, and also PASID table if there is any SVM capable device. Signed-off-by: Liu, Yi L --- hw/i386/intel_iommu.c | 131 +++-- hw/i386/intel_iommu_internal.h | 9 +++ include/hw/i386/intel_iommu.h | 2 +- 3 files changed, 97 insertions(+), 45 deletions(-) diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c index 400d0d1..bf98fa5 100644 --- a/hw/i386/intel_iommu.c +++ b/hw/i386/intel_iommu.c @@ -497,6 +497,11 @@ static inline bool vtd_root_entry_present(VTDRootEntry *root) return root->val & VTD_ROOT_ENTRY_P; } +static inline bool vtd_root_entry_upper_present(VTDRootEntry *root) +{ +return root->rsvd & VTD_ROOT_ENTRY_P; +} + static int vtd_get_root_entry(IntelIOMMUState *s, uint8_t index, VTDRootEntry *re) { @@ -509,6 +514,9 @@ static int vtd_get_root_entry(IntelIOMMUState *s, uint8_t index, return -VTD_FR_ROOT_TABLE_INV; } re->val = le64_to_cpu(re->val); +if (s->ecs) { +re->rsvd = le64_to_cpu(re->rsvd); +} return 0; } @@ -517,19 +525,30 @@ static inline bool vtd_context_entry_present(VTDContextEntry *context) return context->lo & VTD_CONTEXT_ENTRY_P; } -static int vtd_get_context_entry_from_root(VTDRootEntry *root, uint8_t index, - VTDContextEntry *ce) +static int vtd_get_context_entry_from_root(IntelIOMMUState *s, + VTDRootEntry *root, uint8_t index, VTDContextEntry *ce) { -dma_addr_t addr; +dma_addr_t addr, ce_size; /* we have checked that root entry is present */ -addr = (root->val & VTD_ROOT_ENTRY_CTP) + index * sizeof(*ce); -if (dma_memory_read(&address_space_memory, addr, ce, sizeof(*ce))) { +ce_size = (s->ecs) ? (2 * sizeof(*ce)) : (sizeof(*ce)); +addr = (s->ecs && (index > 0x7f)) ? 
+ ((root->rsvd & VTD_ROOT_ENTRY_CTP) + (index - 0x80) * ce_size) : + ((root->val & VTD_ROOT_ENTRY_CTP) + index * ce_size); + +if (dma_memory_read(&address_space_memory, addr, ce, ce_size)) { trace_vtd_re_invalid(root->rsvd, root->val); return -VTD_FR_CONTEXT_TABLE_INV; } -ce->lo = le64_to_cpu(ce->lo); -ce->hi = le64_to_cpu(ce->hi); + +ce[0].lo = le64_to_cpu(ce[0].lo); +ce[0].hi = le64_to_cpu(ce[0].hi); + +if (s->ecs) { +ce[1].lo = le64_to_cpu(ce[1].lo); +ce[1].hi = le64_to_cpu(ce[1].hi); +} + return 0; } @@ -595,9 +614,11 @@ static inline uint32_t vtd_get_agaw_from_context_entry(VTDContextEntry *ce) return 30 + (ce->hi & VTD_CONTEXT_ENTRY_AW) * 9; } -static inline uint32_t vtd_ce_get_type(VTDContextEntry *ce) +static inline uint32_t vtd_ce_get_type(IntelIOMMUState *s, + VTDContextEntry *ce) { -return ce->lo & VTD_CONTEXT_ENTRY_TT; +return s->ecs ? (ce->lo & VTD_CONTEXT_ENTRY_TT) : +(ce->lo & VTD_EXT_CONTEXT_ENTRY_TT); } static inline uint64_t vtd_iova_limit(VTDContextEntry *ce) @@ -842,16 +863,20 @@ static int vtd_dev_to_context_entry(IntelIOMMUState *s, uint8_t bus_num, return ret_fr; } -if (!vtd_root_entry_present(&re)) { +if (!vtd_root_entry_present(&re) || +(s->ecs && (devfn > 0x7f) && (!vtd_root_entry_upper_present(&re { /* Not error - it's okay we don't have root entry. 
*/ trace_vtd_re_not_present(bus_num); return -VTD_FR_ROOT_ENTRY_P; -} else if (re.rsvd || (re.val & VTD_ROOT_ENTRY_RSVD)) { -trace_vtd_re_invalid(re.rsvd, re.val); -return -VTD_FR_ROOT_ENTRY_RSVD; +} +if ((s->ecs && (devfn > 0x7f) && (re.rsvd & VTD_ROOT_ENTRY_RSVD)) || +(s->ecs && (devfn < 0x80) && (re.val & VTD_ROOT_ENTRY_RSVD)) || +((!s->ecs) && (re.rsvd || (re.val & VTD_ROOT_ENTRY_RSVD { +trace_vtd_re_invalid(re.rsvd, re.val); +return -VTD_FR_ROOT_ENTRY_RSVD; } -ret_fr = vtd_get_context_entry_from_root(&re, devfn, ce); +ret_fr = vtd_get_context_entry_from_root(s, &re, devfn, ce); if (ret_fr) { return ret_fr; } @@ -860,21 +885,36 @@ static int vtd_dev_to_context_entry(IntelIOMMUState *s, uint8_t bus_num, /* Not error - it's okay we don't have context entry. */ trace_vtd_ce_not_present(bus_num, devfn); return -VTD_FR_CONTEXT_ENTRY_P; -} else if ((ce->hi & VTD_CONTEXT_ENTRY_RSVD_HI) || - (ce->lo &
[RFC PATCH 16/20] VFIO: Add notifier for propagating IOMMU TLB invalidate
This patch adds the following items: * add new notifier flag IOMMU_NOTIFIER_IOMMU_TLB_INV = 0x8 * add new IOCTL cmd VFIO_IOMMU_TLB_INVALIDATE attached on container->fd * add vfio_iommu_tlb_invalidate_notify() to propagate IOMMU TLB invalidate to host This new notifier is originated from the requirement of SVM virtualization on VT-d. It is for invalidation of first-level and nested mappings from the IOTLB and the paging-structure-caches. Since the existed MAP/UNMAP notifier is designed for second-level related mappings, it is not suitable for the new requirement. So it is necessary to introduce this new notifier to meet the SVM virtualization requirement. Further detail would be included in the patch below: "intel_iommu: propagate Extended-IOTLB invalidate to host" Signed-off-by: Liu, Yi L --- hw/vfio/pci.c | 37 + include/exec/memory.h | 2 ++ linux-headers/linux/iommu.h | 5 + linux-headers/linux/vfio.h | 8 4 files changed, 52 insertions(+) diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c index a1e6942..afcefd6 100644 --- a/hw/vfio/pci.c +++ b/hw/vfio/pci.c @@ -2619,6 +2619,33 @@ static void vfio_iommu_bind_pasid_tbl_notify(IOMMUNotifier *n, void *data) g_free(vfio_svm); } +static void vfio_iommu_tlb_invalidate_notify(IOMMUNotifier *n, + void *data) +{ +VFIOGuestIOMMU *giommu = container_of(n, VFIOGuestIOMMU, n); +VFIOContainer *container = giommu->container; +IOMMUNotifierData *iommu_data = (IOMMUNotifierData *) data; +struct vfio_iommu_tlb_invalidate *vfio_tlb_inv; +int argsz; + +argsz = sizeof(*vfio_tlb_inv) + iommu_data->payload_size; +vfio_tlb_inv = g_malloc0(argsz); +vfio_tlb_inv->argsz = argsz; +vfio_tlb_inv->length = iommu_data->payload_size; + +memcpy(&vfio_tlb_inv->data, iommu_data->payload, + iommu_data->payload_size); + +rcu_read_lock(); +if (ioctl(container->fd, VFIO_IOMMU_TLB_INVALIDATE, + vfio_tlb_inv) != 0) { +error_report("vfio_iommu_tlb_invalidate_notify:" + " failed, contanier: %p", container); +} +rcu_read_unlock(); +g_free(vfio_tlb_inv); +} + static 
void vfio_realize(PCIDevice *pdev, Error **errp) { VFIOPCIDevice *vdev = DO_UPCAST(VFIOPCIDevice, pdev, pdev); @@ -2865,6 +2892,7 @@ static void vfio_realize(PCIDevice *pdev, Error **errp) QTAILQ_FOREACH(subregion, &as->root->subregions, subregions_link) { if (memory_region_is_iommu(subregion)) { IOMMUNotifier n1; +IOMMUNotifier n2; /* FIXME: current iommu notifier is actually designed for @@ -2882,6 +2910,15 @@ static void vfio_realize(PCIDevice *pdev, Error **errp) 0, &n1); +iommu_notifier_init(&n2, vfio_iommu_tlb_invalidate_notify, +IOMMU_NOTIFIER_IOMMU_TLB_INV, +0, +0); +vfio_register_notifier(group->container, + subregion, + 0, + &n2); + memory_region_notify_device_record(subregion, &vdev->host); diff --git a/include/exec/memory.h b/include/exec/memory.h index 3b8f487..af15351 100644 --- a/include/exec/memory.h +++ b/include/exec/memory.h @@ -83,6 +83,8 @@ typedef enum { IOMMU_NOTIFIER_MAP = 0x2, /* Notify PASID Table Binding */ IOMMU_NOTIFIER_SVM_PASIDT_BIND = 0x4, +/* Notify IOMMU TLB Invalidation */ +IOMMU_NOTIFIER_IOMMU_TLB_INV = 0x8, } IOMMUNotifierFlag; #define IOMMU_NOTIFIER_MAP_UNMAP (IOMMU_NOTIFIER_MAP | IOMMU_NOTIFIER_UNMAP) diff --git a/linux-headers/linux/iommu.h b/linux-headers/linux/iommu.h index 4519dcf..c2742ba 100644 --- a/linux-headers/linux/iommu.h +++ b/linux-headers/linux/iommu.h @@ -27,4 +27,9 @@ struct pasid_table_info { __u8 opaque[];/* IOMMU-specific details */ }; +struct tlb_invalidate_info { + __u32 model; + __u8opaque[]; +}; + #endif /* __LINUX_IOMMU_H */ diff --git a/linux-headers/linux/vfio.h b/linux-headers/linux/vfio.h index 9848d63..6c71c4a 100644 --- a/linux-headers/linux/vfio.h +++ b/linux-headers/linux/vfio.h @@ -554,6 +554,14 @@ struct vfio_device_svm { #define VFIO_IOMMU_SVM_BIND_TASK _IO(VFIO_TYPE, VFIO_BASE + 22) +/* For IOMMU Invalidation Passdwon */ +struct vfio_iommu_tlb_invalidate { + __u32 argsz; + __u32 length; + __u8data[]; +}; + +#define VFIO_IOMMU_TLB_INVALIDATE _IO(VFIO_TYPE, VFIO_BASE + 23) /* Additional API 
for SPAPR TCE (Server POWERPC) IOMMU */ -- 1.9.1 _
[RFC PATCH 14/20] intel_iommu: add FOR_EACH_ASSIGN_DEVICE macro
Add FOR_EACH_ASSIGN_DEVICE. It is used to loop over all assigned devices when linking the guest pasid table and when propagating iommu cache invalidations.

Signed-off-by: Liu, Yi L
---
 hw/i386/intel_iommu.c          | 32 ++++++++++++++++++++++++++++++++
 hw/i386/intel_iommu_internal.h | 11 +++++++++++
 2 files changed, 43 insertions(+)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 0c412d2..f291995 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -55,6 +55,38 @@ static int vtd_dbgflags = VTD_DBGBIT(GENERAL) | VTD_DBGBIT(CSR);
 #define VTD_DPRINTF(what, fmt, ...) do {} while (0)
 #endif
 
+#define FOR_EACH_ASSIGN_DEVICE(__notify_info_type, \
+                               __opaque_type, \
+                               __hook_info, \
+                               __hook_fn) \
+do { \
+    IntelIOMMUNotifierNode *node; \
+    VTDNotifierIterator iterator; \
+    int ret = 0; \
+    __notify_info_type *notify_info; \
+    __opaque_type *opaq; \
+    int argsz; \
+    argsz = sizeof(*notify_info) + sizeof(*opaq); \
+    notify_info = g_malloc0(argsz); \
+    QLIST_FOREACH(node, &(s->notifiers_list), next) { \
+        VTDAddressSpace *vtd_as = node->vtd_as; \
+        VTDContextEntry ce[2]; \
+        iterator.bus = pci_bus_num(vtd_as->bus); \
+        ret = vtd_dev_to_context_entry(s, iterator.bus, \
+                                       vtd_as->devfn, &ce[0]); \
+        if (ret != 0) { \
+            continue; \
+        } \
+        iterator.sid = vtd_make_source_id(iterator.bus, vtd_as->devfn); \
+        iterator.did = VTD_CONTEXT_ENTRY_DID(ce[0].hi); \
+        iterator.host_sid = node->host_sid; \
+        iterator.vtd_as = vtd_as; \
+        iterator.ce = &ce[0]; \
+        __hook_fn(&iterator, __hook_info, notify_info); \
+    } \
+    g_free(notify_info); \
+} while (0)
+
 static void vtd_define_quad(IntelIOMMUState *s, hwaddr addr, uint64_t val,
                             uint64_t wmask, uint64_t w1cmask)
 {
diff --git a/hw/i386/intel_iommu_internal.h b/hw/i386/intel_iommu_internal.h
index f2a7d12..5178398 100644
--- a/hw/i386/intel_iommu_internal.h
+++ b/hw/i386/intel_iommu_internal.h
@@ -439,6 +439,17 @@ typedef struct VTDRootEntry VTDRootEntry;
 #define VTD_EXT_CONTEXT_TT_NO_DEV_IOTLB (4ULL << 2)
 #define VTD_EXT_CONTEXT_TT_DEV_IOTLB    (5ULL << 2)
 
+struct VTDNotifierIterator {
+    VTDAddressSpace *vtd_as;
+    VTDContextEntry *ce;
+    uint16_t host_sid;
+    uint16_t sid;
+    uint16_t did;
+    uint8_t bus;
+};
+
+typedef struct VTDNotifierIterator VTDNotifierIterator;
+
 /* Paging Structure common */
 #define VTD_SL_PT_PAGE_SIZE_MASK    (1ULL << 7)
 /* Bits to decide the offset for each level */
-- 
1.9.1
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
[RFC PATCH 09/20] Memory: introduce iommu_ops->record_device
With a vIOMMU exposed to the guest, the vIOMMU emulator needs to translate between host and guest. e.g. for a device-selective TLB flush, the vIOMMU emulator needs to replace the guest SID with the host SID so as to limit the scope of the invalidation. This patch introduces a new callback, iommu_ops->record_device(), to let the vIOMMU emulator record the necessary information about the assigned device.

Signed-off-by: Liu, Yi L
---
 include/exec/memory.h | 11 +++++++++++
 memory.c              | 12 ++++++++++++
 2 files changed, 23 insertions(+)

diff --git a/include/exec/memory.h b/include/exec/memory.h
index 7bd13ab..49087ef 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -203,6 +203,8 @@ struct MemoryRegionIOMMUOps {
                                 IOMMUNotifierFlag new_flags);
     /* Set this up to provide customized IOMMU replay function */
     void (*replay)(MemoryRegion *iommu, IOMMUNotifier *notifier);
+    void (*record_device)(MemoryRegion *iommu,
+                          void *device_info);
 };
 
 typedef struct CoalescedMemoryRange CoalescedMemoryRange;
@@ -708,6 +710,15 @@ void memory_region_notify_iommu(MemoryRegion *mr,
 void memory_region_notify_one(IOMMUNotifier *notifier,
                               IOMMUTLBEntry *entry);
 
+/*
+ * memory_region_notify_device_record: notify the IOMMU to record an
+ * assigned device.
+ * @mr: the memory region to notify
+ * @device_info: device information
+ */
+void memory_region_notify_device_record(MemoryRegion *mr,
+                                        void *info);
+
 /**
  * memory_region_register_iommu_notifier: register a notifier for changes to
  * IOMMU translation entries.
diff --git a/memory.c b/memory.c
index 0728e62..45ef069 100644
--- a/memory.c
+++ b/memory.c
@@ -1600,6 +1600,18 @@ static void memory_region_update_iommu_notify_flags(MemoryRegion *mr)
     mr->iommu_notify_flags = flags;
 }
 
+void memory_region_notify_device_record(MemoryRegion *mr,
+                                        void *info)
+{
+    assert(memory_region_is_iommu(mr));
+
+    if (mr->iommu_ops->record_device) {
+        mr->iommu_ops->record_device(mr, info);
+    }
+
+    return;
+}
+
 void memory_region_register_iommu_notifier(MemoryRegion *mr,
                                            IOMMUNotifier *n)
 {
-- 
1.9.1
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
[RFC PATCH 11/20] intel_iommu: provide iommu_ops->record_device
This patch provides the iommu_ops->record_device() implementation for intel_iommu. It records the host SID in the IntelIOMMUNotifierNode for later virtualization use, e.g. guest SID -> host SID translation when propagating first-level cache invalidations from guest to host.

Signed-off-by: Liu, Yi L
---
 hw/i386/intel_iommu.c         | 19 +++++++++++++++++++
 include/hw/i386/intel_iommu.h |  1 +
 2 files changed, 20 insertions(+)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index ba1e7eb..0c412d2 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -2407,6 +2407,24 @@ static void vtd_iommu_notify_flag_changed(MemoryRegion *iommu,
     }
 }
 
+static void vtd_iommu_record_device(MemoryRegion *iommu,
+                                    void *device_info)
+{
+    VTDAddressSpace *vtd_as = container_of(iommu, VTDAddressSpace, iommu);
+    IntelIOMMUState *s = vtd_as->iommu_state;
+    IntelIOMMUNotifierNode *node = NULL;
+    IntelIOMMUNotifierNode *next_node = NULL;
+    PCIHostDeviceAddress *host = (PCIHostDeviceAddress *) device_info;
+
+    QLIST_FOREACH_SAFE(node, &s->notifiers_list, next, next_node) {
+        if (node->vtd_as == vtd_as) {
+            node->host_sid = ((host->bus & 0xffUL) << 8)
+                             | ((host->slot & 0x1f) << 3)
+                             | (host->function & 0x7);
+        }
+    }
+}
+
 static const VMStateDescription vtd_vmstate = {
     .name = "iommu-intel",
     .version_id = 1,
@@ -2940,6 +2958,7 @@ static void vtd_init(IntelIOMMUState *s)
     s->iommu_ops.translate = vtd_iommu_translate;
     s->iommu_ops.notify_flag_changed = vtd_iommu_notify_flag_changed;
     s->iommu_ops.replay = vtd_iommu_replay;
+    s->iommu_ops.record_device = vtd_iommu_record_device;
     s->root = 0;
     s->root_extended = false;
     s->dmar_enabled = false;
diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h
index 8981615..a4ce5c3 100644
--- a/include/hw/i386/intel_iommu.h
+++ b/include/hw/i386/intel_iommu.h
@@ -252,6 +252,7 @@ struct VTD_MSIMessage {
 
 struct IntelIOMMUNotifierNode {
     VTDAddressSpace *vtd_as;
+    uint16_t host_sid;
     QLIST_ENTRY(IntelIOMMUNotifierNode) next;
 };
-- 
1.9.1
[RFC PATCH 20/20] intel_iommu: propagate Ext-Device-TLB invalidate to host
For Extended-Device-TLB invalidation, intel_iommu emulator needs to check all the assigned device and find the affected device. Replace the guest SID with the host SID in the invalidate descriptor and pass the request to host. Host may just submit the request to corresponding invalidation queue in pIOMMU. In future maybe PASID needs to be replaced. Signed-off-by: Liu, Yi L --- hw/i386/intel_iommu.c | 43 ++ hw/i386/intel_iommu_internal.h | 7 +++ 2 files changed, 50 insertions(+) diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c index c5e9170..4370790 100644 --- a/hw/i386/intel_iommu.c +++ b/hw/i386/intel_iommu.c @@ -2012,6 +2012,13 @@ static void vtd_tlb_inv_notify_hook(VTDNotifierIterator *iter, } else { return; } +case VTD_INV_DESC_EXT_DIOTLB: +if (iter->sid != *tlb_hook_info->sid) { +return; +} +tlb_hook_info->inv_desc->lo &= ~VTD_INV_DESC_EXT_DIOTLB_SID_MASK; +tlb_hook_info->inv_desc->lo |= (iter->host_sid << 16); +break; default: return; } @@ -2147,6 +2154,34 @@ static bool vtd_process_pasid_desc(IntelIOMMUState *s, return true; } +static bool vtd_process_ext_device_iotlb(IntelIOMMUState *s, + VTDInvDesc *inv_desc) +{ +uint32_t pasid; +uint16_t sid; +VTDIOTLBInvHookInfo tlb_hook_info; + +if ((inv_desc->lo & VTD_INV_DESC_EXT_DIOTLB_RSVD_LO) || +(inv_desc->hi & VTD_INV_DESC_EXT_DIOTLB_RSVD_HI)) { +VTD_DPRINTF(GENERAL, "error: non-zero reserved field in" +" Device ExIOTLB desc, hi 0x%"PRIx64 " lo 0x%"PRIx64, +inv_desc->hi, inv_desc->lo); +return false; +} + +pasid = VTD_INV_DESC_EXT_DIOTLB_PASID(inv_desc->lo); +sid = VTD_INV_DESC_EXT_DIOTLB_SID(inv_desc->lo); + +tlb_hook_info.did = NULL; +tlb_hook_info.sid = &sid; +tlb_hook_info.pasid = &pasid; +tlb_hook_info.inv_desc = inv_desc; +vtd_tlb_inv_passdown_notify(s, +&tlb_hook_info, +vtd_tlb_inv_notify_hook); +return true; +} + static bool vtd_process_inv_desc(IntelIOMMUState *s) { VTDInvDesc inv_desc; @@ -2190,6 +2225,14 @@ static bool vtd_process_inv_desc(IntelIOMMUState *s) } break; +case 
VTD_INV_DESC_EXT_DIOTLB: +trace_vtd_inv_desc("device-extended-iotlb", + inv_desc.hi, inv_desc.lo); +if (!vtd_process_ext_device_iotlb(s, &inv_desc)) { +return false; +} +break; + case VTD_INV_DESC_WAIT: trace_vtd_inv_desc("wait", inv_desc.hi, inv_desc.lo); if (!vtd_process_wait_desc(s, &inv_desc)) { diff --git a/hw/i386/intel_iommu_internal.h b/hw/i386/intel_iommu_internal.h index a6b9350..3cb2361 100644 --- a/hw/i386/intel_iommu_internal.h +++ b/hw/i386/intel_iommu_internal.h @@ -343,6 +343,7 @@ typedef union VTDInvDesc VTDInvDesc; #define VTD_INV_DESC_WAIT 0x5 /* Invalidation Wait Descriptor */ #define VTD_INV_DESC_EXT_IOTLB 0x6 /* Ext-IOTLB Invalidate Desc */ #define VTD_INV_DESC_PC 0x7 /* PASID-cache Invalidate Desc */ +#define VTD_INV_DESC_EXT_DIOTLB 0x8 /* Ext-DIOTLB Invalidate Desc */ #define VTD_INV_DESC_NONE 0 /* Not an Invalidate Descriptor */ /* Masks for Invalidation Wait Descriptor*/ @@ -407,6 +408,12 @@ typedef union VTDInvDesc VTDInvDesc; #define VTD_INV_DESC_PASIDC_ALL_ALL(0ULL << 4) #define VTD_INV_DESC_PASIDC_PASID_SI (1ULL << 4) +#define VTD_INV_DESC_EXT_DIOTLB_PASID(val) (((val) >> 32) & 0xfULL) +#define VTD_INV_DESC_EXT_DIOTLB_SID(val) (((val) >> 16) & 0x) +#define VTD_INV_DESC_EXT_DIOTLB_RSVD_LO0xe00ULL +#define VTD_INV_DESC_EXT_DIOTLB_RSVD_HI0x7feULL +#define VTD_INV_DESC_EXT_DIOTLB_SID_MASK 0xULL + /* Information about page-selective IOTLB invalidate */ struct VTDIOTLBPageInvInfo { uint16_t domain_id; -- 1.9.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[RFC PATCH 01/20] intel_iommu: add "ecs" option
Report ecap.ECS=1 to the guest via "-device intel-iommu,ecs=on" on the QEMU command line.

Signed-off-by: Liu, Yi L
---
 hw/i386/intel_iommu.c          | 5 +
 hw/i386/intel_iommu_internal.h | 1 +
 include/hw/i386/intel_iommu.h  | 1 +
 3 files changed, 7 insertions(+)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 4b7d90d..400d0d1 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -2409,6 +2409,7 @@ static Property vtd_properties[] = {
                             ON_OFF_AUTO_AUTO),
     DEFINE_PROP_BOOL("x-buggy-eim", IntelIOMMUState, buggy_eim, false),
     DEFINE_PROP_BOOL("caching-mode", IntelIOMMUState, caching_mode, FALSE),
+    DEFINE_PROP_BOOL("ecs", IntelIOMMUState, ecs, FALSE),
     DEFINE_PROP_END_OF_LIST(),
 };

@@ -2925,6 +2926,10 @@ static void vtd_init(IntelIOMMUState *s)
         s->ecap |= VTD_ECAP_PT;
     }

+    if (s->ecs) {
+        s->ecap |= VTD_ECAP_ECS;
+    }
+
     if (s->caching_mode) {
         s->cap |= VTD_CAP_CM;
     }
diff --git a/hw/i386/intel_iommu_internal.h b/hw/i386/intel_iommu_internal.h
index b96884e..ec1bd17 100644
--- a/hw/i386/intel_iommu_internal.h
+++ b/hw/i386/intel_iommu_internal.h
@@ -190,6 +190,7 @@
 #define VTD_ECAP_EIM                (1ULL << 4)
 #define VTD_ECAP_PT                 (1ULL << 6)
 #define VTD_ECAP_MHMV               (15ULL << 20)
+#define VTD_ECAP_ECS                (1ULL << 24)

 /* CAP_REG */
 /* (offset >> 4) << 24 */
diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h
index 3e51876..fa5963e 100644
--- a/include/hw/i386/intel_iommu.h
+++ b/include/hw/i386/intel_iommu.h
@@ -266,6 +266,7 @@ struct IntelIOMMUState {
     uint32_t version;

     bool caching_mode;          /* RO - is cap CM enabled? */
+    bool ecs;                   /* Extended Context Support */

     dma_addr_t root;            /* Current root table pointer */
     bool root_extended;         /* Type of root table (extended or not) */
-- 
1.9.1
[RFC PATCH 03/20] intel_iommu: add "svm" option
Expose "Shared Virtual Memory" to guest by using "svm" option. Also use "svm" to expose SVM related capabilities to guest. e.g. "-device intel-iommu, svm=on" Signed-off-by: Liu, Yi L --- hw/i386/intel_iommu.c | 10 ++ hw/i386/intel_iommu_internal.h | 5 + include/hw/i386/intel_iommu.h | 1 + 3 files changed, 16 insertions(+) diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c index bf98fa5..ba1e7eb 100644 --- a/hw/i386/intel_iommu.c +++ b/hw/i386/intel_iommu.c @@ -2453,6 +2453,7 @@ static Property vtd_properties[] = { DEFINE_PROP_BOOL("x-buggy-eim", IntelIOMMUState, buggy_eim, false), DEFINE_PROP_BOOL("caching-mode", IntelIOMMUState, caching_mode, FALSE), DEFINE_PROP_BOOL("ecs", IntelIOMMUState, ecs, FALSE), +DEFINE_PROP_BOOL("svm", IntelIOMMUState, svm, FALSE), DEFINE_PROP_END_OF_LIST(), }; @@ -2973,6 +2974,15 @@ static void vtd_init(IntelIOMMUState *s) s->ecap |= VTD_ECAP_ECS; } +if (s->svm) { +if (!s->ecs || !x86_iommu->pt_supported || !s->caching_mode) { +error_report("Need to set ecs, pt, caching-mode for svm"); +exit(1); +} +s->cap |= VTD_CAP_DWD | VTD_CAP_DRD; +s->ecap |= VTD_ECAP_PRS | VTD_ECAP_PTS | VTD_ECAP_PASID28; +} + if (s->caching_mode) { s->cap |= VTD_CAP_CM; } diff --git a/hw/i386/intel_iommu_internal.h b/hw/i386/intel_iommu_internal.h index 71a1c1e..f2a7d12 100644 --- a/hw/i386/intel_iommu_internal.h +++ b/hw/i386/intel_iommu_internal.h @@ -191,6 +191,9 @@ #define VTD_ECAP_PT (1ULL << 6) #define VTD_ECAP_MHMV (15ULL << 20) #define VTD_ECAP_ECS(1ULL << 24) +#define VTD_ECAP_PASID28(1ULL << 28) +#define VTD_ECAP_PRS(1ULL << 29) +#define VTD_ECAP_PTS(0xeULL << 35) /* CAP_REG */ /* (offset >> 4) << 24 */ @@ -207,6 +210,8 @@ #define VTD_CAP_PSI (1ULL << 39) #define VTD_CAP_SLLPS ((1ULL << 34) | (1ULL << 35)) #define VTD_CAP_CM (1ULL << 7) +#define VTD_CAP_DWD (1ULL << 54) +#define VTD_CAP_DRD (1ULL << 55) /* Supported Adjusted Guest Address Widths */ #define VTD_CAP_SAGAW_SHIFT 8 diff --git a/include/hw/i386/intel_iommu.h 
b/include/hw/i386/intel_iommu.h index ae21fe5..8981615 100644 --- a/include/hw/i386/intel_iommu.h +++ b/include/hw/i386/intel_iommu.h @@ -267,6 +267,7 @@ struct IntelIOMMUState { bool caching_mode; /* RO - is cap CM enabled? */ bool ecs; /* Extended Context Support */ +bool svm; /* Shared Virtual Memory */ dma_addr_t root;/* Current root table pointer */ bool root_extended; /* Type of root table (extended or not) */ -- 1.9.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[RFC PATCH 6/8] VFIO: do pasid table binding
From: "Liu, Yi L"

This patch adds IOCTL processing in vfio_iommu_type1 for VFIO_IOMMU_SVM_BIND_TASK. It performs the PASID table binding by calling iommu_ops->bind_pasid_table to link the whole PASID table to the pIOMMU. For VT-d, this links the guest PASID table to the host pIOMMU, which is the key point for supporting SVM virtualization on VT-d.

Signed-off-by: Liu, Yi L
---
 drivers/vfio/vfio_iommu_type1.c | 72 +
 1 file changed, 72 insertions(+)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index b3cc33f..30b6d48 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -1512,6 +1512,50 @@ static int vfio_domains_have_iommu_cache(struct vfio_iommu *iommu)
 	return ret;
 }

+struct vfio_svm_task {
+	struct iommu_domain *domain;
+	void *payload;
+};
+
+static int bind_pasid_tbl_fn(struct device *dev, void *data)
+{
+	int ret = 0;
+	struct vfio_svm_task *task = data;
+	struct pasid_table_info *pasidt_binfo;
+
+	pasidt_binfo = task->payload;
+	ret = iommu_bind_pasid_table(task->domain, dev, pasidt_binfo);
+	return ret;
+}
+
+static int vfio_do_svm_task(struct vfio_iommu *iommu, void *data,
+				int (*fn)(struct device *, void *))
+{
+	int ret = 0;
+	struct vfio_domain *d;
+	struct vfio_group *g;
+	struct vfio_svm_task task;
+
+	task.payload = data;
+
+	mutex_lock(&iommu->lock);
+
+	list_for_each_entry(d, &iommu->domain_list, next) {
+		list_for_each_entry(g, &d->group_list, next) {
+			if (g->iommu_group != NULL) {
+				task.domain = d->domain;
+				ret = iommu_group_for_each_dev(
+					g->iommu_group, &task, fn);
+				if (ret != 0)
+					break;
+			}
+		}
+	}
+
+	mutex_unlock(&iommu->lock);
+	return ret;
+}
+
 static long vfio_iommu_type1_ioctl(void *iommu_data,
 				   unsigned int cmd, unsigned long arg)
 {
@@ -1582,6 +1626,34 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
 		return copy_to_user((void __user *)arg, &unmap, minsz) ?
 			-EFAULT : 0;
+	} else if (cmd == VFIO_IOMMU_SVM_BIND_TASK) {
+		struct vfio_device_svm hdr;
+		u8 *data = NULL;
+		int ret = 0;
+
+		minsz = offsetofend(struct vfio_device_svm, length);
+		if (copy_from_user(&hdr, (void __user *)arg, minsz))
+			return -EFAULT;
+
+		if (hdr.length == 0)
+			return -EINVAL;
+
+		data = memdup_user((void __user *)(arg + minsz),
+					hdr.length);
+		if (IS_ERR(data))
+			return PTR_ERR(data);
+
+		switch (hdr.flags & VFIO_SVM_TYPE_MASK) {
+		case VFIO_SVM_BIND_PASIDTBL:
+			ret = vfio_do_svm_task(iommu, data,
+					bind_pasid_tbl_fn);
+			break;
+		default:
+			ret = -EINVAL;
+			break;
+		}
+		kfree(data);
+		return ret;
 	}

 	return -ENOTTY;
-- 
1.9.1
[RFC PATCH 0/8] Shared Virtual Memory virtualization for VT-d
Hi,

This patchset introduces SVM virtualization for intel_iommu in IOMMU/VFIO. The complete SVM virtualization for intel_iommu touches QEMU, IOMMU and VFIO; the QEMU side is changed by a separate patchset, "[RFC PATCH 0/20] Qemu: Extend intel_iommu emulator to support Shared Virtual Memory". This patchset adds two new IOMMU APIs and their implementations in the intel_iommu driver. In VFIO, it adds two IOCTL commands on container->fd to propagate data from QEMU to kernel space.

[Patch Overview]
* 1 adds the iommu API definition for binding a guest PASID table
* 2 adds the bind PASID table API implementation in the VT-d iommu driver
* 3 adds the iommu API definition to do IOMMU TLB invalidation from the guest
* 4 adds the IOMMU TLB invalidation implementation in the VT-d iommu driver
* 5 adds a VFIO IOCTL for propagating PASID table binding from the guest
* 6 adds processing of PASID table binding in vfio_iommu_type1
* 7 adds a VFIO IOCTL for propagating IOMMU TLB invalidation from the guest
* 8 adds processing of IOMMU TLB invalidation in vfio_iommu_type1

Best Wishes,
Yi L

Jacob Pan (3):
  iommu: Introduce bind_pasid_table API function
  iommu/vt-d: add bind_pasid_table function
  iommu/vt-d: Add iommu do invalidate function

Liu, Yi L (5):
  iommu: Introduce iommu do invalidate API function
  VFIO: Add new IOCTL for PASID Table bind propagation
  VFIO: do pasid table binding
  VFIO: Add new IOCTL for IOMMU TLB invalidate propagation
  VFIO: do IOMMU TLB invalidation from guest

 drivers/iommu/intel-iommu.c     | 146 
 drivers/iommu/iommu.c           |  32 +
 drivers/vfio/vfio_iommu_type1.c |  98 +++
 include/linux/dma_remapping.h   |   1 +
 include/linux/intel-iommu.h     |  11 +++
 include/linux/iommu.h           |  47 +
 include/uapi/linux/vfio.h       |  26 +++
 7 files changed, 361 insertions(+)
-- 
1.9.1
[RFC PATCH 3/8] iommu: Introduce iommu do invalidate API function
From: "Liu, Yi L" When an SVM-capable device is assigned to a guest, the first-level page tables are owned by the guest, and the guest PASID table pointer is linked into the device context entry of the physical IOMMU. The host IOMMU driver has no knowledge of caching-structure updates unless the guest invalidation activities are passed down to the host. The primary usage comes from the emulated IOMMU in the guest, where QEMU can trap invalidation activities before passing them down to the host/physical IOMMU. Architecture-specific actions need to be taken, which is why the generic API introduced in this patch carries opaque data in the tlb_invalidate_info argument. Signed-off-by: Liu, Yi L Signed-off-by: Jacob Pan --- drivers/iommu/iommu.c | 13 + include/linux/iommu.h | 16 2 files changed, 29 insertions(+) diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c index f2da636..ca7cff2 100644 --- a/drivers/iommu/iommu.c +++ b/drivers/iommu/iommu.c @@ -1153,6 +1153,19 @@ int iommu_unbind_pasid_table(struct iommu_domain *domain, struct device *dev) } EXPORT_SYMBOL_GPL(iommu_unbind_pasid_table); +int iommu_do_invalidate(struct iommu_domain *domain, + struct device *dev, struct tlb_invalidate_info *inv_info) +{ + int ret = 0; + + if (unlikely(domain->ops->do_invalidate == NULL)) + return -ENODEV; + + ret = domain->ops->do_invalidate(domain, dev, inv_info); + return ret; +} +EXPORT_SYMBOL_GPL(iommu_do_invalidate); + static void __iommu_detach_device(struct iommu_domain *domain, struct device *dev) { diff --git a/include/linux/iommu.h b/include/linux/iommu.h index 491a011..a48e3b75 100644 --- a/include/linux/iommu.h +++ b/include/linux/iommu.h @@ -140,6 +140,11 @@ struct pasid_table_info { __u8opaque[];/* IOMMU-specific details */ }; +struct tlb_invalidate_info { + __u32 model; + __u8opaque[]; +}; + #ifdef CONFIG_IOMMU_API /** @@ -215,6 +220,8 @@ struct iommu_ops { struct pasid_table_info *pasidt_binfo); int (*unbind_pasid_table)(struct iommu_domain *domain, 
struct device *dev); + int (*do_invalidate)(struct iommu_domain *domain, + struct device *dev, struct tlb_invalidate_info *inv_info); unsigned long pgsize_bitmap; }; @@ -240,6 +247,9 @@ extern int iommu_bind_pasid_table(struct iommu_domain *domain, struct device *dev, struct pasid_table_info *pasidt_binfo); extern int iommu_unbind_pasid_table(struct iommu_domain *domain, struct device *dev); +extern int iommu_do_invalidate(struct iommu_domain *domain, + struct device *dev, struct tlb_invalidate_info *inv_info); + extern struct iommu_domain *iommu_get_domain_for_dev(struct device *dev); extern int iommu_map(struct iommu_domain *domain, unsigned long iova, phys_addr_t paddr, size_t size, int prot); @@ -626,6 +636,12 @@ int iommu_unbind_pasid_table(struct iommu_domain *domain, struct device *dev) return -EINVAL; } +static inline int iommu_do_invalidate(struct iommu_domain *domain, + struct device *dev, struct tlb_invalidate_info *inv_info) +{ + return -EINVAL; +} + #endif /* CONFIG_IOMMU_API */ #endif /* __LINUX_IOMMU_H */ -- 1.9.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[RFC PATCH 5/8] VFIO: Add new IOCTL for PASID Table bind propagation
From: "Liu, Yi L"

This patch adds VFIO_IOMMU_SVM_BIND_TASK for PASID table binding requests. On VT-d, this IOCTL cmd is used to link the guest PASID table to the host. Other vendors may also use it to support other kinds of SVM bind request: it may carry an SVM PASID bind request from a userspace driver, or a page table (cr3) bind request from a guest. These bind requests are distinguished by flags, e.g. VFIO_SVM_BIND_PASID for a PASID bind from a userspace driver and VFIO_SVM_BIND_PGTABLE for a page table bind from the guest. There was an earlier discussion on this with ARM engineers; it can be found at the link below.

https://patchwork.kernel.org/patch/9594231/

Signed-off-by: Liu, Yi L
---
 include/uapi/linux/vfio.h | 17 +
 1 file changed, 17 insertions(+)

diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index 519eff3..6b97987 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -547,6 +547,23 @@ struct vfio_iommu_type1_dma_unmap {
 #define VFIO_IOMMU_ENABLE	_IO(VFIO_TYPE, VFIO_BASE + 15)
 #define VFIO_IOMMU_DISABLE	_IO(VFIO_TYPE, VFIO_BASE + 16)

+/* IOCTL for Shared Virtual Memory Bind */
+struct vfio_device_svm {
+	__u32	argsz;
+#define VFIO_SVM_BIND_PASIDTBL	(1 << 0) /* Bind PASID Table */
+#define VFIO_SVM_BIND_PASID	(1 << 1) /* Bind PASID from userspace driver */
+#define VFIO_SVM_BIND_PGTABLE	(1 << 2) /* Bind guest mmu page table */
+	__u32	flags;
+	__u32	length;
+	__u8	data[];
+};
+
+#define VFIO_SVM_TYPE_MASK	(VFIO_SVM_BIND_PASIDTBL | \
+				 VFIO_SVM_BIND_PASID | \
+				 VFIO_SVM_BIND_PGTABLE)
+
+#define VFIO_IOMMU_SVM_BIND_TASK	_IO(VFIO_TYPE, VFIO_BASE + 22)
+
 /* Additional API for SPAPR TCE (Server POWERPC) IOMMU */

 /*
-- 
1.9.1
[RFC PATCH 2/8] iommu/vt-d: add bind_pasid_table function
From: Jacob Pan Add Intel VT-d ops to the generic iommu_bind_pasid_table API functions. The primary use case is for direct assignment of SVM capable device. Originated from emulated IOMMU in the guest, the request goes through many layers (e.g. VFIO). Upon calling host IOMMU driver, caller passes guest PASID table pointer (GPA) and size. Device context table entry is modified by Intel IOMMU specific bind_pasid_table function. This will turn on nesting mode and matching translation type. The unbind operation restores default context mapping. Signed-off-by: Jacob Pan Signed-off-by: Liu, Yi L --- drivers/iommu/intel-iommu.c | 103 ++ include/linux/dma_remapping.h | 1 + 2 files changed, 104 insertions(+) diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c index 646756c..6d5b939 100644 --- a/drivers/iommu/intel-iommu.c +++ b/drivers/iommu/intel-iommu.c @@ -5306,6 +5306,105 @@ struct intel_iommu *intel_svm_device_to_iommu(struct device *dev) return iommu; } + +static int intel_iommu_bind_pasid_table(struct iommu_domain *domain, + struct device *dev, struct pasid_table_info *pasidt_binfo) +{ + struct intel_iommu *iommu; + struct context_entry *context; + struct dmar_domain *dmar_domain = to_dmar_domain(domain); + struct device_domain_info *info; + u8 bus, devfn; + u16 did, *sid; + int ret = 0; + unsigned long flags; + u64 ctx_lo; + + if (pasidt_binfo == NULL || pasidt_binfo->model != INTEL_IOMMU) { + pr_warn("%s: Invalid bind request!\n", __func__); + return -EINVAL; + } + + iommu = device_to_iommu(dev, &bus, &devfn); + if (!iommu) + return -ENODEV; + + sid = (u16 *)&pasidt_binfo->opaque; + /* check SID, if it is not correct, return */ + if (PCI_DEVID(bus, devfn) != *sid) + return 0; + + info = dev->archdata.iommu; + if (!info || !info->pasid_supported) { + pr_err("Device %d:%d.%d has no pasid support\n", bus, + PCI_SLOT(devfn), PCI_FUNC(devfn)); + ret = -EINVAL; + goto out; + } + + if (pasidt_binfo->size >= intel_iommu_get_pts(iommu)) { + 
pr_err("Invalid gPASID table size %llu, host size %lu\n", + pasidt_binfo->size, + intel_iommu_get_pts(iommu)); + ret = -EINVAL; + goto out; + } + spin_lock_irqsave(&iommu->lock, flags); + context = iommu_context_addr(iommu, bus, devfn, 0); + if (!context || !context_present(context)) { + pr_warn("%s: ctx not present for bus devfn %x:%x\n", + __func__, bus, devfn); + spin_unlock_irqrestore(&iommu->lock, flags); + goto out; + } + /* Anticipate guest to use SVM and owns the first level */ + ctx_lo = context[0].lo; + ctx_lo |= CONTEXT_NESTE; + ctx_lo |= CONTEXT_PRS; + ctx_lo |= CONTEXT_PASIDE; + ctx_lo &= ~CONTEXT_TT_MASK; + ctx_lo |= CONTEXT_TT_DEV_IOTLB << 2; + context[0].lo = ctx_lo; + + /* Assign guest PASID table pointer and size */ + ctx_lo = (pasidt_binfo->ptr & VTD_PAGE_MASK) | pasidt_binfo->size; + context[1].lo = ctx_lo; + /* make sure context entry is updated before flushing */ + wmb(); + did = dmar_domain->iommu_did[iommu->seq_id]; + iommu->flush.flush_context(iommu, did, + (((u16)bus) << 8) | devfn, + DMA_CCMD_MASK_NOBIT, + DMA_CCMD_DEVICE_INVL); + iommu->flush.flush_iotlb(iommu, did, 0, 0, DMA_TLB_DSI_FLUSH); + spin_unlock_irqrestore(&iommu->lock, flags); + + +out: + return ret; +} + +static int intel_iommu_unbind_pasid_table(struct iommu_domain *domain, + struct device *dev) +{ + struct intel_iommu *iommu; + struct dmar_domain *dmar_domain = to_dmar_domain(domain); + u8 bus, devfn; + + iommu = device_to_iommu(dev, &bus, &devfn); + if (!iommu) + return -ENODEV; + /* +* REVISIT: we might want to clear the PASID table pointer +* as part of context clear operation. Currently, it leaves +* stale data but should be ignored by hardware since PASIDE +* is clear. 
+*/ + /* ATS will be reenabled when remapping is restored */ + pci_disable_ats(to_pci_dev(dev)); + domain_context_clear(iommu, dev); + return domain_context_mapping_one(dmar_domain, iommu, bus, devfn); +} #endif /* CONFIG_INTEL_IOMMU_SVM */ static const struct iommu_ops intel_iommu_ops = { @@ -5314,6 +5413,10
[RFC PATCH 4/8] iommu/vt-d: Add iommu do invalidate function
From: Jacob Pan This patch adds Intel VT-d specific function to implement iommu_do_invalidate API. The use case is for supporting caching structure invalidation of assigned SVM capable devices. Emulated IOMMU exposes queue invalidation capability and passes down all descriptors from the guest to the physical IOMMU. The assumption is that guest to host device ID mapping should be resolved prior to calling IOMMU driver. Based on the device handle, host IOMMU driver can replace certain fields before submit to the invalidation queue. Signed-off-by: Liu, Yi L Signed-off-by: Jacob Pan --- drivers/iommu/intel-iommu.c | 43 +++ include/linux/intel-iommu.h | 11 +++ 2 files changed, 54 insertions(+) diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c index 6d5b939..0b098ad 100644 --- a/drivers/iommu/intel-iommu.c +++ b/drivers/iommu/intel-iommu.c @@ -5042,6 +5042,48 @@ static void intel_iommu_detach_device(struct iommu_domain *domain, dmar_remove_one_dev_info(to_dmar_domain(domain), dev); } +static int intel_iommu_do_invalidate(struct iommu_domain *domain, + struct device *dev, struct tlb_invalidate_info *inv_info) +{ + int ret = 0; + struct intel_iommu *iommu; + struct dmar_domain *dmar_domain = to_dmar_domain(domain); + struct intel_invalidate_data *inv_data; + struct qi_desc *qi; + u16 did; + u8 bus, devfn; + + if (!inv_info || !dmar_domain || (inv_info->model != INTEL_IOMMU)) + return -EINVAL; + + iommu = device_to_iommu(dev, &bus, &devfn); + if (!iommu) + return -ENODEV; + + inv_data = (struct intel_invalidate_data *)&inv_info->opaque; + + /* check SID */ + if (PCI_DEVID(bus, devfn) != inv_data->sid) + return 0; + + qi = &inv_data->inv_desc; + + switch (qi->low & QI_TYPE_MASK) { + case QI_DIOTLB_TYPE: + case QI_DEIOTLB_TYPE: + /* for device IOTLB, we just let it pass through */ + break; + default: + did = dmar_domain->iommu_did[iommu->seq_id]; + set_mask_bits(&qi->low, QI_DID_MASK, QI_DID(did)); + break; + } + + ret = qi_submit_sync(qi, iommu); + + 
return ret; +} + static int intel_iommu_map(struct iommu_domain *domain, unsigned long iova, phys_addr_t hpa, size_t size, int iommu_prot) @@ -5416,6 +5458,7 @@ static int intel_iommu_unbind_pasid_table(struct iommu_domain *domain, #ifdef CONFIG_INTEL_IOMMU_SVM .bind_pasid_table = intel_iommu_bind_pasid_table, .unbind_pasid_table = intel_iommu_unbind_pasid_table, + .do_invalidate = intel_iommu_do_invalidate, #endif .map= intel_iommu_map, .unmap = intel_iommu_unmap, diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h index ac04f28..9d6562c 100644 --- a/include/linux/intel-iommu.h +++ b/include/linux/intel-iommu.h @@ -29,6 +29,7 @@ #include #include #include +#include #include #include @@ -271,6 +272,10 @@ enum { #define QI_PGRP_RESP_TYPE 0x9 #define QI_PSTRM_RESP_TYPE 0xa +#define QI_DID(did)(((u64)did & 0x) << 16) +#define QI_DID_MASKGENMASK(31, 16) +#define QI_TYPE_MASK GENMASK(3, 0) + #define QI_IEC_SELECTIVE (((u64)1) << 4) #define QI_IEC_IIDEX(idx) (((u64)(idx & 0x) << 32)) #define QI_IEC_IM(m) (((u64)(m & 0x1f) << 27)) @@ -529,6 +534,12 @@ struct intel_svm { extern struct intel_iommu *intel_svm_device_to_iommu(struct device *dev); #endif +struct intel_invalidate_data { + u16 sid; + u32 pasid; + struct qi_desc inv_desc; +}; + extern const struct attribute_group *intel_iommu_groups[]; extern void intel_iommu_debugfs_init(void); extern struct context_entry *iommu_context_addr(struct intel_iommu *iommu, -- 1.9.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[RFC PATCH 19/20] intel_iommu: propagate PASID-Cache invalidate to host
This patch adds support for propagating PASID-Cache invalidation to host. Similar with Extended-IOTLB invalidation, intel_iommu emulator would also check all the assigned devices and do sanity check, then pass it to host. Host pIOMMU driver would replace some fields in the raw data before submitting to pIOMMU. e.g. guest domain ID must be replaced with the real domain ID in host. In future PASID may need to be replaced. Signed-off-by: Liu, Yi L --- hw/i386/intel_iommu.c | 56 ++ hw/i386/intel_iommu_internal.h | 10 2 files changed, 66 insertions(+) diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c index 5fbb7f1..c5e9170 100644 --- a/hw/i386/intel_iommu.c +++ b/hw/i386/intel_iommu.c @@ -2006,6 +2006,7 @@ static void vtd_tlb_inv_notify_hook(VTDNotifierIterator *iter, tlb_hook_info = (VTDIOTLBInvHookInfo *) hook_info; switch (tlb_hook_info->inv_desc->lo & VTD_INV_DESC_TYPE) { case VTD_INV_DESC_EXT_IOTLB: +case VTD_INV_DESC_PC: if (iter->did == *tlb_hook_info->did) { break; } else { @@ -2098,6 +2099,54 @@ static bool vtd_process_exiotlb_desc(IntelIOMMUState *s, return true; } +static bool vtd_process_pasid_desc(IntelIOMMUState *s, + VTDInvDesc *inv_desc) +{ +uint16_t domain_id; +uint32_t pasid; +VTDIOTLBInvHookInfo tlb_hook_info; + +if ((inv_desc->lo & VTD_INV_DESC_PASIDC_RSVD_LO) || +(inv_desc->hi & VTD_INV_DESC_PASIDC_RSVD_HI)) { +VTD_DPRINTF(GENERAL, "error: non-zero reserved field" +" in PASID desc, hi 0x%"PRIx64 " lo 0x%"PRIx64, +inv_desc->hi, inv_desc->lo); +return false; +} + +domain_id = VTD_INV_DESC_PASIDC_DID(inv_desc->lo); + +switch (inv_desc->lo & VTD_INV_DESC_PASIDC_G) { +case VTD_INV_DESC_PASIDC_ALL_ALL: +VTD_DPRINTF(INV, "Invalidate all PASID"); +break; + +case VTD_INV_DESC_PASIDC_PASID_SI: +VTD_DPRINTF(INV, "pasid-selective invalidation" +" domain 0x%"PRIx16, domain_id); +break; + +default: +VTD_DPRINTF(GENERAL, "error: invalid granularity" +" in PASID-Cache Invalidate Descriptor" +" hi 0x%"PRIx64 " lo 0x%"PRIx64, +inv_desc->hi, inv_desc->lo); 
+return false; +} + +pasid = VTD_INV_DESC_PASIDC_PASID(inv_desc->lo); + +tlb_hook_info.did = &domain_id; +tlb_hook_info.sid = NULL; +tlb_hook_info.pasid = &pasid; +tlb_hook_info.inv_desc = inv_desc; +vtd_tlb_inv_passdown_notify(s, +&tlb_hook_info, +vtd_tlb_inv_notify_hook); + +return true; +} + static bool vtd_process_inv_desc(IntelIOMMUState *s) { VTDInvDesc inv_desc; @@ -2134,6 +2183,13 @@ static bool vtd_process_inv_desc(IntelIOMMUState *s) } break; + case VTD_INV_DESC_PC: +trace_vtd_inv_desc("pasid-cache", inv_desc.hi, inv_desc.lo); +if (!vtd_process_pasid_desc(s, &inv_desc)) { +return false; +} +break; + case VTD_INV_DESC_WAIT: trace_vtd_inv_desc("wait", inv_desc.hi, inv_desc.lo); if (!vtd_process_wait_desc(s, &inv_desc)) { diff --git a/hw/i386/intel_iommu_internal.h b/hw/i386/intel_iommu_internal.h index 9f89751..a6b9350 100644 --- a/hw/i386/intel_iommu_internal.h +++ b/hw/i386/intel_iommu_internal.h @@ -342,6 +342,7 @@ typedef union VTDInvDesc VTDInvDesc; Invalidate Descriptor */ #define VTD_INV_DESC_WAIT 0x5 /* Invalidation Wait Descriptor */ #define VTD_INV_DESC_EXT_IOTLB 0x6 /* Ext-IOTLB Invalidate Desc */ +#define VTD_INV_DESC_PC 0x7 /* PASID-cache Invalidate Desc */ #define VTD_INV_DESC_NONE 0 /* Not an Invalidate Descriptor */ /* Masks for Invalidation Wait Descriptor*/ @@ -397,6 +398,15 @@ typedef union VTDInvDesc VTDInvDesc; #define VTD_INV_DESC_EXIOTLB_IH(val) (((val) >> 6) & 0x1) #define VTD_INV_DESC_EXIOTLB_GL(val) (((val) >> 7) & 0x1) +#define VTD_INV_DESC_PASIDC_G (3ULL << 4) +#define VTD_INV_DESC_PASIDC_PASID(val) (((val) >> 32) & 0xfULL) +#define VTD_INV_DESC_PASIDC_DID(val) (((val) >> 16) & VTD_DOMAIN_ID_MASK) +#define VTD_INV_DESC_PASIDC_RSVD_LO0xfff0ffc0ULL +#define VTD_INV_DESC_PASIDC_RSVD_HI0xULL + +#define VTD_INV_DESC_PASIDC_ALL_ALL(0ULL << 4) +#define VTD_INV_DESC_PASIDC_PASID_SI (1ULL << 4) + /* Information about page-selective IOTLB invalidate */ struct VTDIOTLBPageInvInfo { uint16_t domain_id; -- 1.9.1 ___ iommu mailing list 
[RFC PATCH 8/8] VFIO: do IOMMU TLB invalidation from guest
From: "Liu, Yi L" This patch adds support for VFIO_IOMMU_TLB_INVALIDATE cmd in vfio_iommu_type1. For SVM virtualization on VT-d, for VFIO_IOMMU_TLB_INVALIDATE, it calls iommu_ops->do_invalidate() to submit the guest iommu cache invalidation to pIOMMU. Signed-off-by: Liu, Yi L --- drivers/vfio/vfio_iommu_type1.c | 26 ++ 1 file changed, 26 insertions(+) diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c index 30b6d48..6cebdfd 100644 --- a/drivers/vfio/vfio_iommu_type1.c +++ b/drivers/vfio/vfio_iommu_type1.c @@ -1528,6 +1528,17 @@ static int bind_pasid_tbl_fn(struct device *dev, void *data) return ret; } +static int do_tlb_inv_fn(struct device *dev, void *data) +{ + int ret = 0; + struct vfio_svm_task *task = data; + struct tlb_invalidate_info *inv_info; + + inv_info = task->payload; + ret = iommu_do_invalidate(task->domain, dev, inv_info); + return ret; +} + static int vfio_do_svm_task(struct vfio_iommu *iommu, void *data, int (*fn)(struct device *, void *)) { @@ -1654,6 +1665,21 @@ static long vfio_iommu_type1_ioctl(void *iommu_data, } kfree(data); return ret; + } else if (cmd == VFIO_IOMMU_TLB_INVALIDATE) { + struct vfio_iommu_tlb_invalidate hdr; + u8 *data = NULL; + int ret = 0; + + minsz = offsetofend(struct vfio_iommu_tlb_invalidate, length); + if (copy_from_user(&hdr, (void __user *)arg, minsz)) + return -EFAULT; + if (hdr.length == 0) + return -EINVAL; + data = memdup_user((void __user *)(arg + minsz), + hdr.length); + ret = vfio_do_svm_task(iommu, data, do_tlb_inv_fn); + kfree(data); + return ret; } return -ENOTTY; -- 1.9.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[RFC PATCH 7/8] VFIO: Add new IOCTL for IOMMU TLB invalidate propagation
From: "Liu, Yi L"

This patch adds VFIO_IOMMU_TLB_INVALIDATE to propagate IOMMU TLB invalidate requests from guest to host. In the case of SVM virtualization on VT-d, the host IOMMU driver has no knowledge of caching structure updates unless the guest invalidation activities are passed down to the host, so a new IOCTL is needed to propagate the guest cache invalidation through VFIO.

Signed-off-by: Liu, Yi L
---
 include/uapi/linux/vfio.h | 9 +
 1 file changed, 9 insertions(+)

diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index 6b97987..50c51f8 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -564,6 +564,15 @@ struct vfio_device_svm {

 #define VFIO_IOMMU_SVM_BIND_TASK	_IO(VFIO_TYPE, VFIO_BASE + 22)

+/* For IOMMU TLB Invalidation Propagation */
+struct vfio_iommu_tlb_invalidate {
+	__u32	argsz;
+	__u32	length;
+	__u8	data[];
+};
+
+#define VFIO_IOMMU_TLB_INVALIDATE	_IO(VFIO_TYPE, VFIO_BASE + 23)
+
 /* Additional API for SPAPR TCE (Server POWERPC) IOMMU */

 /*
-- 
1.9.1
[RFC PATCH 18/20] intel_iommu: propagate Extended-IOTLB invalidate to host
The invalidation of Extended-IOTLB invalidates first-level and nested mappings from the IOTLB and the paging-structure caches. For SVM virtualization, an iommu tlb invalidate notifier is added, for the following reasons:

* On VT-d, the MAP/UNMAP notifiers are used to shadow changes to the guest second-level page table. The guest first-level page table, however, is not shadowed the way the second-level table is: it is linked to the host once the whole guest PASID table is linked, and in this SVM virtualization solution for VT-d it is owned by the guest. The guest will already have modified the first-level page table in memory by the time it issues an invalidate request for first-level mappings, so the MAP/UNMAP notifiers are not suitable for invalidating guest first-level mappings.

* Since the guest owns the first-level page table, the host has no knowledge of invalidations to first-level mappings. The intel_iommu emulator therefore needs to propagate the invalidate request to the host, which then invalidates the first-level and nested mappings in the IOTLB and paging-structure caches. A new notifier is added to meet this requirement.

Before passing the invalidate request to the host, the intel_iommu emulator needs to apply request-specific translation, e.g. granularity translation to limit the scope of the invalidation. This patchset proposes passing raw data from guest to host when propagating guest IOMMU TLB invalidations. As the cover letter mentioned, there are both pros and cons to passing raw data; comments on how best to pass the invalidate request to the host are welcome.

For Extended-IOTLB invalidation, the intel_iommu emulator checks all assigned devices to see whether they are affected by the invalidate request, performs a sanity check on the request, and then passes it to the host. The host replaces some fields in the raw data before submitting it to the pIOMMU, e.g. 
the guest domain ID must be replaced with the real domain ID on the host. In
future, the PASID may also need to be replaced.

Signed-off-by: Liu, Yi L
---
 hw/i386/intel_iommu.c          | 126 +++++++++++++++++++++++++++++++++++++++
 hw/i386/intel_iommu_internal.h |  33 +++++++++++
 2 files changed, 159 insertions(+)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index cd6db65..5fbb7f1 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -64,6 +64,10 @@
 static void vtd_context_inv_notify_hook(VTDNotifierIterator *iter,
                                         void *hook_info,
                                         void *notify_info);
+static void vtd_tlb_inv_notify_hook(VTDNotifierIterator *iter,
+                                    void *hook_info,
+                                    void *notify_info);
+
 #define FOR_EACH_ASSIGN_DEVICE(__notify_info_type, \
                                __opaque_type, \
                                __hook_info, \
@@ -1979,6 +1983,121 @@ done:
     return true;
 }

+static void vtd_tlb_inv_passdown_notify(IntelIOMMUState *s,
+                                        VTDIOTLBInvHookInfo *hook_info,
+                                        vtd_device_hook hook_fn)
+{
+    FOR_EACH_ASSIGN_DEVICE(struct tlb_invalidate_info,
+                           VTDInvalidateData,
+                           hook_info,
+                           hook_fn);
+    return;
+}
+
+static void vtd_tlb_inv_notify_hook(VTDNotifierIterator *iter,
+                                    void *hook_info,
+                                    void *notify_info)
+{
+    struct tlb_invalidate_info *tlb_inv_info;
+    IOMMUNotifierData iommu_data;
+    VTDIOTLBInvHookInfo *tlb_hook_info;
+    VTDInvalidateData *inv_data;
+    tlb_inv_info = (struct tlb_invalidate_info *) notify_info;
+    tlb_hook_info = (VTDIOTLBInvHookInfo *) hook_info;
+    switch (tlb_hook_info->inv_desc->lo & VTD_INV_DESC_TYPE) {
+    case VTD_INV_DESC_EXT_IOTLB:
+        if (iter->did == *tlb_hook_info->did) {
+            break;
+        } else {
+            return;
+        }
+    default:
+        return;
+    }
+
+    tlb_inv_info->model = INTEL_IOMMU;
+
+    inv_data = (VTDInvalidateData *)&tlb_inv_info->opaque;
+    inv_data->pasid = *tlb_hook_info->pasid;
+    inv_data->sid = iter->host_sid;
+    inv_data->inv_desc = *tlb_hook_info->inv_desc;
+
+    iommu_data.payload = (uint8_t *) tlb_inv_info;
+    iommu_data.payload_size = sizeof(*tlb_inv_info) + sizeof(*inv_data);
+
+    memory_region_notify_iommu_invalidate(&iter->vtd_as->iommu,
+                                          &iommu_data);
+}
+
+static bool
+vtd_process_exiotlb_desc(IntelIOMMUState *s,
+                         VTDInvDesc *inv_desc)
+{
+    uint16_t domain_id;
+    uint32_t pasid;
+    uint8_t am;
+    VTDIOTLBInvHookInfo tlb_hook_info;
+
+    if ((inv_desc->lo & VTD_INV_DESC_EXIOTLB_
[RFC PATCH 1/8] iommu: Introduce bind_pasid_table API function
From: Jacob Pan

Virtual IOMMU was proposed to support the Shared Virtual Memory (SVM) use
case in the guest:
https://lists.gnu.org/archive/html/qemu-devel/2016-11/msg05311.html

As part of the proposed architecture, when an SVM-capable PCI device is
assigned to a guest, nested mode is turned on. The guest owns the
first-level page tables (requests with PASID) and performs GVA->GPA
translation. Second-level page tables are owned by the host for GPA->HPA
translation, for requests both with and without PASID.

A new IOMMU driver interface is therefore needed to perform tasks as
follows:
* Enable nested translation and the appropriate translation type
* Assign the guest PASID table pointer (in GPA) and size to the host IOMMU

This patch introduces new functions called iommu_(un)bind_pasid_table()
to the IOMMU API. Architecture-specific IOMMU functions can be added later
to perform the specific steps for binding the PASID table of assigned
devices.

This patch also adds a model definition in iommu.h. It would be used to
check whether a bind request comes from a compatible entity, e.g. a bind
request from an intel_iommu emulator may not be supported by an ARM SMMU
driver.
Signed-off-by: Jacob Pan
Signed-off-by: Liu, Yi L
---
 drivers/iommu/iommu.c | 19 +++++++++++++++++++
 include/linux/iommu.h | 31 +++++++++++++++++++++++++++++++
 2 files changed, 50 insertions(+)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index dbe7f65..f2da636 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1134,6 +1134,25 @@ int iommu_attach_device(struct iommu_domain *domain, struct device *dev)
 }
 EXPORT_SYMBOL_GPL(iommu_attach_device);

+int iommu_bind_pasid_table(struct iommu_domain *domain, struct device *dev,
+                           struct pasid_table_info *pasidt_binfo)
+{
+       if (unlikely(!domain->ops->bind_pasid_table))
+               return -EINVAL;
+
+       return domain->ops->bind_pasid_table(domain, dev, pasidt_binfo);
+}
+EXPORT_SYMBOL_GPL(iommu_bind_pasid_table);
+
+int iommu_unbind_pasid_table(struct iommu_domain *domain, struct device *dev)
+{
+       if (unlikely(!domain->ops->unbind_pasid_table))
+               return -EINVAL;
+
+       return domain->ops->unbind_pasid_table(domain, dev);
+}
+EXPORT_SYMBOL_GPL(iommu_unbind_pasid_table);
+
 static void __iommu_detach_device(struct iommu_domain *domain,
                                  struct device *dev)
 {
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 0ff5111..491a011 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -131,6 +131,15 @@ struct iommu_dm_region {
        int prot;
 };

+struct pasid_table_info {
+       __u64   ptr;    /* PASID table ptr */
+       __u64   size;   /* PASID table size */
+       __u32   model;  /* magic number */
+#define INTEL_IOMMU    (1 << 0)
+#define ARM_SMMU       (1 << 1)
+       __u8    opaque[];       /* IOMMU-specific details */
+};
+
 #ifdef CONFIG_IOMMU_API

 /**
@@ -159,6 +168,8 @@ struct iommu_dm_region {
  * @domain_get_windows: Return the number of windows for a domain
  * @of_xlate: add OF master IDs to iommu grouping
  * @pgsize_bitmap: bitmap of all possible supported page sizes
+ * @bind_pasid_table: bind pasid table pointer for guest SVM
+ * @unbind_pasid_table: unbind pasid table pointer and restore defaults
  */
 struct iommu_ops {
        bool (*capable)(enum iommu_cap);
@@ -200,6 +211,10 @@ struct iommu_ops
 {
        u32 (*domain_get_windows)(struct iommu_domain *domain);
        int (*of_xlate)(struct device *dev, struct of_phandle_args *args);
+       int (*bind_pasid_table)(struct iommu_domain *domain, struct device *dev,
+                               struct pasid_table_info *pasidt_binfo);
+       int (*unbind_pasid_table)(struct iommu_domain *domain,
+                                 struct device *dev);

        unsigned long pgsize_bitmap;
 };

@@ -221,6 +236,10 @@ extern int iommu_attach_device(struct iommu_domain *domain,
                               struct device *dev);
 extern void iommu_detach_device(struct iommu_domain *domain,
                                struct device *dev);
+extern int iommu_bind_pasid_table(struct iommu_domain *domain,
+               struct device *dev, struct pasid_table_info *pasidt_binfo);
+extern int iommu_unbind_pasid_table(struct iommu_domain *domain,
+               struct device *dev);
 extern struct iommu_domain *iommu_get_domain_for_dev(struct device *dev);
 extern int iommu_map(struct iommu_domain *domain, unsigned long iova,
                     phys_addr_t paddr, size_t size, int prot);
@@ -595,6 +614,18 @@ const struct iommu_ops *iommu_get_instance(struct fwnode_handle *fwnode)
        return NULL;
 }

+static inline
+int iommu_bind_pasid_table(struct iommu_domain *domain, struct device *dev,
+                          struct pasid_table_info *pasidt_binfo)
+{
+       return -EINVAL;
+}
+static inline
+int iommu_unbind_pasid_table(struc
Re: [Qemu-devel] [RFC PATCH 12/20] Memory: Add func to fire pasidt_bind notifier
On Wed, Apr 26, 2017 at 03:50:16PM +0200, Paolo Bonzini wrote:
>
> On 26/04/2017 12:06, Liu, Yi L wrote:
> > +void memory_region_notify_iommu_svm_bind(MemoryRegion *mr,
> > +                                         void *data)
> > +{
> > +    IOMMUNotifier *iommu_notifier;
> > +    IOMMUNotifierFlag request_flags;
> > +
> > +    assert(memory_region_is_iommu(mr));
> > +
> > +    /* TODO: support other bind requests with smaller gran,
> > +     * e.g. bind single pasid entry
> > +     */
> > +    request_flags = IOMMU_NOTIFIER_SVM_PASIDT_BIND;
> > +
> > +    QLIST_FOREACH(iommu_notifier, &mr->iommu_notify, node) {
> > +        if (iommu_notifier->notifier_flags & request_flags) {
> > +            iommu_notifier->notify(iommu_notifier, data);
> > +            break;
> > +        }
> > +    }
>
> Peter,
>
> should this reuse ->notify, or should it be a different function pointer
> in IOMMUNotifier?

Hi Paolo,

Thx for your review.

I think it should be "->notify" here. In this patchset, the new notifier
is registered with the existing notifier registration API, so all the
notifiers are in the mr->iommu_notify list. Notifiers are labeled by the
notify flag, so the IOMMUNotifier nodes can be differentiated. When the
flag matches, the notifier is triggered via "->notify". The diagram below
shows my understanding; I hope it makes my point clear.

VFIOContainer
   |
   giommu_list(VFIOGuestIOMMU)
    \
     VFIOGuestIOMMU1 -> VFIOGuestIOMMU2 -> VFIOGuestIOMMU3 ...
        |                  |                  |
mr->iommu_notify: IOMMUNotifier -> IOMMUNotifier -> IOMMUNotifier
                  (Flag:MAP/UNMAP) (Flag:SVM bind)  (Flag:tlb invalidate)

Actually, compared with the MAP/UNMAP notifier, the newly added notifier
has no start/end check, and there may be other types of bind notifier
flags in future, so I added a separate fire function for the SVM bind
notifier.

Thanks,
Yi L

> Paolo

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [Qemu-devel] [RFC PATCH 5/8] VFIO: Add new IOTCL for PASID Table bind propagation
On Wed, Apr 26, 2017 at 05:56:50PM +0100, Jean-Philippe Brucker wrote:
> On 26/04/17 11:12, Liu, Yi L wrote:
> > From: "Liu, Yi L"
> >
> > This patch adds VFIO_IOMMU_SVM_BIND_TASK for potential PASID table
> > binding requests.
> >
> > On VT-d, this IOCTL cmd would be used to link the guest PASID page table
> > to host. While for other vendors, it may also be used to support other
> > kind of SVM bind request. Previously, there is a discussion on it with
> > ARM engineer. It can be found by the link below. This IOCTL cmd may
> > support SVM PASID bind request from userspace driver, or page table(cr3)
> > bind request from guest. These SVM bind requests would be supported by
> > adding different flags. e.g. VFIO_SVM_BIND_PASID is added to support
> > PASID bind from userspace driver, VFIO_SVM_BIND_PGTABLE is added to
> > support page table bind from guest.
> >
> > https://patchwork.kernel.org/patch/9594231/
> >
> > Signed-off-by: Liu, Yi L
> > ---
> >  include/uapi/linux/vfio.h | 17 +++++++++++++++++
> >  1 file changed, 17 insertions(+)
> >
> > diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> > index 519eff3..6b97987 100644
> > --- a/include/uapi/linux/vfio.h
> > +++ b/include/uapi/linux/vfio.h
> > @@ -547,6 +547,23 @@ struct vfio_iommu_type1_dma_unmap {
> >  #define VFIO_IOMMU_ENABLE      _IO(VFIO_TYPE, VFIO_BASE + 15)
> >  #define VFIO_IOMMU_DISABLE     _IO(VFIO_TYPE, VFIO_BASE + 16)
> >
> > +/* IOCTL for Shared Virtual Memory Bind */
> > +struct vfio_device_svm {
> > +   __u32   argsz;
> > +#define VFIO_SVM_BIND_PASIDTBL (1 << 0) /* Bind PASID Table */
> > +#define VFIO_SVM_BIND_PASID    (1 << 1) /* Bind PASID from userspace driver */
> > +#define VFIO_SVM_BIND_PGTABLE  (1 << 2) /* Bind guest mmu page table */
> > +   __u32   flags;
> > +   __u32   length;
> > +   __u8    data[];
> > +};
> > +
> > +#define VFIO_SVM_TYPE_MASK     (VFIO_SVM_BIND_PASIDTBL | \
> > +                                VFIO_SVM_BIND_PASID | \
> > +                                VFIO_SVM_BIND_PGTABLE)
> > +
> > +#define VFIO_IOMMU_SVM_BIND_TASK       _IO(VFIO_TYPE, VFIO_BASE + 22)
>
> This could be called "VFIO_IOMMU_SVM_BIND", since it will be used both to
> bind tables and individual tasks.

Yes, it is. Would modify it in the next version.

Thanks,
Yi L

> Thanks,
> Jean
Re: [RFC PATCH 1/8] iommu: Introduce bind_pasid_table API function
On Wed, Apr 26, 2017 at 05:56:45PM +0100, Jean-Philippe Brucker wrote:
> Hi Yi, Jacob,
>
> On 26/04/17 11:11, Liu, Yi L wrote:
> > From: Jacob Pan
> >
> > Virtual IOMMU was proposed to support Shared Virtual Memory (SVM) use
> > case in the guest:
> > https://lists.gnu.org/archive/html/qemu-devel/2016-11/msg05311.html
> >
> > As part of the proposed architecture, when a SVM capable PCI
> > device is assigned to a guest, nested mode is turned on. Guest owns the
> > first level page tables (request with PASID) and performs GVA->GPA
> > translation. Second level page tables are owned by the host for GPA->HPA
> > translation for both request with and without PASID.
> >
> > A new IOMMU driver interface is therefore needed to perform tasks as
> > follows:
> > * Enable nested translation and appropriate translation type
> > * Assign guest PASID table pointer (in GPA) and size to host IOMMU
> >
> > This patch introduces new functions called iommu_(un)bind_pasid_table()
> > to IOMMU APIs. Architecture specific IOMMU function can be added later
> > to perform the specific steps for binding pasid table of assigned devices.
> >
> > This patch also adds model definition in iommu.h. It would be used to
> > check if the bind request is from a compatible entity. e.g. a bind
> > request from an intel_iommu emulator may not be supported by an ARM SMMU
> > driver.
> >
> > Signed-off-by: Jacob Pan
> > Signed-off-by: Liu, Yi L
> > ---
> >  drivers/iommu/iommu.c | 19 +++++++++++++++++++
> >  include/linux/iommu.h | 31 +++++++++++++++++++++++++++++++
> >  2 files changed, 50 insertions(+)
> >
> > diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> > index dbe7f65..f2da636 100644
> > --- a/drivers/iommu/iommu.c
> > +++ b/drivers/iommu/iommu.c
> > @@ -1134,6 +1134,25 @@ int iommu_attach_device(struct iommu_domain *domain, struct device *dev)
> >  }
> >  EXPORT_SYMBOL_GPL(iommu_attach_device);
> >
> > +int iommu_bind_pasid_table(struct iommu_domain *domain, struct device *dev,
> > +                           struct pasid_table_info *pasidt_binfo)
>
> I guess that domain can always be deduced from dev using
> iommu_get_domain_for_dev, and doesn't need to be passed as argument?
>
> For the next version of my SVM series, I was thinking of passing group
> instead of device to iommu_bind. Since all devices in a group are expected
> to share the same mappings (whether they want it or not), users will have
> to do iommu_group_for_each_dev anyway (as you do in patch 6/8). So it
> might be simpler to let the IOMMU core take the group lock and do
> group->domain->ops->bind_task(dev...) for each device. The question also
> holds for iommu_do_invalidate in patch 3/8.
>
> This way the prototypes would be:
> int iommu_bind...(struct iommu_group *group, struct ... *info)
> int iommu_unbind...(struct iommu_group *group, struct ...*info)
> int iommu_invalidate...(struct iommu_group *group, struct ...*info)
>
> For PASID table binding it might not matter much, as VFIO will most likely
> be the only user. But task binding will be called by device drivers, which
> by now should be encouraged to do things at iommu_group granularity.
> Alternatively it could be done implicitly like in iommu_attach_device,
> with "iommu_bind_device_x" calling "iommu_bind_group_x".
>
>
> Extending this reasoning, since groups in a domain are also supposed to
> have the same mappings, then similarly to map/unmap,
> bind/unbind/invalidate should really be done with an iommu_domain (and
> nothing else) as target argument. However this requires the IOMMU core to
> keep a group list in each domain, which might complicate things a little
> too much.
>
> But "all devices in a domain share the same PASID table" is the paradigm
> I'm currently using in the guts of arm-smmu-v3. And I wonder if, as with
> iommu_group, it should be made more explicit to users, so they don't
> assume that devices within a domain are isolated from each others with
> regard to PASID DMA.
>
> > +{
> > +   if (unlikely(!domain->ops->bind_pasid_table))
> > +           return -EINVAL;
> > +
> > +   return domain->ops->bind_pasid_table(domain, dev, pasidt_binfo);
> > +}
> > +EXPORT_SYMBOL_GPL(iommu_bind_pasid_table);
> > +
> > +int iommu_unbind_pasid_table(struct iommu_domain *domain, struct device *dev)
> > +{
> > +   if (unlikely(!domain->ops->unbind_pasid_table))
> > +           return -EINVAL;
> > +
> > +   return domain->o
Re: [Qemu-devel] [RFC PATCH 12/20] Memory: Add func to fire pasidt_bind notifier
On Thu, Apr 27, 2017 at 02:14:27PM +0800, Peter Xu wrote:
> On Thu, Apr 27, 2017 at 10:37:19AM +0800, Liu, Yi L wrote:
> > On Wed, Apr 26, 2017 at 03:50:16PM +0200, Paolo Bonzini wrote:
> > >
> > >
> > > On 26/04/2017 12:06, Liu, Yi L wrote:
> > > > +void memory_region_notify_iommu_svm_bind(MemoryRegion *mr,
> > > > +                                         void *data)
> > > > +{
> > > > +    IOMMUNotifier *iommu_notifier;
> > > > +    IOMMUNotifierFlag request_flags;
> > > > +
> > > > +    assert(memory_region_is_iommu(mr));
> > > > +
> > > > +    /* TODO: support other bind requests with smaller gran,
> > > > +     * e.g. bind single pasid entry
> > > > +     */
> > > > +    request_flags = IOMMU_NOTIFIER_SVM_PASIDT_BIND;
> > > > +
> > > > +    QLIST_FOREACH(iommu_notifier, &mr->iommu_notify, node) {
> > > > +        if (iommu_notifier->notifier_flags & request_flags) {
> > > > +            iommu_notifier->notify(iommu_notifier, data);
> > > > +            break;
> > > > +        }
> > > > +    }
> > >
> > > Peter,
> > >
> > > should this reuse ->notify, or should it be a different function pointer
> > > in IOMMUNotifier?
> >
> > Hi Paolo,
> >
> > Thx for your review.
> >
> > I think it should be "->notify" here. In this patchset, the new notifier
> > is registered with the existing notifier registration API. So all the
> > notifiers are in the mr->iommu_notify list. And notifiers are labeled
> > by notify flag, so it is able to differentiate the IOMMUNotifier nodes.
> > When the flag meets, trigger it by "->notify". The diagram below shows
> > my understanding, wish it helps to make me understood.
> >
> > VFIOContainer
> >    |
> >    giommu_list(VFIOGuestIOMMU)
> >     \
> >      VFIOGuestIOMMU1 -> VFIOGuestIOMMU2 -> VFIOGuestIOMMU3 ...
> >         |                  |                  |
> > mr->iommu_notify: IOMMUNotifier -> IOMMUNotifier -> IOMMUNotifier
> >                   (Flag:MAP/UNMAP) (Flag:SVM bind)  (Flag:tlb invalidate)
> >
> >
> > Actually, compared with the MAP/UNMAP notifier, the newly added notifier has
> > no start/end check, and there may be other types of bind notifier flag in
> > future, so I added a separate fire func for SVM bind notifier.
>
> I agree with Paolo that this interface might not be the suitable place
> for the SVM notifiers (just like what I worried about in previous
> discussions).
>
> The biggest problem is that, if you see the current notifier mechanism,
> it's per-memory-region. However iiuc your messages should be
> per-iommu, or say, per translation unit.

Hi Peter,

Yes, you're right. The newly added notifier is per-iommu.

> While, for each iommu, there
> can be more than one memory region (ppc can be an example). When
> there are more than one MRs bound to the same iommu unit, which
> memory region should you register to? Any one of them, or all?

Honestly, I'm not an expert on ppc. According to the current code, I can
only find one MR initialized with memory_region_init_iommu() in
spapr_tce_table_realize(). So, to better get your point, let me check: do
you mean there may be multiple iommu MRs behind one iommu?

I admit it must be considered if there are multiple iommu MRs. I may
choose to register for one of them, since the notifier is per-iommu as
you've pointed out. Then the vIOMMU emulator needs to trigger the
notifier with the correct MR. Not sure if the ppc vIOMMU is fine with
that.

> So my conclusion is, it just has nothing to do with memory regions...
>
> Instead of a different function pointer in IOMMUNotifier, IMHO we can
> even move a step further, to isolate IOTLB notifications (targeted at
> memory regions and with start/end ranges) out of SVM/other
> notifications, since they are different in general.
> So we basically
> need two notification mechanisms:
>
> - one for memory regions, currently what I can see is IOTLB
>   notifications
>
> - one for translation units, currently I see all the rest of
>   notifications needed in virt-svm in this category
>
> Maybe some RFC patches would be good to show what I mean... I'll see
> whether I can prepare some.

I agree that it would be helpful to split the two kinds of notifiers. I
marked it as a FIXME in patch 0006 of this series. Just saw your RFC
patch for the common IOMMUObject; thanks for your work, I will try to
review it. Besides the notifier registration, please also help to review
the SVM virtualization itself. Would be glad to know your comments.

Thanks,
Yi L

> Thanks,
>
> --
> Peter Xu
Re: [Qemu-devel] [RFC PATCH 1/8] iommu: Introduce bind_pasid_table API function
On Thu, Apr 27, 2017 at 11:12:45AM +0100, Jean-Philippe Brucker wrote:
> On 27/04/17 07:36, Liu, Yi L wrote:
> > On Wed, Apr 26, 2017 at 05:56:45PM +0100, Jean-Philippe Brucker wrote:
> >> Hi Yi, Jacob,
> >>
> >> On 26/04/17 11:11, Liu, Yi L wrote:
> >>> From: Jacob Pan
> >>>
> >>> Virtual IOMMU was proposed to support Shared Virtual Memory (SVM) use
> >>> case in the guest:
> >>> https://lists.gnu.org/archive/html/qemu-devel/2016-11/msg05311.html
> >>>
> >>> As part of the proposed architecture, when a SVM capable PCI
> >>> device is assigned to a guest, nested mode is turned on. Guest owns the
> >>> first level page tables (request with PASID) and performs GVA->GPA
> >>> translation. Second level page tables are owned by the host for GPA->HPA
> >>> translation for both request with and without PASID.
> >>>
> >>> A new IOMMU driver interface is therefore needed to perform tasks as
> >>> follows:
> >>> * Enable nested translation and appropriate translation type
> >>> * Assign guest PASID table pointer (in GPA) and size to host IOMMU
> >>>
> >>> This patch introduces new functions called iommu_(un)bind_pasid_table()
> >>> to IOMMU APIs. Architecture specific IOMMU function can be added later
> >>> to perform the specific steps for binding pasid table of assigned devices.
> >>>
> >>> This patch also adds model definition in iommu.h. It would be used to
> >>> check if the bind request is from a compatible entity. e.g. a bind
> >>> request from an intel_iommu emulator may not be supported by an ARM SMMU
> >>> driver.
> >>>
> >>> Signed-off-by: Jacob Pan
> >>> Signed-off-by: Liu, Yi L
> >>> ---
> >>>  drivers/iommu/iommu.c | 19 +++++++++++++++++++
> >>>  include/linux/iommu.h | 31 +++++++++++++++++++++++++++++++
> >>>  2 files changed, 50 insertions(+)
> >>>
> >>> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> >>> index dbe7f65..f2da636 100644
> >>> --- a/drivers/iommu/iommu.c
> >>> +++ b/drivers/iommu/iommu.c
> >>> @@ -1134,6 +1134,25 @@ int iommu_attach_device(struct iommu_domain *domain, struct device *dev)
> >>>  }
> >>>  EXPORT_SYMBOL_GPL(iommu_attach_device);
> >>>
> >>> +int iommu_bind_pasid_table(struct iommu_domain *domain, struct device *dev,
> >>> +                           struct pasid_table_info *pasidt_binfo)
> >>
> >> I guess that domain can always be deduced from dev using
> >> iommu_get_domain_for_dev, and doesn't need to be passed as argument?
> >>
> >> For the next version of my SVM series, I was thinking of passing group
> >> instead of device to iommu_bind. Since all devices in a group are expected
> >> to share the same mappings (whether they want it or not), users will have
> >> to do iommu_group_for_each_dev anyway (as you do in patch 6/8). So it
> >> might be simpler to let the IOMMU core take the group lock and do
> >> group->domain->ops->bind_task(dev...) for each device. The question also
> >> holds for iommu_do_invalidate in patch 3/8.
> >>
> >> This way the prototypes would be:
> >> int iommu_bind...(struct iommu_group *group, struct ... *info)
> >> int iommu_unbind...(struct iommu_group *group, struct ...*info)
> >> int iommu_invalidate...(struct iommu_group *group, struct ...*info)
> >>
> >> For PASID table binding it might not matter much, as VFIO will most likely
> >> be the only user. But task binding will be called by device drivers, which
> >> by now should be encouraged to do things at iommu_group granularity.
> >> Alternatively it could be done implicitly like in iommu_attach_device,
> >> with "iommu_bind_device_x" calling "iommu_bind_group_x".
> >>
> >>
> >> Extending this reasoning, since groups in a domain are also supposed to
> >> have the same mappings, then similarly to map/unmap,
> >> bind/unbind/invalidate should really be done with an iommu_domain (and
> >> nothing else) as target argument. However this requires the IOMMU core to
> >> keep a group list in each domain, which might complicate things a little
> >> too much.
> >>
> >> But "all devices in a domain share the same PASID table" is the paradigm
> >> I'm currently usin
Re: [Qemu-devel] [RFC PATCH 1/8] iommu: Introduce bind_pasid_table API function
On Wed, Apr 26, 2017 at 05:56:45PM +0100, Jean-Philippe Brucker wrote:
> Hi Yi, Jacob,
>
> On 26/04/17 11:11, Liu, Yi L wrote:
> > From: Jacob Pan
> >
> > Virtual IOMMU was proposed to support Shared Virtual Memory (SVM) use
> > case in the guest:
> > https://lists.gnu.org/archive/html/qemu-devel/2016-11/msg05311.html
> >
> > As part of the proposed architecture, when a SVM capable PCI
> > device is assigned to a guest, nested mode is turned on. Guest owns the
> > first level page tables (request with PASID) and performs GVA->GPA
> > translation. Second level page tables are owned by the host for GPA->HPA
> > translation for both request with and without PASID.
> >
> > A new IOMMU driver interface is therefore needed to perform tasks as
> > follows:
> > * Enable nested translation and appropriate translation type
> > * Assign guest PASID table pointer (in GPA) and size to host IOMMU
> >
> > This patch introduces new functions called iommu_(un)bind_pasid_table()
> > to IOMMU APIs. Architecture specific IOMMU function can be added later
> > to perform the specific steps for binding pasid table of assigned devices.
> >
> > This patch also adds model definition in iommu.h. It would be used to
> > check if the bind request is from a compatible entity. e.g. a bind
> > request from an intel_iommu emulator may not be supported by an ARM SMMU
> > driver.
> >
> > Signed-off-by: Jacob Pan
> > Signed-off-by: Liu, Yi L
> > ---
> >  drivers/iommu/iommu.c | 19 +++++++++++++++++++
> >  include/linux/iommu.h | 31 +++++++++++++++++++++++++++++++
> >  2 files changed, 50 insertions(+)
> >
> > diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> > index dbe7f65..f2da636 100644
> > --- a/drivers/iommu/iommu.c
> > +++ b/drivers/iommu/iommu.c
> > @@ -1134,6 +1134,25 @@ int iommu_attach_device(struct iommu_domain *domain, struct device *dev)
> >  }
> >  EXPORT_SYMBOL_GPL(iommu_attach_device);
> >
> > +int iommu_bind_pasid_table(struct iommu_domain *domain, struct device *dev,
> > +                           struct pasid_table_info *pasidt_binfo)
>
> I guess that domain can always be deduced from dev using
> iommu_get_domain_for_dev, and doesn't need to be passed as argument?
>
> For the next version of my SVM series, I was thinking of passing group
> instead of device to iommu_bind. Since all devices in a group are expected
> to share the same mappings (whether they want it or not), users will have

The virtual address space is not tied to a protection domain the way the
I/O virtual address space is. Is it really necessary to affect all the
devices in the group, or is it just for consistency?

> to do iommu_group_for_each_dev anyway (as you do in patch 6/8). So it
> might be simpler to let the IOMMU core take the group lock and do
> group->domain->ops->bind_task(dev...) for each device. The question also
> holds for iommu_do_invalidate in patch 3/8.

In my understanding, this moves the for_each_dev loop into the iommu
driver. Is that right?

> This way the prototypes would be:
> int iommu_bind...(struct iommu_group *group, struct ... *info)
> int iommu_unbind...(struct iommu_group *group, struct ...*info)
> int iommu_invalidate...(struct iommu_group *group, struct ...*info)

For PASID table binding from the guest, I think it'd better be a
per-device op, since the bind operation wants to modify the host context
entry. But we may still share the API and do things differently in the
iommu driver.
For invalidation, I think it'd better be per-group. Actually, when a
guest IOMMU exists, there is only one group in a domain on the Intel
platform, so doing it for each device is not expected. How about on ARM?

> For PASID table binding it might not matter much, as VFIO will most likely
> be the only user. But task binding will be called by device drivers, which
> by now should be encouraged to do things at iommu_group granularity.
> Alternatively it could be done implicitly like in iommu_attach_device,
> with "iommu_bind_device_x" calling "iommu_bind_group_x".

Do you mean the bind task from a userspace driver? I guess you're trying
to handle different types of bind requests in a single svm_bind API?

>
> Extending this reasoning, since groups in a domain are also supposed to
> have the same mappings, then similarly to map/unmap,
> bind/unbind/invalidate should really be done with an iommu_domain (and
> nothing else) as target argument. However this requires the IOMMU core to
> keep a group list in each domain, which might complicate things a little
> too much.
>
> But "all devices in a domain share the same PASID ta
Re: [Qemu-devel] [RFC PATCH 02/20] intel_iommu: exposed extended-context mode to guest
On Thu, Apr 27, 2017 at 06:32:21PM +0800, Peter Xu wrote:
> On Wed, Apr 26, 2017 at 06:06:32PM +0800, Liu, Yi L wrote:
> > VT-d implementations reporting PASID or PRS fields as "Set", must also
> > report ecap.ECS as "Set". Extended-Context is required for SVM.
> >
> > When ECS is reported, intel iommu driver would initiate extended root entry
> > and extended context entry, and also PASID table if there is any SVM capable
> > device.
> >
> > Signed-off-by: Liu, Yi L
> > ---
> >  hw/i386/intel_iommu.c          | 131 +++++++++++++++++++++++++----------------
> >  hw/i386/intel_iommu_internal.h |   9 +++
> >  include/hw/i386/intel_iommu.h  |   2 +-
> >  3 files changed, 97 insertions(+), 45 deletions(-)
> >
> > diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
> > index 400d0d1..bf98fa5 100644
> > --- a/hw/i386/intel_iommu.c
> > +++ b/hw/i386/intel_iommu.c
> > @@ -497,6 +497,11 @@ static inline bool vtd_root_entry_present(VTDRootEntry *root)
> >      return root->val & VTD_ROOT_ENTRY_P;
> >  }
> >
> > +static inline bool vtd_root_entry_upper_present(VTDRootEntry *root)
> > +{
> > +    return root->rsvd & VTD_ROOT_ENTRY_P;
> > +}
> > +
> >  static int vtd_get_root_entry(IntelIOMMUState *s, uint8_t index,
> >                                VTDRootEntry *re)
> >  {
> > @@ -509,6 +514,9 @@ static int vtd_get_root_entry(IntelIOMMUState *s, uint8_t index,
> >          return -VTD_FR_ROOT_TABLE_INV;
> >      }
> >      re->val = le64_to_cpu(re->val);
> > +    if (s->ecs) {
> > +        re->rsvd = le64_to_cpu(re->rsvd);
> > +    }
>
> I feel it slightly hacky to play with re->rsvd. How about:
>
> union VTDRootEntry {
>     struct {
>         uint64_t val;
>         uint64_t rsvd;
>     } base;
>     struct {
>         uint64_t ext_lo;
>         uint64_t ext_hi;
>     } extended;
> };

Agree.

> (Or any better way that can get rid of rsvd...)
>
> Even:
>
> struct VTDRootEntry {
>     union {
>         struct {
>             uint64_t val;
>             uint64_t rsvd;
>         } base;
>         struct {
>             uint64_t ext_lo;
>             uint64_t ext_hi;
>         } extended;
>     } data;
>     bool extended;
> };
>
> Then we read the entry into data, and setup extended bit.
> A benefit of
> it is that we may avoid passing around IntelIOMMUState everywhere to
> know whether we are using extended context entries.

For this proposal, it combines the s->ecs bit and the root entry. But it
may mislead future maintainers, as it still uses the name VTDRootEntry.
Maybe name it differently.

> >      return 0;
> >  }
> >
> > @@ -517,19 +525,30 @@ static inline bool vtd_context_entry_present(VTDContextEntry *context)
> >      return context->lo & VTD_CONTEXT_ENTRY_P;
> >  }
> >
> > -static int vtd_get_context_entry_from_root(VTDRootEntry *root, uint8_t index,
> > -                                           VTDContextEntry *ce)
> > +static int vtd_get_context_entry_from_root(IntelIOMMUState *s,
> > +        VTDRootEntry *root, uint8_t index, VTDContextEntry *ce)
> >  {
> > -    dma_addr_t addr;
> > +    dma_addr_t addr, ce_size;
> >
> >      /* we have checked that root entry is present */
> > -    addr = (root->val & VTD_ROOT_ENTRY_CTP) + index * sizeof(*ce);
> > -    if (dma_memory_read(&address_space_memory, addr, ce, sizeof(*ce))) {
> > +    ce_size = (s->ecs) ? (2 * sizeof(*ce)) : (sizeof(*ce));
> > +    addr = (s->ecs && (index > 0x7f)) ?
> > +           ((root->rsvd & VTD_ROOT_ENTRY_CTP) + (index - 0x80) * ce_size) :
> > +           ((root->val & VTD_ROOT_ENTRY_CTP) + index * ce_size);
> > +
> > +    if (dma_memory_read(&address_space_memory, addr, ce, ce_size)) {
> >          trace_vtd_re_invalid(root->rsvd, root->val);
> >          return -VTD_FR_CONTEXT_TABLE_INV;
> >      }
> > -    ce->lo = le64_to_cpu(ce->lo);
> > -    ce->hi = le64_to_cpu(ce->hi);
> > +
> > +    ce[0].lo = le64_to_cpu(ce[0].lo);
> > +    ce[0].hi = le64_to_cpu(ce[0].hi);
>
> Again, I feel this even hackier. :)
>
> I would slightly prefer to play the same union trick to context
> entries, just like what I proposed to the root entries above...

Would think about it.

> > +
> > +    if (s->ecs) {
> > +        ce[1].lo = le64_to_cpu(ce[1].lo);
> > +        ce[1].hi = le64_
Re: [RFC PATCH 02/20] intel_iommu: exposed extended-context mode to guest
On Fri, Apr 28, 2017 at 02:00:15PM +0800, Lan Tianyu wrote: > On 2017年04月27日 18:32, Peter Xu wrote: > > On Wed, Apr 26, 2017 at 06:06:32PM +0800, Liu, Yi L wrote: > >> VT-d implementations reporting PASID or PRS fields as "Set", must also > >> report ecap.ECS as "Set". Extended-Context is required for SVM. > >> > >> When ECS is reported, intel iommu driver would initiate extended root entry > >> and extended context entry, and also PASID table if there is any SVM > >> capable > >> device. > >> > >> Signed-off-by: Liu, Yi L > >> --- > >> hw/i386/intel_iommu.c | 131 > >> +++-- > >> hw/i386/intel_iommu_internal.h | 9 +++ > >> include/hw/i386/intel_iommu.h | 2 +- > >> 3 files changed, 97 insertions(+), 45 deletions(-) > >> > >> diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c > >> index 400d0d1..bf98fa5 100644 > >> --- a/hw/i386/intel_iommu.c > >> +++ b/hw/i386/intel_iommu.c > >> @@ -497,6 +497,11 @@ static inline bool > >> vtd_root_entry_present(VTDRootEntry *root) > >> return root->val & VTD_ROOT_ENTRY_P; > >> } > >> > >> +static inline bool vtd_root_entry_upper_present(VTDRootEntry *root) > >> +{ > >> +return root->rsvd & VTD_ROOT_ENTRY_P; > >> +} > >> + > >> static int vtd_get_root_entry(IntelIOMMUState *s, uint8_t index, > >>VTDRootEntry *re) > >> { > >> @@ -509,6 +514,9 @@ static int vtd_get_root_entry(IntelIOMMUState *s, > >> uint8_t index, > >> return -VTD_FR_ROOT_TABLE_INV; > >> } > >> re->val = le64_to_cpu(re->val); > >> +if (s->ecs) { > >> +re->rsvd = le64_to_cpu(re->rsvd); > >> +} > > > > I feel it slightly hacky to play with re->rsvd. How about: > > > > union VTDRootEntry { > > struct { > > uint64_t val; > > uint64_t rsvd; > > } base; > > struct { > > uint64_t ext_lo; > > uint64_t ext_hi; > > } extended; > > }; > > > > (Or any better way that can get rid of rsvd...) 
> > > > Even: > > > > struct VTDRootEntry { > > union { > > struct { > > uint64_t val; > > uint64_t rsvd; > > } base; > > struct { > > uint64_t ext_lo; > > uint64_t ext_hi; > > } extended; > > } data; > > bool extended; > > }; > > > > Then we read the entry into data, and setup extended bit. A benefit of > > it is that we may avoid passing around IntelIOMMUState everywhere to > > know whether we are using extended context entries. > > > >> return 0; > >> } > >> > >> @@ -517,19 +525,30 @@ static inline bool > >> vtd_context_entry_present(VTDContextEntry *context) > >> return context->lo & VTD_CONTEXT_ENTRY_P; > >> } > >> > >> -static int vtd_get_context_entry_from_root(VTDRootEntry *root, uint8_t > >> index, > >> - VTDContextEntry *ce) > >> +static int vtd_get_context_entry_from_root(IntelIOMMUState *s, > >> + VTDRootEntry *root, uint8_t index, VTDContextEntry *ce) > >> { > >> -dma_addr_t addr; > >> +dma_addr_t addr, ce_size; > >> > >> /* we have checked that root entry is present */ > >> -addr = (root->val & VTD_ROOT_ENTRY_CTP) + index * sizeof(*ce); > >> -if (dma_memory_read(&address_space_memory, addr, ce, sizeof(*ce))) { > >> +ce_size = (s->ecs) ? (2 * sizeof(*ce)) : (sizeof(*ce)); > >> +addr = (s->ecs && (index > 0x7f)) ? > >> + ((root->rsvd & VTD_ROOT_ENTRY_CTP) + (index - 0x80) * ce_size) > >> : > >> + ((root->val & VTD_ROOT_ENTRY_CTP) + index * ce_size); > >> + > >> +if (dma_memory_read(&address_space_memory, addr, ce, ce_size)) { > >> trace_vtd_re_invalid(root->rsvd, root->val); > >> return -VTD_FR_CONTEXT_TABLE_INV; > >> } > >> -ce->lo = le64_to_cpu(ce->lo); > >> -ce->hi = le64_to_cpu(ce->hi); > >> + > >> +ce[0].lo = le64_to_cpu(ce[0].lo); > >
Re: [Qemu-devel] [RFC PATCH 0/8] Shared Virtual Memory virtualization for VT-d
On Mon, May 08, 2017 at 12:09:42PM +0800, Xiao Guangrong wrote: > > Hi Liu Yi, > > I haven't started to read the code yet, however, could you > detail more please? It emulates a SVM capable iommu device in > a VM? Or It speeds up device's DMA access in a VM? Or it is a > new facility introduced for a VM? Could you please add a bit > more for its usage? Hi Guangrong, Nice to hear from you. This patchset is part of the whole SVM virtualization work, which exposes an SVM-capable Intel IOMMU to the guest. And yes, it is an emulated IOMMU. For a detailed introduction to SVM and SVM virtualization, you may get more from the link below. http://www.spinics.net/lists/kvm/msg148798.html For the usage, I can give an example with IGD. The latest IGD is an SVM-capable device. On bare metal (where the Intel IOMMU is also SVM capable), an application, e.g. an OpenCL application, can request to share its virtual address range (an allocated buffer) with the IGD device through the IOCTL cmd provided by the IGD driver. When IGD is assigned to a guest, it is expected to support the same usage in the guest. With the SVM virtualization patchset, the application in the guest would also be able to share its virtual address with the IGD device. Different from bare metal, it is sharing GVA with IGD, so the hardware IOMMU needs to help translate GVA to HPA and therefore needs to know the GVA->HPA mapping. This patchset makes sure the GVA->HPA mapping is built and maintains the TLB. Feel free to let me know if you want more detail. Thanks, Yi L > > Thanks! > > On 04/26/2017 06:11 PM, Liu, Yi L wrote: > >Hi, > > > >This patchset introduces SVM virtualization for intel_iommu in > >IOMMU/VFIO. The total SVM virtualization for intel_iommu touched > >Qemu/IOMMU/VFIO. > > > >Another patchset would change the Qemu. It is "[RFC PATCH 0/20] Qemu: > >Extend intel_iommu emulator to support Shared Virtual Memory" > > > >In this patchset, it adds two new IOMMU APIs and their implementation > >in intel_iommu driver.

In VFIO, it adds two IOCTL cmd attached on > >container->fd to propagate data from QEMU to kernel space. > > > >[Patch Overview] > >* 1 adds iommu API definition for binding guest PASID table > >* 2 adds binding PASID table API implementation in VT-d iommu driver > >* 3 adds iommu API definition to do IOMMU TLB invalidation from guest > >* 4 adds IOMMU TLB invalidation implementation in VT-d iommu driver > >* 5 adds VFIO IOCTL for propagating PASID table binding from guest > >* 6 adds processing of pasid table binding in vfio_iommu_type1 > >* 7 adds VFIO IOCTL for propagating IOMMU TLB invalidation from guest > >* 8 adds processing of IOMMU TLB invalidation in vfio_iommu_type1 > > > >Best Wishes, > >Yi L > > > > > >Jacob Pan (3): > > iommu: Introduce bind_pasid_table API function > > iommu/vt-d: add bind_pasid_table function > > iommu/vt-d: Add iommu do invalidate function > > > >Liu, Yi L (5): > > iommu: Introduce iommu do invalidate API function > > VFIO: Add new IOTCL for PASID Table bind propagation > > VFIO: do pasid table binding > > VFIO: Add new IOCTL for IOMMU TLB invalidate propagation > > VFIO: do IOMMU TLB invalidation from guest > > > > drivers/iommu/intel-iommu.c | 146 > > > > drivers/iommu/iommu.c | 32 + > > drivers/vfio/vfio_iommu_type1.c | 98 +++ > > include/linux/dma_remapping.h | 1 + > > include/linux/intel-iommu.h | 11 +++ > > include/linux/iommu.h | 47 + > > include/uapi/linux/vfio.h | 26 +++ > > 7 files changed, 361 insertions(+) > > > ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
RE: [RFC PATCH 03/20] intel_iommu: add "svm" option
On Thu, 27 Apr 2017 18:53:17 +0800 Peter Xu wrote: > On Wed, Apr 26, 2017 at 06:06:33PM +0800, Liu, Yi L wrote: > > Expose "Shared Virtual Memory" to guest by using "svm" option. > > Also use "svm" to expose SVM related capabilities to guest. > > e.g. "-device intel-iommu, svm=on" > > > > Signed-off-by: Liu, Yi L > > --- > > hw/i386/intel_iommu.c | 10 ++ > > hw/i386/intel_iommu_internal.h | 5 + > > include/hw/i386/intel_iommu.h | 1 + > > 3 files changed, 16 insertions(+) > > > > diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c index > > bf98fa5..ba1e7eb 100644 > > --- a/hw/i386/intel_iommu.c > > +++ b/hw/i386/intel_iommu.c > > @@ -2453,6 +2453,7 @@ static Property vtd_properties[] = { > > DEFINE_PROP_BOOL("x-buggy-eim", IntelIOMMUState, buggy_eim, false), > > DEFINE_PROP_BOOL("caching-mode", IntelIOMMUState, caching_mode, > FALSE), > > DEFINE_PROP_BOOL("ecs", IntelIOMMUState, ecs, FALSE), > > +DEFINE_PROP_BOOL("svm", IntelIOMMUState, svm, FALSE), > > DEFINE_PROP_END_OF_LIST(), > > }; > > > > @@ -2973,6 +2974,15 @@ static void vtd_init(IntelIOMMUState *s) > > s->ecap |= VTD_ECAP_ECS; > > } > > > > +if (s->svm) { > > +if (!s->ecs || !x86_iommu->pt_supported || !s->caching_mode) { > > +error_report("Need to set ecs, pt, caching-mode for svm"); > > +exit(1); > > +} > > +s->cap |= VTD_CAP_DWD | VTD_CAP_DRD; > > +s->ecap |= VTD_ECAP_PRS | VTD_ECAP_PTS | VTD_ECAP_PASID28; > > +} > > + > > if (s->caching_mode) { > > s->cap |= VTD_CAP_CM; > > } > > diff --git a/hw/i386/intel_iommu_internal.h > > b/hw/i386/intel_iommu_internal.h index 71a1c1e..f2a7d12 100644 > > --- a/hw/i386/intel_iommu_internal.h > > +++ b/hw/i386/intel_iommu_internal.h > > @@ -191,6 +191,9 @@ > > #define VTD_ECAP_PT (1ULL << 6) > > #define VTD_ECAP_MHMV (15ULL << 20) > > #define VTD_ECAP_ECS(1ULL << 24) > > +#define VTD_ECAP_PASID28(1ULL << 28) > > Could I ask what's this bit? On my spec, it says this bit is reserved and > defunct (spec > version: June 2016). 
As Ashok confirmed, yes, it should be bit 40; I would update it. > > +#define VTD_ECAP_PRS(1ULL << 29) > > +#define VTD_ECAP_PTS(0xeULL << 35) > > Would it be better to avoid using 0xe here, or at least add some comment? For this value, it must be no more than the bits the host supports. So it may be better to have a default value and meanwhile expose an option to let the user set it. What's your opinion? > > > > > /* CAP_REG */ > > /* (offset >> 4) << 24 */ > > @@ -207,6 +210,8 @@ > > #define VTD_CAP_PSI (1ULL << 39) > > #define VTD_CAP_SLLPS ((1ULL << 34) | (1ULL << 35)) > > #define VTD_CAP_CM (1ULL << 7) > > +#define VTD_CAP_DWD (1ULL << 54) > > +#define VTD_CAP_DRD (1ULL << 55) > > Just to confirm: after this series, we should support drain read/write then, right? I haven't done any special processing for it in the IOMMU emulator. It's set to keep consistency with the VT-d spec, since DWD and DRD are required capabilities when PASID is reported as Set. However, I think it should be fine if the guest issues QI with drain read/write set in the descriptor. The host should be able to process it. Thanks, Yi L > > > > /* Supported Adjusted Guest Address Widths */ > > #define VTD_CAP_SAGAW_SHIFT 8 > > diff --git a/include/hw/i386/intel_iommu.h > > b/include/hw/i386/intel_iommu.h index ae21fe5..8981615 100644 > > --- a/include/hw/i386/intel_iommu.h > > +++ b/include/hw/i386/intel_iommu.h > > @@ -267,6 +267,7 @@ struct IntelIOMMUState { > > > > bool caching_mode; /* RO - is cap CM enabled? */ > > bool ecs; /* Extended Context Support */ > > +bool svm; /* Shared Virtual Memory */ > > > > dma_addr_t root;/* Current root table pointer */ > > bool root_extended; /* Type of root table (extended or > > not) */ > > -- > > 1.9.1 > > > > -- > Peter Xu
Re: [Qemu-devel] [RFC PATCH 03/20] intel_iommu: add "svm" option
On Mon, May 08, 2017 at 07:20:34PM +0800, Peter Xu wrote: > On Mon, May 08, 2017 at 10:38:09AM +0000, Liu, Yi L wrote: > > On Thu, 27 Apr 2017 18:53:17 +0800 > > Peter Xu wrote: > > > > > On Wed, Apr 26, 2017 at 06:06:33PM +0800, Liu, Yi L wrote: > > > > Expose "Shared Virtual Memory" to guest by using "svm" option. > > > > Also use "svm" to expose SVM related capabilities to guest. > > > > e.g. "-device intel-iommu, svm=on" > > > > > > > > Signed-off-by: Liu, Yi L > > > > --- > > > > hw/i386/intel_iommu.c | 10 ++ > > > > hw/i386/intel_iommu_internal.h | 5 + > > > > include/hw/i386/intel_iommu.h | 1 + > > > > 3 files changed, 16 insertions(+) > > > > > > > > diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c index > > > > bf98fa5..ba1e7eb 100644 > > > > --- a/hw/i386/intel_iommu.c > > > > +++ b/hw/i386/intel_iommu.c > > > > @@ -2453,6 +2453,7 @@ static Property vtd_properties[] = { > > > > DEFINE_PROP_BOOL("x-buggy-eim", IntelIOMMUState, buggy_eim, false), > > > > DEFINE_PROP_BOOL("caching-mode", IntelIOMMUState, caching_mode, > > > FALSE), > > > > DEFINE_PROP_BOOL("ecs", IntelIOMMUState, ecs, FALSE), > > > > +DEFINE_PROP_BOOL("svm", IntelIOMMUState, svm, FALSE), > > > > DEFINE_PROP_END_OF_LIST(), > > > > }; > > > > > > > > @@ -2973,6 +2974,15 @@ static void vtd_init(IntelIOMMUState *s) > > > > s->ecap |= VTD_ECAP_ECS; > > > > } > > > > > > > > +if (s->svm) { > > > > +if (!s->ecs || !x86_iommu->pt_supported || !s->caching_mode) { > > > > +error_report("Need to set ecs, pt, caching-mode for svm"); > > > > +exit(1); > > > > +} > > > > +s->cap |= VTD_CAP_DWD | VTD_CAP_DRD; > > > > +s->ecap |= VTD_ECAP_PRS | VTD_ECAP_PTS | VTD_ECAP_PASID28; > > > > +} > > > > + > > > > if (s->caching_mode) { > > > > s->cap |= VTD_CAP_CM; > > > > } > > > > diff --git a/hw/i386/intel_iommu_internal.h > > > > b/hw/i386/intel_iommu_internal.h index 71a1c1e..f2a7d12 100644 > > > > --- a/hw/i386/intel_iommu_internal.h > > > > +++ b/hw/i386/intel_iommu_internal.h > > > > @@ -191,6 
+191,9 @@ > > > > #define VTD_ECAP_PT (1ULL << 6) > > > > #define VTD_ECAP_MHMV (15ULL << 20) > > > > #define VTD_ECAP_ECS(1ULL << 24) > > > > +#define VTD_ECAP_PASID28(1ULL << 28) > > > > > > Could I ask what's this bit? On my spec, it says this bit is reserved and > > > defunct (spec > > > version: June 2016). > > > > As Ashok confirmed, yes it should be bit 40. would update it. > > Ok. > > > > > > > +#define VTD_ECAP_PRS(1ULL << 29) > > > > +#define VTD_ECAP_PTS(0xeULL << 35) > > > > > > Would it better we avoid using 0xe here, or at least add some comment? > > > > For this value, it must be no more than the bits host supports. So it may be > > better to have a default value and meanwhile expose an option to let user > > set it. how about your opinion? > > I think a more important point is that we need to make sure this value > is no larger than hardware support? Agree. If it is larger, sanity check would fail. > Since you are also working on the > vfio interface for virt-svm... would it be possible that we can talk > to kernel in some way so that we can know the supported pasid size in > host IOMMU? So that when guest specifies something bigger, we can stop > the user. If it is just to stop when the size is not valid, I think we already have such sanity check in host when trying to bind guest pasid table. Not sure if it is practical to talk with kernel on the supported pasid size. But may think about it. It is very likely that we need to do it through VFIO. > > I don't know the practical value for this field, if it's static > enough, I think it's also okay we make it static here as well. But > again, I would prefer at least some comment, like: > > /* Value N indicates PASID field of N+1 bits, here 0xe stands for.. */ yes, at least we need
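The PASID-size encoding Peter asks to document can be captured with a commented macro and a small decode helper. This is a sketch with illustrative macro names (the actual QEMU defines may differ); only the shift position follows the `0xeULL << 35` define from the patch, and the N + 1 rule is the encoding being discussed in this thread.

```c
#include <assert.h>
#include <stdint.h>

/* Sketch of the PASID-size encoding discussed above: a value N in the
 * 5-bit PASID-size field of the extended capability register
 * advertises PASIDs of N + 1 bits. Names are illustrative, not the
 * actual QEMU defines. */
#define VTD_ECAP_PSS_SHIFT   35
#define VTD_ECAP_PSS(n)      ((uint64_t)(n) << VTD_ECAP_PSS_SHIFT)

/* Decode the advertised PASID width from an ecap value. */
static unsigned vtd_pasid_bits(uint64_t ecap)
{
    return (unsigned)((ecap >> VTD_ECAP_PSS_SHIFT) & 0x1f) + 1;
}
```

With the 0xe value from the patch this advertises 15-bit PASIDs; whether to hard-code such a value or derive it from the host's supported size is exactly the open point in the exchange above.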
Re: [Qemu-devel] [RFC PATCH 5/8] VFIO: Add new IOTCL for PASID Table bind propagation
On Wed, Apr 26, 2017 at 06:12:02PM +0800, Liu, Yi L wrote: > From: "Liu, Yi L" Hi Alex, In this patchset, I'm trying to add two new IOCTL cmd for Shared Virtual Memory virtualization. One for binding guest PASID Table and one for iommu tlb invalidation from guest. ARM has similar requirement on SVM supporting. Since it touched VFIO, I'd like to know your comments on changes in VFIO. Thanks, Yi L > This patch adds VFIO_IOMMU_SVM_BIND_TASK for potential PASID table > binding requests. > > On VT-d, this IOCTL cmd would be used to link the guest PASID page table > to host. While for other vendors, it may also be used to support other > kind of SVM bind request. Previously, there is a discussion on it with > ARM engineer. It can be found by the link below. This IOCTL cmd may > support SVM PASID bind request from userspace driver, or page table(cr3) > bind request from guest. These SVM bind requests would be supported by > adding different flags. e.g. VFIO_SVM_BIND_PASID is added to support > PASID bind from userspace driver, VFIO_SVM_BIND_PGTABLE is added to > support page table bind from guest. 
> > https://patchwork.kernel.org/patch/9594231/ > > Signed-off-by: Liu, Yi L > --- > include/uapi/linux/vfio.h | 17 + > 1 file changed, 17 insertions(+) > > diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h > index 519eff3..6b97987 100644 > --- a/include/uapi/linux/vfio.h > +++ b/include/uapi/linux/vfio.h > @@ -547,6 +547,23 @@ struct vfio_iommu_type1_dma_unmap { > #define VFIO_IOMMU_ENABLE_IO(VFIO_TYPE, VFIO_BASE + 15) > #define VFIO_IOMMU_DISABLE _IO(VFIO_TYPE, VFIO_BASE + 16) > > +/* IOCTL for Shared Virtual Memory Bind */ > +struct vfio_device_svm { > + __u32 argsz; > +#define VFIO_SVM_BIND_PASIDTBL (1 << 0) /* Bind PASID Table */ > +#define VFIO_SVM_BIND_PASID (1 << 1) /* Bind PASID from userspace driver */ > +#define VFIO_SVM_BIND_PGTABLE(1 << 2) /* Bind guest mmu page table */ > + __u32 flags; > + __u32 length; > + __u8data[]; > +}; > + > +#define VFIO_SVM_TYPE_MASK (VFIO_SVM_BIND_PASIDTBL | \ > + VFIO_SVM_BIND_PASID | \ > + VFIO_SVM_BIND_PGTABLE) > + > +#define VFIO_IOMMU_SVM_BIND_TASK _IO(VFIO_TYPE, VFIO_BASE + 22) > + > /* Additional API for SPAPR TCE (Server POWERPC) IOMMU */ > > /* > -- > 1.9.1 > >
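To make the intended flow concrete, here is a sketch of how a VMM or userspace driver might build the payload for the proposed IOCTL. The struct and flag values mirror the RFC's uapi proposal above; the PASID table data is a placeholder and the actual `ioctl()` call is omitted since it needs a real container fd, so treat this as an assumed usage, not an accepted interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Mirrors the proposed uapi struct from the RFC above (userspace copy). */
struct vfio_device_svm {
    uint32_t argsz;
#define VFIO_SVM_BIND_PASIDTBL (1 << 0) /* bind guest PASID table */
#define VFIO_SVM_BIND_PASID    (1 << 1) /* bind PASID from userspace driver */
#define VFIO_SVM_BIND_PGTABLE  (1 << 2) /* bind guest mmu page table */
    uint32_t flags;
    uint32_t length;
    uint8_t  data[];
};

/* Build a PASID-table bind request; pasidt_info would be the
 * IOMMU-specific pasid_table_info blob described elsewhere in the RFC. */
static struct vfio_device_svm *build_bind_request(const void *pasidt_info,
                                                  uint32_t info_len)
{
    struct vfio_device_svm *hdr = calloc(1, sizeof(*hdr) + info_len);

    if (!hdr)
        return NULL;
    hdr->argsz  = sizeof(*hdr) + info_len;
    hdr->flags  = VFIO_SVM_BIND_PASIDTBL;
    hdr->length = info_len;
    memcpy(hdr->data, pasidt_info, info_len);
    /* next step would be:
     * ioctl(container_fd, VFIO_IOMMU_SVM_BIND_TASK, hdr); */
    return hdr;
}
```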
Re: [Qemu-devel] [RFC PATCH 6/8] VFIO: do pasid table binding
On Tue, May 09, 2017 at 03:55:20PM +0800, Xiao Guangrong wrote: > > > On 04/26/2017 06:12 PM, Liu, Yi L wrote: > >From: "Liu, Yi L" > > > >This patch adds IOCTL processing in vfio_iommu_type1 for > >VFIO_IOMMU_SVM_BIND_TASK. Binds the PASID table bind by > >calling iommu_ops->bind_pasid_table to link the whole > >PASID table to pIOMMU. > > > >For VT-d, it is linking the guest PASID table to host pIOMMU. > >This is key point to support SVM virtualization on VT-d. > > > >Signed-off-by: Liu, Yi L > >--- > > drivers/vfio/vfio_iommu_type1.c | 72 > > + > > 1 file changed, 72 insertions(+) > > > >diff --git a/drivers/vfio/vfio_iommu_type1.c > >b/drivers/vfio/vfio_iommu_type1.c > >index b3cc33f..30b6d48 100644 > >--- a/drivers/vfio/vfio_iommu_type1.c > >+++ b/drivers/vfio/vfio_iommu_type1.c > >@@ -1512,6 +1512,50 @@ static int vfio_domains_have_iommu_cache(struct > >vfio_iommu *iommu) > > return ret; > > } > >+struct vfio_svm_task { > >+struct iommu_domain *domain; > >+void *payload; > >+}; > >+ > >+static int bind_pasid_tbl_fn(struct device *dev, void *data) > >+{ > >+int ret = 0; > >+struct vfio_svm_task *task = data; > >+struct pasid_table_info *pasidt_binfo; > >+ > >+pasidt_binfo = task->payload; > >+ret = iommu_bind_pasid_table(task->domain, dev, pasidt_binfo); > >+return ret; > >+} > >+ > >+static int vfio_do_svm_task(struct vfio_iommu *iommu, void *data, > >+int (*fn)(struct device *, void *)) > >+{ > >+int ret = 0; > >+struct vfio_domain *d; > >+struct vfio_group *g; > >+struct vfio_svm_task task; > >+ > >+task.payload = data; > >+ > >+mutex_lock(&iommu->lock); > >+ > >+list_for_each_entry(d, &iommu->domain_list, next) { > >+list_for_each_entry(g, &d->group_list, next) { > >+if (g->iommu_group != NULL) { > >+task.domain = d->domain; > >+ret = iommu_group_for_each_dev( > >+g->iommu_group, &task, fn); > >+if (ret != 0) > >+break; > >+} > >+} > >+} > >+ > >+mutex_unlock(&iommu->lock); > >+return ret; > >+} > >+ > > static long vfio_iommu_type1_ioctl(void *iommu_data, 
> >unsigned int cmd, unsigned long arg) > > { > >@@ -1582,6 +1626,34 @@ static long vfio_iommu_type1_ioctl(void *iommu_data, > > return copy_to_user((void __user *)arg, &unmap, minsz) ? > > -EFAULT : 0; > >+} else if (cmd == VFIO_IOMMU_SVM_BIND_TASK) { > >+struct vfio_device_svm hdr; > >+u8 *data = NULL; > >+int ret = 0; > >+ > >+minsz = offsetofend(struct vfio_device_svm, length); > >+if (copy_from_user(&hdr, (void __user *)arg, minsz)) > >+return -EFAULT; > >+ > >+if (hdr.length == 0) > >+return -EINVAL; > >+ > >+data = memdup_user((void __user *)(arg + minsz), > >+hdr.length); > > You should check that @length is at least sizeof(struct pasid_table_info), as the kernel uses it as a pasid_table_info; an evil application can crash the kernel. Yes, thanks for the reminder. Thanks, Yi L
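Guangrong's point could be addressed with a length check before `memdup_user()`. A sketch of such a check follows; the `pasid_table_info` layout mirrors the RFC, and the upper bound is an illustrative choice of mine (not from the patch), written as a plain function so it is testable outside the kernel.

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors the pasid_table_info layout from the RFC above. */
struct pasid_table_info {
    uint64_t ptr;      /* PASID table pointer */
    uint64_t size;     /* PASID table size */
    uint32_t model;    /* magic number */
    uint8_t  opaque[]; /* IOMMU-specific details */
};

#define SVM_BIND_MAX_LEN 4096 /* illustrative upper bound, not from the RFC */

/* Validate a user-supplied hdr.length before memdup_user():
 * too short and the kernel would interpret bytes past the copied
 * buffer as pasid_table_info; too long and userspace can force a
 * large allocation. */
static int svm_bind_len_valid(uint32_t length)
{
    if (length < sizeof(struct pasid_table_info))
        return 0;
    if (length > SVM_BIND_MAX_LEN)
        return 0;
    return 1;
}
```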
Re: [RFC PATCH 7/8] VFIO: Add new IOCTL for IOMMU TLB invalidate propagation
On Fri, May 12, 2017 at 01:11:02PM +0100, Jean-Philippe Brucker wrote: > Hi Yi, > > On 26/04/17 11:12, Liu, Yi L wrote: > > From: "Liu, Yi L" > > > > This patch adds VFIO_IOMMU_TLB_INVALIDATE to propagate IOMMU TLB > > invalidate request from guest to host. > > > > In the case of SVM virtualization on VT-d, host IOMMU driver has > > no knowledge of caching structure updates unless the guest > > invalidation activities are passed down to the host. So a new > > IOCTL is needed to propagate the guest cache invalidation through > > VFIO. > > > > Signed-off-by: Liu, Yi L > > --- > > include/uapi/linux/vfio.h | 9 + > > 1 file changed, 9 insertions(+) > > > > diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h > > index 6b97987..50c51f8 100644 > > --- a/include/uapi/linux/vfio.h > > +++ b/include/uapi/linux/vfio.h > > @@ -564,6 +564,15 @@ struct vfio_device_svm { > > > > #define VFIO_IOMMU_SVM_BIND_TASK _IO(VFIO_TYPE, VFIO_BASE + 22) > > > > +/* For IOMMU TLB Invalidation Propagation */ > > +struct vfio_iommu_tlb_invalidate { > > + __u32 argsz; > > + __u32 length; > > + __u8data[]; > > +}; > > We initially discussed something a little more generic than this, with > most info explicitly described and only pIOMMU-specific quirks and hints > in an opaque structure. Out of curiosity, why the change? I'm not against > a fully opaque structure, but there seem to be a large overlap between TLB > invalidations across architectures. Hi Jean, As my cover letter mentioned, it is an open question on iommu tlb invalidate propagation; I paste it here since it's in the cover letter for the Qemu part changes. Please refer to the [Open] section in the following link. http://www.spinics.net/lists/kvm/msg148798.html I want to see whether the community wants an opaque structure or not for iommu tlb invalidate propagation. Personally, I'm inclined to use an opaque structure, but it's better to gather comments before deciding. To assist the discussion, I put the full opaque structure here.
Once the community gets consensus on using an opaque structure for iommu tlb invalidate propagation, I'm glad to work with you on a structure with a partially opaque part, since there seems to be overlap across architectures. > > For what it's worth, when prototyping the paravirtualized IOMMU I came up > with the following. > > (From the paravirtualized POV, the SMMU also has to swizzle endianess > after unpacking an opaque structure, since userspace doesn't know what's > in it and guest might use a different endianess. So we need to force all > opaque data to be e.g. little-endian.) > > struct vfio_iommu_tlb_invalidate { > __u32 argsz; > __u32 scope; > __u32 flags; > __u32 pasid; > __u64 vaddr; > __u64 size; > __u8data[]; > }; > > Scope is a bitfields restricting the invalidation scope. By default > invalidate the whole container (all PASIDs and all VAs). @pasid, @vaddr > and @size are unused. > > Adding VFIO_IOMMU_INVALIDATE_PASID (1 << 0) restricts the invalidation > scope to the pasid described by @pasid. > Adding VFIO_IOMMU_INVALIDATE_VADDR (1 << 1) restricts the invalidation > scope to the address range described by (@vaddr, @size). > > So setting scope = VFIO_IOMMU_INVALIDATE_VADDR would invalidate the VA > range for *all* pasids (as well as no_pasid). Setting scope = > (VFIO_IOMMU_INVALIDATE_VADDR|VFIO_IOMMU_INVALIDATE_PASID) would invalidate > the VA range only for @pasid. > Besides VA range flushing, there is PASID Cache flushing on VT-d. How about SMMU? So I think besides the two scopes you defined, we may need one more to indicate PASID Cache flushing. > Flags depend on the selected scope: > > VFIO_IOMMU_INVALIDATE_NO_PASID, indicating that invalidation (either > without scope or with INVALIDATE_VADDR) targets non-pasid mappings > exclusively (some architectures, e.g. SMMU, allow this) > > VFIO_IOMMU_INVALIDATE_VADDR_LEAF, indicating that the pIOMMU doesn't need > to invalidate all intermediate tables cached as part of the PTW for vaddr, > only the last-level entry (pte).
This is a hint. > > I guess what's missing for Intel IOMMU and would go in @data is the > "global" hint (which we don't have in SMMU invalidations). Do you see > anything else, that the pIOMMU cannot deduce from this structure? > For the Intel platform, drain read/write would be needed in the opaque data. Thanks, Yi L
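For illustration, filling Jean's proposed structure for a per-PASID VA-range invalidation might look like this sketch. The struct layout and flag values mirror his proposal in this thread only; they are not an accepted uapi, and the helper name is invented here.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Mirrors Jean's proposed structure from this thread (userspace copy). */
struct vfio_iommu_tlb_invalidate {
    uint32_t argsz;
    uint32_t scope;
#define VFIO_IOMMU_INVALIDATE_PASID (1 << 0) /* restrict to @pasid */
#define VFIO_IOMMU_INVALIDATE_VADDR (1 << 1) /* restrict to (@vaddr, @size) */
    uint32_t flags;
    uint32_t pasid;
    uint64_t vaddr;
    uint64_t size;
    uint8_t  data[]; /* pIOMMU-specific hints, e.g. VT-d drain read/write */
};

/* Invalidate one VA range for one PASID: both scope bits set means the
 * range is flushed only for @pasid, per Jean's description above. */
static void fill_pasid_range_inv(struct vfio_iommu_tlb_invalidate *inv,
                                 uint32_t pasid, uint64_t vaddr, uint64_t size)
{
    memset(inv, 0, sizeof(*inv));
    inv->argsz = sizeof(*inv);
    inv->scope = VFIO_IOMMU_INVALIDATE_PASID | VFIO_IOMMU_INVALIDATE_VADDR;
    inv->pasid = pasid;
    inv->vaddr = vaddr;
    inv->size  = size;
}
```

Scope left as 0 would fall back to the default whole-container invalidation; a third scope bit for PASID-cache flushing, as raised above, would slot in alongside these two.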
Re: [RFC PATCH 7/8] VFIO: Add new IOCTL for IOMMU TLB invalidate propagation
On Fri, May 12, 2017 at 03:58:43PM -0600, Alex Williamson wrote: > On Wed, 26 Apr 2017 18:12:04 +0800 > "Liu, Yi L" wrote: > > > From: "Liu, Yi L" > > > > This patch adds VFIO_IOMMU_TLB_INVALIDATE to propagate IOMMU TLB > > invalidate request from guest to host. > > > > In the case of SVM virtualization on VT-d, host IOMMU driver has > > no knowledge of caching structure updates unless the guest > > invalidation activities are passed down to the host. So a new > > IOCTL is needed to propagate the guest cache invalidation through > > VFIO. > > > > Signed-off-by: Liu, Yi L > > --- > > include/uapi/linux/vfio.h | 9 + > > 1 file changed, 9 insertions(+) > > > > diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h > > index 6b97987..50c51f8 100644 > > --- a/include/uapi/linux/vfio.h > > +++ b/include/uapi/linux/vfio.h > > @@ -564,6 +564,15 @@ struct vfio_device_svm { > > > > #define VFIO_IOMMU_SVM_BIND_TASK _IO(VFIO_TYPE, VFIO_BASE + 22) > > > > +/* For IOMMU TLB Invalidation Propagation */ > > +struct vfio_iommu_tlb_invalidate { > > + __u32 argsz; > > + __u32 length; > > + __u8data[]; > > +}; > > + > > +#define VFIO_IOMMU_TLB_INVALIDATE _IO(VFIO_TYPE, VFIO_BASE + 23) > > I'm kind of wondering why this isn't just a new flag bit on > vfio_device_svm, the data structure is so similar. Of course data > needs to be fully specified in uapi. Hi Alex, For this part, it depends on whether we use an opaque structure or not. The following link mentions it in the [Open] section. http://www.spinics.net/lists/kvm/msg148798.html If we pick the fully opaque solution for iommu tlb invalidate propagation, then I may add a flag bit on vfio_device_svm and also add the definition in uapi as you suggested. Thanks, Yi L > > + > > /* Additional API for SPAPR TCE (Server POWERPC) IOMMU */ > > > > /*
Re: [RFC PATCH 1/8] iommu: Introduce bind_pasid_table API function
On Fri, May 12, 2017 at 03:59:14PM -0600, Alex Williamson wrote: > On Wed, 26 Apr 2017 18:11:58 +0800 > "Liu, Yi L" wrote: > > > From: Jacob Pan > > > > Virtual IOMMU was proposed to support Shared Virtual Memory (SVM) use > > case in the guest: > > https://lists.gnu.org/archive/html/qemu-devel/2016-11/msg05311.html > > > > As part of the proposed architecture, when a SVM capable PCI > > device is assigned to a guest, nested mode is turned on. Guest owns the > > first level page tables (request with PASID) and performs GVA->GPA > > translation. Second level page tables are owned by the host for GPA->HPA > > translation for both request with and without PASID. > > > > A new IOMMU driver interface is therefore needed to perform tasks as > > follows: > > * Enable nested translation and appropriate translation type > > * Assign guest PASID table pointer (in GPA) and size to host IOMMU > > > > This patch introduces new functions called iommu_(un)bind_pasid_table() > > to IOMMU APIs. Architecture specific IOMMU function can be added later > > to perform the specific steps for binding pasid table of assigned devices. > > > > This patch also adds model definition in iommu.h. It would be used to > > check if the bind request is from a compatible entity. e.g. a bind > > request from an intel_iommu emulator may not be supported by an ARM SMMU > > driver. 
> > > > Signed-off-by: Jacob Pan > > Signed-off-by: Liu, Yi L > > --- > > drivers/iommu/iommu.c | 19 +++ > > include/linux/iommu.h | 31 +++ > > 2 files changed, 50 insertions(+) > > > > diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c > > index dbe7f65..f2da636 100644 > > --- a/drivers/iommu/iommu.c > > +++ b/drivers/iommu/iommu.c > > @@ -1134,6 +1134,25 @@ int iommu_attach_device(struct iommu_domain *domain, > > struct device *dev) > > } > > EXPORT_SYMBOL_GPL(iommu_attach_device); > > > > +int iommu_bind_pasid_table(struct iommu_domain *domain, struct device *dev, > > + struct pasid_table_info *pasidt_binfo) > > +{ > > + if (unlikely(!domain->ops->bind_pasid_table)) > > + return -EINVAL; > > + > > + return domain->ops->bind_pasid_table(domain, dev, pasidt_binfo); > > +} > > +EXPORT_SYMBOL_GPL(iommu_bind_pasid_table); > > + > > +int iommu_unbind_pasid_table(struct iommu_domain *domain, struct device > > *dev) > > +{ > > + if (unlikely(!domain->ops->unbind_pasid_table)) > > + return -EINVAL; > > + > > + return domain->ops->unbind_pasid_table(domain, dev); > > +} > > +EXPORT_SYMBOL_GPL(iommu_unbind_pasid_table); > > + > > static void __iommu_detach_device(struct iommu_domain *domain, > > struct device *dev) > > { > > diff --git a/include/linux/iommu.h b/include/linux/iommu.h > > index 0ff5111..491a011 100644 > > --- a/include/linux/iommu.h > > +++ b/include/linux/iommu.h > > @@ -131,6 +131,15 @@ struct iommu_dm_region { > > int prot; > > }; > > > > +struct pasid_table_info { > > + __u64 ptr;/* PASID table ptr */ > > + __u64 size; /* PASID table size*/ > > + __u32 model; /* magic number */ > > +#define INTEL_IOMMU(1 << 0) > > +#define ARM_SMMU (1 << 1) > > + __u8opaque[];/* IOMMU-specific details */ > > +}; > > This needs to be in uapi since you're expecting a user to pass it yes, it is. Thx for the correction. 
Thanks, Yi L > > + > > #ifdef CONFIG_IOMMU_API > > > > /** > > @@ -159,6 +168,8 @@ struct iommu_dm_region { > > * @domain_get_windows: Return the number of windows for a domain > > * @of_xlate: add OF master IDs to iommu grouping > > * @pgsize_bitmap: bitmap of all possible supported page sizes > > + * @bind_pasid_table: bind pasid table pointer for guest SVM > > + * @unbind_pasid_table: unbind pasid table pointer and restore defaults > > */ > > struct iommu_ops { > > bool (*capable)(enum iommu_cap); > > @@ -200,6 +211,10 @@ struct iommu_ops { > > u32 (*domain_get_windows)(struct iommu_domain *domain); > > > > int (*of_xlate)(struct device *dev, struct of_phandle_args *args); > > + int (*bind_pasid_table)(struct iommu_domain *domain, struct device *dev, > > + struct pasid_table_info *pasidt_binfo); > > + int (*unbind_pasid_t
Re: [RFC PATCH 3/8] iommu: Introduce iommu do invalidate API function
On Fri, May 12, 2017 at 03:59:24PM -0600, Alex Williamson wrote: > On Wed, 26 Apr 2017 18:12:00 +0800 > "Liu, Yi L" wrote: > Hi Alex, Pls refer to the open I mentioned in this email, I need your comments on it to prepare the formal patchset for SVM virtualization. Thx. > > From: "Liu, Yi L" > > > > When a SVM capable device is assigned to a guest, the first level page > > tables are owned by the guest and the guest PASID table pointer is > > linked to the device context entry of the physical IOMMU. > > > > Host IOMMU driver has no knowledge of caching structure updates unless > > the guest invalidation activities are passed down to the host. The > > primary usage is derived from emulated IOMMU in the guest, where QEMU > > can trap invalidation activities before pass them down the > > host/physical IOMMU. There are IOMMU architectural specific actions > > need to be taken which requires the generic APIs introduced in this > > patch to have opaque data in the tlb_invalidate_info argument. > > > > Signed-off-by: Liu, Yi L > > Signed-off-by: Jacob Pan > > --- > > drivers/iommu/iommu.c | 13 + > > include/linux/iommu.h | 16 > > 2 files changed, 29 insertions(+) > > > > diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c > > index f2da636..ca7cff2 100644 > > --- a/drivers/iommu/iommu.c > > +++ b/drivers/iommu/iommu.c > > @@ -1153,6 +1153,19 @@ int iommu_unbind_pasid_table(struct iommu_domain > > *domain, struct device *dev) > > } > > EXPORT_SYMBOL_GPL(iommu_unbind_pasid_table); > > > > +int iommu_do_invalidate(struct iommu_domain *domain, > > + struct device *dev, struct tlb_invalidate_info *inv_info) > > +{ > > + int ret = 0; > > + > > + if (unlikely(domain->ops->do_invalidate == NULL)) > > + return -ENODEV; > > + > > + ret = domain->ops->do_invalidate(domain, dev, inv_info); > > + return ret; > > nit, ret is unnecessary. yes, would modify it. Thx. 
> > +} > > +EXPORT_SYMBOL_GPL(iommu_do_invalidate); > > + > > static void __iommu_detach_device(struct iommu_domain *domain, > > struct device *dev) > > { > > diff --git a/include/linux/iommu.h b/include/linux/iommu.h > > index 491a011..a48e3b75 100644 > > --- a/include/linux/iommu.h > > +++ b/include/linux/iommu.h > > @@ -140,6 +140,11 @@ struct pasid_table_info { > > __u8opaque[];/* IOMMU-specific details */ > > }; > > > > +struct tlb_invalidate_info { > > + __u32 model; > > + __u8opaque[]; > > +}; > > I'm wondering if 'model' is really necessary here, shouldn't this > function only be called if a bind_pasid_table() succeeded, and then the > model would be set at that time? For this model, I'm thinking about another potential usage, which comes from Tianyu's idea to use tlb_invalidate_info to pass invalidations for IOVA-related mappings. In that case, there would be no bind_pasid_table() before it, so a model check would be needed. But I may remove it since this patchset is focusing on SVM. Here, I have an open question to check with you. I defined tlb_invalidate_info with fully opaque data; the opaque part would include the invalidation info for different vendors. We have two choices for the tlb_invalidate_info definition. a) as proposed in this patchset, pass raw data to the host. The host pIOMMU driver submits the invalidation request after replacing specific fields, and rejects it if the IOMMU model is not correct. * Pros: no parsing and re-assembly needed, better performance * Cons: unable to support scenarios that emulate an Intel IOMMU on an ARM platform. b) parse the invalidation info into specific data, e.g. granularity, addr, size, invalidation type etc., then fill the data into a generic structure. In the host, the pIOMMU driver re-assembles the invalidation request and submits it to the pIOMMU. * Pros: may be able to support the scenario above, though this is still in question since different vendors may have vendor-specific invalidation info. 
This would make it difficult to have a vendor-agnostic invalidation propagation API. * Cons: adds complexity for parsing and re-assembly. The generic structure would be a superset of all possible invalidation info, which may be hard to maintain in the future. As the pros/cons show, I proposed a) as the initial version, but it remains an open question. Jean from ARM has given some comments on it and is inclined toward the opaque way, with the generic part defined explicitly. Jean's reply is in the link bel
Re: [RFC PATCH 4/8] iommu/vt-d: Add iommu do invalidate function
On Fri, May 12, 2017 at 03:59:18PM -0600, Alex Williamson wrote: > On Wed, 26 Apr 2017 18:12:01 +0800 > "Liu, Yi L" wrote: > > > From: Jacob Pan > > > > This patch adds Intel VT-d specific function to implement > > iommu_do_invalidate API. > > > > The use case is for supporting caching structure invalidation > > of assigned SVM capable devices. Emulated IOMMU exposes queue > > invalidation capability and passes down all descriptors from the guest > > to the physical IOMMU. > > > > The assumption is that guest to host device ID mapping should be > > resolved prior to calling IOMMU driver. Based on the device handle, > > host IOMMU driver can replace certain fields before submit to the > > invalidation queue. > > > > Signed-off-by: Liu, Yi L > > Signed-off-by: Jacob Pan > > --- > > drivers/iommu/intel-iommu.c | 43 > > +++ > > include/linux/intel-iommu.h | 11 +++ > > 2 files changed, 54 insertions(+) > > > > diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c > > index 6d5b939..0b098ad 100644 > > --- a/drivers/iommu/intel-iommu.c > > +++ b/drivers/iommu/intel-iommu.c > > @@ -5042,6 +5042,48 @@ static void intel_iommu_detach_device(struct > > iommu_domain *domain, > > dmar_remove_one_dev_info(to_dmar_domain(domain), dev); > > } > > > > +static int intel_iommu_do_invalidate(struct iommu_domain *domain, > > + struct device *dev, struct tlb_invalidate_info *inv_info) > > +{ > > + int ret = 0; > > + struct intel_iommu *iommu; > > + struct dmar_domain *dmar_domain = to_dmar_domain(domain); > > + struct intel_invalidate_data *inv_data; > > + struct qi_desc *qi; > > + u16 did; > > + u8 bus, devfn; > > + > > + if (!inv_info || !dmar_domain || (inv_info->model != INTEL_IOMMU)) > > + return -EINVAL; > > + > > + iommu = device_to_iommu(dev, &bus, &devfn); > > + if (!iommu) > > + return -ENODEV; > > + > > + inv_data = (struct intel_invalidate_data *)&inv_info->opaque; > > + > > + /* check SID */ > > + if (PCI_DEVID(bus, devfn) != inv_data->sid) > > + return 0; > 
> + > > + qi = &inv_data->inv_desc; > > + > > + switch (qi->low & QI_TYPE_MASK) { > > + case QI_DIOTLB_TYPE: > > + case QI_DEIOTLB_TYPE: > > + /* for device IOTLB, we just let it pass through */ > > + break; > > + default: > > + did = dmar_domain->iommu_did[iommu->seq_id]; > > + set_mask_bits(&qi->low, QI_DID_MASK, QI_DID(did)); > > + break; > > + } > > + > > + ret = qi_submit_sync(qi, iommu); > > + > > + return ret; > > nit, ret variable is unnecessary. yes, would remove it. > > +} > > + > > static int intel_iommu_map(struct iommu_domain *domain, > >unsigned long iova, phys_addr_t hpa, > >size_t size, int iommu_prot) > > @@ -5416,6 +5458,7 @@ static int intel_iommu_unbind_pasid_table(struct > > iommu_domain *domain, > > #ifdef CONFIG_INTEL_IOMMU_SVM > > .bind_pasid_table = intel_iommu_bind_pasid_table, > > .unbind_pasid_table = intel_iommu_unbind_pasid_table, > > + .do_invalidate = intel_iommu_do_invalidate, > > #endif > > .map= intel_iommu_map, > > .unmap = intel_iommu_unmap, > > diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h > > index ac04f28..9d6562c 100644 > > --- a/include/linux/intel-iommu.h > > +++ b/include/linux/intel-iommu.h > > @@ -29,6 +29,7 @@ > > #include > > #include > > #include > > +#include > > #include > > #include > > > > @@ -271,6 +272,10 @@ enum { > > #define QI_PGRP_RESP_TYPE 0x9 > > #define QI_PSTRM_RESP_TYPE 0xa > > > > +#define QI_DID(did)(((u64)did & 0x) << 16) > > +#define QI_DID_MASKGENMASK(31, 16) > > +#define QI_TYPE_MASK GENMASK(3, 0) > > + > > #define QI_IEC_SELECTIVE (((u64)1) << 4) > > #define QI_IEC_IIDEX(idx) (((u64)(idx & 0x) << 32)) > > #define QI_IEC_IM(m) (((u64)(m & 0x1f) << 27)) > > @@ -529,6 +534,12 @@ struct intel_svm { > > extern struct intel_iommu *intel_svm_device_to_iommu(struct device *dev); > > #endif > > > > +struct intel_invalidate_data { > > + u16 sid; > > + u32 pasid; > > + struct qi_desc inv_desc; > > +}; > > This needs to be uapi since the vfio user is expected to create it, so > we 
need a uapi version of qi_desc too. > yes, would do it. Thx, Yi L > > + > > extern const struct attribute_group *intel_iommu_groups[]; > > extern void intel_iommu_debugfs_init(void); > > extern struct context_entry *iommu_context_addr(struct intel_iommu *iommu, > ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [Qemu-devel] [RFC PATCH 5/8] VFIO: Add new IOTCL for PASID Table bind propagation
On Fri, May 12, 2017 at 03:58:51PM -0600, Alex Williamson wrote: > On Wed, 26 Apr 2017 18:12:02 +0800 > "Liu, Yi L" wrote: > > > From: "Liu, Yi L" > > > > This patch adds VFIO_IOMMU_SVM_BIND_TASK for potential PASID table > > binding requests. > > > > On VT-d, this IOCTL cmd would be used to link the guest PASID page table > > to host. While for other vendors, it may also be used to support other > > kind of SVM bind request. Previously, there is a discussion on it with > > ARM engineer. It can be found by the link below. This IOCTL cmd may > > support SVM PASID bind request from userspace driver, or page table(cr3) > > bind request from guest. These SVM bind requests would be supported by > > adding different flags. e.g. VFIO_SVM_BIND_PASID is added to support > > PASID bind from userspace driver, VFIO_SVM_BIND_PGTABLE is added to > > support page table bind from guest. > > > > https://patchwork.kernel.org/patch/9594231/ > > > > Signed-off-by: Liu, Yi L > > --- > > include/uapi/linux/vfio.h | 17 + > > 1 file changed, 17 insertions(+) > > > > diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h > > index 519eff3..6b97987 100644 > > --- a/include/uapi/linux/vfio.h > > +++ b/include/uapi/linux/vfio.h > > @@ -547,6 +547,23 @@ struct vfio_iommu_type1_dma_unmap { > > #define VFIO_IOMMU_ENABLE _IO(VFIO_TYPE, VFIO_BASE + 15) > > #define VFIO_IOMMU_DISABLE _IO(VFIO_TYPE, VFIO_BASE + 16) > > > > +/* IOCTL for Shared Virtual Memory Bind */ > > +struct vfio_device_svm { > > + __u32 argsz; > > +#define VFIO_SVM_BIND_PASIDTBL (1 << 0) /* Bind PASID Table */ > > +#define VFIO_SVM_BIND_PASID(1 << 1) /* Bind PASID from userspace > > driver */ > > +#define VFIO_SVM_BIND_PGTABLE (1 << 2) /* Bind guest mmu page table */ > > + __u32 flags; > > + __u32 length; > > + __u8data[]; > > In the case of VFIO_SVM_BIND_PASIDTBL this is clearly struct > pasid_table_info? So at a minimum this is a union including struct > pasid_table_info. 
Furthermore how does a user learn what the opaque > data in struct pasid_table_info is without looking at the code? A user > API needs to be clear and documented, not opaque and variable. We > should also have references to the hardware spec for an Intel or ARM > PASID table in uapi. flags should be defined as they're used, let's > not reserve them with the expectation of future use. > Agree, I would add descriptions accordingly. For the flags, I would remove the last two as I wouldn't use them; I think Jean would add them in his/her patchset. Anyhow, one of us needs to merge the flags. Thanks, Yi L > > +}; > > + > > +#define VFIO_SVM_TYPE_MASK (VFIO_SVM_BIND_PASIDTBL | \ > > + VFIO_SVM_BIND_PASID | \ > > + VFIO_SVM_BIND_PGTABLE) > > + > > +#define VFIO_IOMMU_SVM_BIND_TASK _IO(VFIO_TYPE, VFIO_BASE + 22) > > + > > /* Additional API for SPAPR TCE (Server POWERPC) IOMMU */ > > > > /*
Re: [Qemu-devel] [RFC PATCH 09/20] Memory: introduce iommu_ops->record_device
Hi Alex, What's your opinion on Tianyu's question? Is it acceptable to use the VFIO API in the intel_iommu emulator? Thanks, Yi L On Fri, Apr 28, 2017 at 02:46:16PM +0800, Lan Tianyu wrote: > On 2017年04月26日 18:06, Liu, Yi L wrote: > > With vIOMMU exposed to guest, vIOMMU emulator needs to do translation > > between host and guest. e.g. a device-selective TLB flush, vIOMMU > > emulator needs to replace guest SID with host SID so that to limit > > the invalidation. This patch introduces a new callback > > iommu_ops->record_device() to notify vIOMMU emulator to record necessary > > information about the assigned device. > > This patch is to prepare to translate guest sbdf to host sbdf. > > Alex: > Could we add a new vfio API to do such translation? This will be more > straight forward than storing host sbdf in the vIOMMU device model. > > > > > Signed-off-by: Liu, Yi L > > --- > > include/exec/memory.h | 11 +++ > > memory.c | 12 > > 2 files changed, 23 insertions(+) > > > > diff --git a/include/exec/memory.h b/include/exec/memory.h > > index 7bd13ab..49087ef 100644 > > --- a/include/exec/memory.h > > +++ b/include/exec/memory.h > > @@ -203,6 +203,8 @@ struct MemoryRegionIOMMUOps { > > IOMMUNotifierFlag new_flags); > > /* Set this up to provide customized IOMMU replay function */ > > void (*replay)(MemoryRegion *iommu, IOMMUNotifier *notifier); > > +void (*record_device)(MemoryRegion *iommu, > > + void *device_info); > > }; > > > > typedef struct CoalescedMemoryRange CoalescedMemoryRange; > > @@ -708,6 +710,15 @@ void memory_region_notify_iommu(MemoryRegion *mr, > > void memory_region_notify_one(IOMMUNotifier *notifier, > >IOMMUTLBEntry *entry); > > > > +/* > > + * memory_region_notify_device_record: notify IOMMU to record assign > > + * device. 
> > + * @mr: the memory region to notify > > + * @ device_info: device information > > + */ > > +void memory_region_notify_device_record(MemoryRegion *mr, > > +void *info); > > + > > /** > > * memory_region_register_iommu_notifier: register a notifier for changes > > to > > * IOMMU translation entries. > > diff --git a/memory.c b/memory.c > > index 0728e62..45ef069 100644 > > --- a/memory.c > > +++ b/memory.c > > @@ -1600,6 +1600,18 @@ static void > > memory_region_update_iommu_notify_flags(MemoryRegion *mr) > > mr->iommu_notify_flags = flags; > > } > > > > +void memory_region_notify_device_record(MemoryRegion *mr, > > +void *info) > > +{ > > +assert(memory_region_is_iommu(mr)); > > + > > +if (mr->iommu_ops->record_device) { > > +mr->iommu_ops->record_device(mr, info); > > +} > > + > > +return; > > +} > > + > > void memory_region_register_iommu_notifier(MemoryRegion *mr, > > IOMMUNotifier *n) > > { > >
Re: [Qemu-devel] [RFC PATCH 09/20] Memory: introduce iommu_ops->record_device
On Fri, May 19, 2017 at 09:07:49AM +, Tian, Kevin wrote: > > From: Liu, Yi L [mailto:yi.l@linux.intel.com] > > Sent: Friday, May 19, 2017 1:24 PM > > > > Hi Alex, > > > > What's your opinion with Tianyu's question? Is it accepatable > > to use VFIO API in intel_iommu emulator? > > Did you actually need such translation at all? SID should be > filled by kernel IOMMU driver based on which device is > requested with invalidation request, regardless of which > guest SID is used in user space. Qemu only needs to know > which fd corresponds to guest SID, and then initiates an > invalidation request on that fd? Kevin, It actually depends on the svm binding behavior we expect in host IOMMU driver side. If we want to have the binding per-device, this translation is needed in Qemu either in VFIO or intel_iommu emulator. So that the host SID could be used as a device selector when looping devices in a group. If we can use VFIO API directly, we also may trigger the svm bind/qi propagation straightforwardly instead of using notifier. Thanks, Yi L > > > > Thanks, > > Yi L > > On Fri, Apr 28, 2017 at 02:46:16PM +0800, Lan Tianyu wrote: > > > On 2017年04月26日 18:06, Liu, Yi L wrote: > > > > With vIOMMU exposed to guest, vIOMMU emulator needs to do > > translation > > > > between host and guest. e.g. a device-selective TLB flush, vIOMMU > > > > emulator needs to replace guest SID with host SID so that to limit > > > > the invalidation. This patch introduces a new callback > > > > iommu_ops->record_device() to notify vIOMMU emulator to record > > necessary > > > > information about the assigned device. > > > > > > This patch is to prepare to translate guest sbdf to host sbdf. > > > > > > Alex: > > > Could we add a new vfio API to do such translation? This will be more > > > straight forward than storing host sbdf in the vIOMMU device model. 
> > > > > > > > > > > Signed-off-by: Liu, Yi L > > > > --- > > > > include/exec/memory.h | 11 +++ > > > > memory.c | 12 > > > > 2 files changed, 23 insertions(+) > > > > > > > > diff --git a/include/exec/memory.h b/include/exec/memory.h > > > > index 7bd13ab..49087ef 100644 > > > > --- a/include/exec/memory.h > > > > +++ b/include/exec/memory.h > > > > @@ -203,6 +203,8 @@ struct MemoryRegionIOMMUOps { > > > > IOMMUNotifierFlag new_flags); > > > > /* Set this up to provide customized IOMMU replay function */ > > > > void (*replay)(MemoryRegion *iommu, IOMMUNotifier *notifier); > > > > +void (*record_device)(MemoryRegion *iommu, > > > > + void *device_info); > > > > }; > > > > > > > > typedef struct CoalescedMemoryRange CoalescedMemoryRange; > > > > @@ -708,6 +710,15 @@ void > > memory_region_notify_iommu(MemoryRegion *mr, > > > > void memory_region_notify_one(IOMMUNotifier *notifier, > > > >IOMMUTLBEntry *entry); > > > > > > > > +/* > > > > + * memory_region_notify_device_record: notify IOMMU to record > > assign > > > > + * device. > > > > + * @mr: the memory region to notify > > > > + * @ device_info: device information > > > > + */ > > > > +void memory_region_notify_device_record(MemoryRegion *mr, > > > > +void *info); > > > > + > > > > /** > > > > * memory_region_register_iommu_notifier: register a notifier for > > changes to > > > > * IOMMU translation entries. 
> > > > diff --git a/memory.c b/memory.c > > > > index 0728e62..45ef069 100644 > > > > --- a/memory.c > > > > +++ b/memory.c > > > > @@ -1600,6 +1600,18 @@ static void > > memory_region_update_iommu_notify_flags(MemoryRegion *mr) > > > > mr->iommu_notify_flags = flags; > > > > } > > > > > > > > +void memory_region_notify_device_record(MemoryRegion *mr, > > > > +void *info) > > > > +{ > > > > +assert(memory_region_is_iommu(mr)); > > > > + > > > > +if (mr->iommu_ops->record_device) { > > > > +mr->iommu_ops->record_device(mr, info); > > > > +} > > > > + > > > > +return; > > > > +} > > > > + > > > > void memory_region_register_iommu_notifier(MemoryRegion *mr, > > > > IOMMUNotifier *n) > > > > { > > > > > > > > > >
Re: [Qemu-devel] [RFC PATCH 1/8] iommu: Introduce bind_pasid_table API function
On Fri, Apr 28, 2017 at 01:51:42PM +0100, Jean-Philippe Brucker wrote: > On 28/04/17 10:04, Liu, Yi L wrote: Hi Jean, Sorry for the delay response. Still have some follow-up comments on per-device or per-group. Pls refer to comments inline. > > On Wed, Apr 26, 2017 at 05:56:45PM +0100, Jean-Philippe Brucker wrote: > >> Hi Yi, Jacob, > >> > >> On 26/04/17 11:11, Liu, Yi L wrote: > >>> From: Jacob Pan > >>> > >>> Virtual IOMMU was proposed to support Shared Virtual Memory (SVM) use > >>> case in the guest: > >>> https://lists.gnu.org/archive/html/qemu-devel/2016-11/msg05311.html > >>> > >>> As part of the proposed architecture, when a SVM capable PCI > >>> device is assigned to a guest, nested mode is turned on. Guest owns the > >>> first level page tables (request with PASID) and performs GVA->GPA > >>> translation. Second level page tables are owned by the host for GPA->HPA > >>> translation for both request with and without PASID. > >>> > >>> A new IOMMU driver interface is therefore needed to perform tasks as > >>> follows: > >>> * Enable nested translation and appropriate translation type > >>> * Assign guest PASID table pointer (in GPA) and size to host IOMMU > >>> > >>> This patch introduces new functions called iommu_(un)bind_pasid_table() > >>> to IOMMU APIs. Architecture specific IOMMU function can be added later > >>> to perform the specific steps for binding pasid table of assigned devices. > >>> > >>> This patch also adds model definition in iommu.h. It would be used to > >>> check if the bind request is from a compatible entity. e.g. a bind > >>> request from an intel_iommu emulator may not be supported by an ARM SMMU > >>> driver. 
> >>> > >>> Signed-off-by: Jacob Pan > >>> Signed-off-by: Liu, Yi L > >>> --- > >>> drivers/iommu/iommu.c | 19 +++ > >>> include/linux/iommu.h | 31 +++ > >>> 2 files changed, 50 insertions(+) > >>> > >>> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c > >>> index dbe7f65..f2da636 100644 > >>> --- a/drivers/iommu/iommu.c > >>> +++ b/drivers/iommu/iommu.c > >>> @@ -1134,6 +1134,25 @@ int iommu_attach_device(struct iommu_domain > >>> *domain, struct device *dev) > >>> } > >>> EXPORT_SYMBOL_GPL(iommu_attach_device); > >>> > >>> +int iommu_bind_pasid_table(struct iommu_domain *domain, struct device > >>> *dev, > >>> + struct pasid_table_info *pasidt_binfo) > >> > >> I guess that domain can always be deduced from dev using > >> iommu_get_domain_for_dev, and doesn't need to be passed as argument? > >> > >> For the next version of my SVM series, I was thinking of passing group > >> instead of device to iommu_bind. Since all devices in a group are expected > >> to share the same mappings (whether they want it or not), users will have > > > > Virtual address space is not tied to protection domain as I/O virtual > > address > > space does. Is it really necessary to affect all the devices in this group. > > Or it is just for consistence? > > It's mostly about consistency, and also avoid hiding implicit behavior in > the IOMMU driver. I have the following example, described using group and > domain structures from the IOMMU API: > > |IOMMU | > | |DOM __ || > | ||GRP ||| bind > | ||A<-Task 1 > | ||B ||| > | ||__||| > | | __ || > | ||GRP ||| > | ||C ||| > | ||__||| > | ||| > | | > | |DOM __ || > | ||GRP ||| > | ||D ||| > | ||__||| > | ||| > || > > Let's take PCI functions A, B, C, and D, all with PASID capabilities. Due > to some hardware limitation (in the bus, the device or the IOMMU), B can > see all DMA transactions issued by A. A and B are therefore
Re: [Qemu-devel] [RFC PATCH 7/8] VFIO: Add new IOCTL for IOMMU TLB invalidate propagation
On Fri, May 12, 2017 at 01:11:02PM +0100, Jean-Philippe Brucker wrote: Hi Jean, As we've had a few discussions on it, I'd like to reach a conclusion and make it a reference for future discussion. Currently, we are inclined to have a hybrid format for the iommu tlb invalidation from userspace (vIOMMU or userspace driver). Based on the previous discussion, would the below work?

1. Add an IOCTL for iommu tlb invalidation: VFIO_IOMMU_TLB_INVALIDATE

struct vfio_iommu_tlb_invalidate {
	__u32	argsz;
	__u32	length;
	__u8	data[];
};

Comment from Alex Williamson: it may be more suitable to add a new flag bit to vfio_device_svm (a structure defined in patch 5 of this patchset), since the data structures are so similar. Personally, I'm ok with it. Pls let me know your thoughts. However, the precondition is that we accept the whole definition in this email; if not, vfio_iommu_tlb_invalidate would be defined differently.

2. Define a structure in include/uapi/linux/iommu.h (newly added header file):

struct iommu_tlb_invalidate {
	__u32	scope;
/* pasid-selective invalidation described by @pasid */
#define IOMMU_INVALIDATE_PASID	(1 << 0)
/* address-selective invalidation described by (@vaddr, @size) */
#define IOMMU_INVALIDATE_VADDR	(1 << 1)
	__u32	flags;
/* targets non-pasid mappings, @pasid is not valid */
#define IOMMU_INVALIDATE_NO_PASID	(1 << 0)
/* indicating that the pIOMMU doesn't need to invalidate
   all intermediate tables cached as part of the PTE for
   vaddr, only the last-level entry (pte). This is a hint. */
#define IOMMU_INVALIDATE_VADDR_LEAF	(1 << 1)
	__u32	pasid;
	__u64	vaddr;
	__u64	size;
	__u8	data[];
};

For this part, the scope and flags are basically aligned with your previous email; I renamed the prefix to "IOMMU_". In my opinion, since the scope and flags would be filled by the vIOMMU emulator and parsed by the underlying iommu driver, they are much more suitable for a uapi header file. Besides the reason above, I don't want VFIO to engage too much in the data parsing. 
If we move the scope,flags,pasid,vaddr,size fields to vfio_iommu_tlb_invalidate, then both kernel space vfio and user space vfio needs to do much parsing. So I may prefer the way above. If you've got any other idea, pls feel free to post it. It's welcomed. Thanks, Yi L > Hi Yi, > > On 26/04/17 11:12, Liu, Yi L wrote: > > From: "Liu, Yi L" > > > > This patch adds VFIO_IOMMU_TLB_INVALIDATE to propagate IOMMU TLB > > invalidate request from guest to host. > > > > In the case of SVM virtualization on VT-d, host IOMMU driver has > > no knowledge of caching structure updates unless the guest > > invalidation activities are passed down to the host. So a new > > IOCTL is needed to propagate the guest cache invalidation through > > VFIO. > > > > Signed-off-by: Liu, Yi L > > --- > > include/uapi/linux/vfio.h | 9 + > > 1 file changed, 9 insertions(+) > > > > diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h > > index 6b97987..50c51f8 100644 > > --- a/include/uapi/linux/vfio.h > > +++ b/include/uapi/linux/vfio.h > > @@ -564,6 +564,15 @@ struct vfio_device_svm { > > > > #define VFIO_IOMMU_SVM_BIND_TASK _IO(VFIO_TYPE, VFIO_BASE + 22) > > > > +/* For IOMMU TLB Invalidation Propagation */ > > +struct vfio_iommu_tlb_invalidate { > > + __u32 argsz; > > + __u32 length; > > + __u8data[]; > > +}; > > We initially discussed something a little more generic than this, with > most info explicitly described and only pIOMMU-specific quirks and hints > in an opaque structure. Out of curiosity, why the change? I'm not against > a fully opaque structure, but there seem to be a large overlap between TLB > invalidations across architectures. > > > For what it's worth, when prototyping the paravirtualized IOMMU I came up > with the following. > > (From the paravirtualized POV, the SMMU also has to swizzle endianess > after unpacking an opaque structure, since userspace doesn't know what's > in it and guest might use a different endianess. 
So we need to force all > opaque data to be e.g. little-endian.) > > struct vfio_iommu_tlb_invalidate { > __u32 argsz; > __u32 scope; > __u32 flags; > __u32 pasid; > __u64 vaddr; > __u64 size; > __u8data[]; > }; > > Scope is a bitfields restricting the invalidation scope. By default > invalidate the whole container (all PASIDs and all VAs). @pasid, @vaddr > and @size are unused. > > Adding VFIO_IOMMU_INVALIDATE_PASID (1 << 0) restricts the invalidation >
Re: [Qemu-devel] [RFC PATCH 7/8] VFIO: Add new IOCTL for IOMMU TLB invalidate propagation
Hi Jean, On Mon, Jul 03, 2017 at 12:52:52PM +0100, Jean-Philippe Brucker wrote: > Hi Yi, > > On 02/07/17 11:06, Liu, Yi L wrote: > > On Fri, May 12, 2017 at 01:11:02PM +0100, Jean-Philippe Brucker wrote: > > > > Hi Jean, > > > > As we've got a few discussions on it. I'd like to have a conclusion and > > make it as a reference for future discussion. > > > > Currently, we are inclined to have a hybrid format for the iommu tlb > > invalidation from userspace(vIOMMU or userspace driver). > > > > Based on the previous discussion, may the below work? > > > > 1. Add a IOCTL for iommu tlb invalidation. > > > > VFIO_IOMMU_TLB_INVALIDATE > > > > struct vfio_iommu_tlb_invalidate { > >__u32 argsz; > >__u32 length; > > Wouldn't argsz be exactly length + 8? Might be redundant in this case. yes, it is. we may not use it in future version. but yes, if we still use it. I think we can make it easier. > >__u8data[]; > > }; > > > > comments from Alex William: is it more suitable to add a new flag bit on > > vfio_device_svm(a structure defined in patch 5 of this patchset), the data > > structure is so similar. > > > > Personally, I'm ok with it. Pls let me know your thoughts. However, the > > precondition is we accept the whole definition in this email. If not, the > > vfio_iommu_tlb_invalidate would be defined differently. > > With this proposal sharing the structure makes sense. As I understand it > we're keeping the VFIO_IOMMU_TLB_INVALIDATE ioctl? In which case adding a > flag bit would be redundant. yes, it seems to be strange if we share vfio_device_svm structure but use a separate IOCTL cmd. Maybe it's more reasonable to share IOCTL cmd and just add a new flag. Then all the svm related operations share the IOCTL. However, need to check if there would be any non-svm related iommu tlb invalidation. Then vfio_device_svm should be renamed to be non-svm specific. > > > 2. 
Define a structure in include/uapi/linux/iommu.h(newly added header file) > > > > struct iommu_tlb_invalidate { > > __u32 scope; > > /* pasid-selective invalidation described by @pasid */ > > #define IOMMU_INVALIDATE_PASID (1 << 0) > > /* address-selevtive invalidation described by (@vaddr, @size) */ > > #define IOMMU_INVALIDATE_VADDR (1 << 1) > > __u32 flags; > > /* targets non-pasid mappings, @pasid is not valid */ > > #define IOMMU_INVALIDATE_NO_PASID (1 << 0) > > Although it was my proposal, I don't like this flag. In ARM SMMU, we're > using a special mode where PASID 0 is reserved and any traffic without > PASID uses entry 0 of the PASID table. So I proposed the "NO_PASID" flag > to invalidate that special context explicitly. But this means that > invalidation packet targeted at that context will have "scope = PASID" and > "flags = NO_PASID", which is utterly confusing. > > I now think that we should get rid of the IOMMU_INVALIDATE_NO_PASID flag > and just use PASID 0 to invalidate this context on ARM. I don't think > other architectures would use the NO_PASID flag anyway, but might be mistaken. I may suggest to keep it so far. On VT-d, we may pass some data in opaque, so we may work without it. But if other vendor want to issue non-PASID tagged cache, then may encounter problem. > > /* indicating that the pIOMMU doesn't need to invalidate > >all intermediate tables cached as part of the PTE for > >vaddr, only the last-level entry (pte). This is a hint. */ > > #define IOMMU_INVALIDATE_VADDR_LEAF (1 << 1) > > __u32 pasid; > > __u64 vaddr; > > __u64 size; > > __u8data[]; > > }; > > > > For this part, the scope and flags are basically aligned with your previous > > email. I renamed the prefix to be "IOMMU_". In my opinion, the scope and > > flags > > would be filled by vIOMMU emulator and be parsed by underlying iommu driver, > > it is much more suitable to be defined in a uapi header file. 
> > I tend to agree, defining a single structure in a new IOMMU UAPI file is > better than having identical structures both in uapi/linux/vfio.h and > linux/iommu.h. This way we avoid VFIO having to copy the same structure > field by field. Arch-specific structures that go in > iommu_tlb_invalidate.data also ought to be defined in uapi/linux/iommu.h yes, it is. > > Besides the reason above, I don't want VFIO engae too much on the data > > parsing. > > If we move the scope,flags,pasid,vaddr,size fields to > > vfio_i
RE: Support SVM without PASID
> -Original Message- > From: iommu-boun...@lists.linux-foundation.org [mailto:iommu- > boun...@lists.linux-foundation.org] On Behalf Of valmiki > Sent: Sunday, July 9, 2017 11:16 AM > To: Alex Williamson > Cc: Lan, Tianyu ; Tian, Kevin ; > k...@vger.kernel.org; linux-...@vger.kernel.org; > iommu@lists.linux-foundation.org; > Pan, Jacob jun > Subject: Re: Support SVM without PASID > > >> Hi, > >> > >> In SMMUv3 architecture document i see "PASIDs are optional, > >> configurable, and of a size determined by the minimum of the > >> endpoint". > >> > >> So if PASID's are optional and not supported by PCIe end point, how > >> SVM can be achieved ? > > > > It cannot be inferred from that statement that PASID support is not > > required for SVM. AIUI, SVM is a software feature enabled by numerous > > "optional" hardware features, including PASID. Features that are > > optional per the hardware specification may be required for specific > > software features. Thanks, > > > Thanks for the information Alex. Suppose if an End point doesn't support > PASID, is it > still possible to achieve SVM ? > Are there any such features in SMMUv3 with which we can achieve it ? If endpoint has no PASID support, I don't think it is SVM capable. For SMMU, maybe you can get more info from Jean. Regards, Yi L
RE: [Qemu-devel] [RFC PATCH 7/8] VFIO: Add new IOCTL for IOMMU TLB invalidate propagation
Hi Alex, Regarding the opaque-data open issue, I'd like to propose the following definition based on the existing comments. Pls note that I've merged pasid table binding and iommu tlb invalidation into a single IOCTL, with different flags to indicate the iommu operations. Per Kevin's comments, there may be iommu invalidation for the guest IOVA tlb, so I renamed the IOCTL and data structure to be non-svm specific. Pls kindly review, so that we can close this open issue and move forward. Of course, comments and ideas are welcome, including on the scope and flags definition in struct iommu_tlb_invalidate.

1. Add a VFIO IOCTL for iommu operations from user-space:

#define VFIO_IOMMU_OP_IOCTL	_IO(VFIO_TYPE, VFIO_BASE + 24)

Corresponding data structure:

struct vfio_iommu_operation_info {
	__u32	argsz;
#define VFIO_IOMMU_BIND_PASIDTBL	(1 << 0) /* Bind PASID Table */
#define VFIO_IOMMU_BIND_PASID	(1 << 1) /* Bind PASID from userspace driver */
#define VFIO_IOMMU_BIND_PGTABLE	(1 << 2) /* Bind guest mmu page table */
#define VFIO_IOMMU_INVAL_IOTLB	(1 << 3) /* Invalidate iommu tlb */
	__u32	flag;
	__u32	length;	/* length of the data[] part in bytes */
	__u8	data[];	/* data for the iommu op indicated by the flag field */
};

For iommu tlb invalidation from userspace, the "__u8 data[]" part stores data which would be parsed by the "struct iommu_tlb_invalidate" defined below.

2. 
Definitions in include/uapi/linux/iommu.h (newly added header file)

/* IOMMU model definition for iommu operations from userspace */
enum iommu_model {
	INTEL_IOMMU,
	ARM_SMMU,
	AMD_IOMMU,
	SPAPR_IOMMU,
	S390_IOMMU,
};

struct iommu_tlb_invalidate {
	__u32	scope;
/* pasid-selective invalidation described by @pasid */
#define IOMMU_INVALIDATE_PASID	(1 << 0)
/* address-selective invalidation described by (@vaddr, @size) */
#define IOMMU_INVALIDATE_VADDR	(1 << 1)
	__u32	flags;
/* targets non-pasid mappings, @pasid is not valid */
#define IOMMU_INVALIDATE_NO_PASID	(1 << 0)
/* indicates that the pIOMMU doesn't need to invalidate all intermediate
 * tables cached as part of the PTE for vaddr, only the last-level entry
 * (pte). This is a hint. */
#define IOMMU_INVALIDATE_VADDR_LEAF	(1 << 1)
	__u32	pasid;
	__u64	vaddr;
	__u64	size;
	enum iommu_model	model;
/* Vendors may have different HW versions, and thus the data part of this
 * structure differs; use sub_version to indicate such differences. */
	__u32	sub_version;
	__u64	length;	/* length of the data[] part in bytes */
	__u8	data[];
};

For Intel, the data structure is:

struct intel_iommu_invalidate_data {
	__u64 low;
	__u64 high;
};

Thanks, Yi L > -Original Message- > From: Alex Williamson [mailto:alex.william...@redhat.com] > Sent: Thursday, July 6, 2017 1:28 AM > To: Jean-Philippe Brucker > Cc: Tian, Kevin ; Liu, Yi L ; > Lan, > Tianyu ; Liu, Yi L ; Raj, Ashok > ; k...@vger.kernel.org; jasow...@redhat.com; Will Deacon > ; pet...@redhat.com; qemu-de...@nongnu.org; > iommu@lists.linux-foundation.org; Pan, Jacob jun > Subject: Re: [Qemu-devel] [RFC PATCH 7/8] VFIO: Add new IOCTL for IOMMU TLB > invalidate propagation > > On Wed, 5 Jul 2017 13:42:03 +0100 > Jean-Philippe Brucker wrote: > > > On 05/07/17 07:45, Tian, Kevin wrote: > > >> From: Liu, Yi L > > >> Sent: Monday, July 3, 2017 6:31 PM > > >> > > >> Hi Jean, > > >> > > >> > > >>> > > >>>> 2.
Define a structure in include/uapi/linux/iommu.h(newly added > > >>>> header > > >> file) > > >>>> > > >>>> struct iommu_tlb_invalidate { > > >>>>__u32 scope; > > >>>> /* pasid-selective invalidation described by @pasid */ > > >>>> #define IOMMU_INVALIDATE_PASID (1 << 0) > > >>>> /* address-selevtive invalidation described by (@vaddr, @size) */ > > >>>> #define IOMMU_INVALIDATE_VADDR (1 << 1) > > > > > > For VT-d above two flags are related. There is no method of flushing > > > (@vaddr, @size) for all pasids, which doesn't make sense. address- > > > selective invalidation is valid only for a given pasid. So it's not > > > appropriate to put them in same level of scope definition at least for > > > VT-d. > > > > For ARM SMMU the "flush all by VA" operation is valid. Although it's > > unclear at this point if we will ever allow that, it should probably > &
RE: [Qemu-devel] [RFC PATCH 7/8] VFIO: Add new IOCTL for IOMMU TLB invalidate propagation
Hi Alex, Pls refer to the response inline. > -Original Message- > From: kvm-ow...@vger.kernel.org [mailto:kvm-ow...@vger.kernel.org] On Behalf > Of Alex Williamson > Sent: Saturday, July 15, 2017 2:16 AM > To: Liu, Yi L > Cc: Jean-Philippe Brucker ; Tian, Kevin > ; Liu, Yi L ; Lan, Tianyu > ; Raj, Ashok ; > k...@vger.kernel.org; > jasow...@redhat.com; Will Deacon ; pet...@redhat.com; > qemu-de...@nongnu.org; iommu@lists.linux-foundation.org; Pan, Jacob jun > ; Joerg Roedel > Subject: Re: [Qemu-devel] [RFC PATCH 7/8] VFIO: Add new IOCTL for IOMMU TLB > invalidate propagation > > On Fri, 14 Jul 2017 08:58:02 + > "Liu, Yi L" wrote: > > > Hi Alex, > > > > Against to the opaque open, I'd like to propose the following > > definition based on the existing comments. Pls note that I've merged > > the pasid table binding and iommu tlb invalidation into a single IOCTL > > and make different flags to indicate the iommu operations. Per Kevin's > > comments, there may be iommu invalidation for guest IOVA tlb, so I > > renamed the IOCTL and data structure to be non-svm specific. Pls > > kindly have a review, so that we can make the opaque open closed and > > move forward. Surely, comments and ideas are welcomed. And for the > > scope and flags definition in struct iommu_tlb_invalidate, it's also > > welcomed to > give your ideas on it. > > > > 1. 
Add a VFIO IOCTL for iommu operations from user-space > > > > #define VFIO_IOMMU_OP_IOCTL _IO(VFIO_TYPE, VFIO_BASE + 24) > > > > Corresponding data structure: > > struct vfio_iommu_operation_info { > > __u32 argsz; > > #define VFIO_IOMMU_BIND_PASIDTBL(1 << 0) /* Bind PASID Table */ > > #define VFIO_IOMMU_BIND_PASID (1 << 1) /* Bind PASID from userspace > driver*/ > > #define VFIO_IOMMU_BIND_PGTABLE (1 << 2) /* Bind guest mmu page table */ > > #define VFIO_IOMMU_INVAL_IOTLB (1 << 3) /* Invalidate iommu tlb */ > > __u32 flag; > > __u32 length; // length of the data[] part in byte > > __u8data[]; // stores the data for iommu op indicated by flag field > > }; > > If we're doing a generic "Ops" ioctl, then we should have an "op" field which > is > defined by an enum. It doesn't make sense to use flags for this, for example > can we > set multiple flag bits? If not then it's not a good use for a bit field. > I'm also not sure I > understand the value of the "length" field, can't it always be calculated > from argsz? Agreed, enum would be better. "length" field could be calculated from argsz. I used it just to avoid offset calculations. May remove it. > > For iommu tlb invalidation from userspace, the "__u8 data[]" stores > > data which would be parsed by the "struct iommu_tlb_invalidate" > > defined below. > > > > 2. Definitions in include/uapi/linux/iommu.h(newly added header file) > > > > /* IOMMU model definition for iommu operations from userspace */ enum > > iommu_model { > > INTLE_IOMMU, > > ARM_SMMU, > > AMD_IOMMU, > > SPAPR_IOMMU, > > S390_IOMMU, > > }; > > > > struct iommu_tlb_invalidate { > > __u32 scope; > > /* pasid-selective invalidation described by @pasid */ > > #define IOMMU_INVALIDATE_PASID (1 << 0) > > /* address-selevtive invalidation described by (@vaddr, @size) */ > > #define IOMMU_INVALIDATE_VADDR (1 << 1) > > Again, is a bit field appropriate here, can a user set both bits? yes, user may set both bits. 
It would invalidate an address range that is tagged with a PASID value. > > > __u32 flags; > > /* targets non-pasid mappings, @pasid is not valid */ > > #define IOMMU_INVALIDATE_NO_PASID (1 << 0) > > /* indicating that the pIOMMU doesn't need to invalidate > > all intermediate tables cached as part of the PTE for > > vaddr, only the last-level entry (pte). This is a hint. */ > > #define IOMMU_INVALIDATE_VADDR_LEAF (1 << 1) > > Are we venturing into vendor specific attributes here? These two attributes are still under discussion; Jean and I have synced over several rounds, but there has been a lack of comments from other vendors. Personally, I think both should be generic: IOMMU_INVALIDATE_NO_PASID indicates that no PASID is used for the invalidation, and IOMMU_INVALIDATE_VADDR_LEAF indicates that only leaf mappings are invalidated. I will see whether other vendors object to them; if so, I'm fine with moving them to the vendor-specific part. > > > __u32 pasid; >
Re: [Qemu-devel] [RFC PATCH 7/8] VFIO: Add new IOCTL for IOMMU TLB invalidate propagation
On Mon, Jul 17, 2017 at 04:45:15PM -0600, Alex Williamson wrote: > On Mon, 17 Jul 2017 10:58:41 + > "Liu, Yi L" wrote: > > > Hi Alex, > > > > Pls refer to the response inline. > > > > > -Original Message- > > > From: kvm-ow...@vger.kernel.org [mailto:kvm-ow...@vger.kernel.org] On > > > Behalf > > > Of Alex Williamson > > > Sent: Saturday, July 15, 2017 2:16 AM > > > To: Liu, Yi L > > > Cc: Jean-Philippe Brucker ; Tian, Kevin > > > ; Liu, Yi L ; Lan, Tianyu > > > ; Raj, Ashok ; > > > k...@vger.kernel.org; > > > jasow...@redhat.com; Will Deacon ; pet...@redhat.com; > > > qemu-de...@nongnu.org; iommu@lists.linux-foundation.org; Pan, Jacob jun > > > ; Joerg Roedel > > > Subject: Re: [Qemu-devel] [RFC PATCH 7/8] VFIO: Add new IOCTL for IOMMU > > > TLB > > > invalidate propagation > > > > > > On Fri, 14 Jul 2017 08:58:02 + > > > "Liu, Yi L" wrote: > > > > > > > Hi Alex, > > > > > > > > Against to the opaque open, I'd like to propose the following > > > > definition based on the existing comments. Pls note that I've merged > > > > the pasid table binding and iommu tlb invalidation into a single IOCTL > > > > and make different flags to indicate the iommu operations. Per Kevin's > > > > comments, there may be iommu invalidation for guest IOVA tlb, so I > > > > renamed the IOCTL and data structure to be non-svm specific. Pls > > > > kindly have a review, so that we can make the opaque open closed and > > > > move forward. Surely, comments and ideas are welcomed. And for the > > > > scope and flags definition in struct iommu_tlb_invalidate, it's also > > > > welcomed to > > > give your ideas on it. > > > > > > > > 1. 
Add a VFIO IOCTL for iommu operations from user-space > > > > > > > > #define VFIO_IOMMU_OP_IOCTL _IO(VFIO_TYPE, VFIO_BASE + 24) > > > > > > > > Corresponding data structure: > > > > struct vfio_iommu_operation_info { > > > > __u32 argsz; > > > > #define VFIO_IOMMU_BIND_PASIDTBL(1 << 0) /* Bind PASID Table */ > > > > #define VFIO_IOMMU_BIND_PASID (1 << 1) /* Bind PASID from userspace > > > driver*/ > > > > #define VFIO_IOMMU_BIND_PGTABLE (1 << 2) /* Bind guest mmu page table */ > > > > #define VFIO_IOMMU_INVAL_IOTLB (1 << 3) /* Invalidate iommu tlb */ > > > > __u32 flag; > > > > __u32 length; // length of the data[] part in byte > > > > __u8data[]; // stores the data for iommu op indicated by > > > > flag field > > > > }; > > > > > > If we're doing a generic "Ops" ioctl, then we should have an "op" field > > > which is > > > defined by an enum. It doesn't make sense to use flags for this, for > > > example can we > > > set multiple flag bits? If not then it's not a good use for a bit field. > > > I'm also not sure I > > > understand the value of the "length" field, can't it always be calculated > > > from argsz? > > > > Agreed, enum would be better. "length" field could be calculated from > > argsz. I used > > it just to avoid offset calculations. May remove it. > > > > > > For iommu tlb invalidation from userspace, the "__u8 data[]" stores > > > > data which would be parsed by the "struct iommu_tlb_invalidate" > > > > defined below. > > > > > > > > 2. 
Definitions in include/uapi/linux/iommu.h(newly added header file) > > > > > > > > /* IOMMU model definition for iommu operations from userspace */ enum > > > > iommu_model { > > > > INTLE_IOMMU, > > > > ARM_SMMU, > > > > AMD_IOMMU, > > > > SPAPR_IOMMU, > > > > S390_IOMMU, > > > > }; > > > > > > > > struct iommu_tlb_invalidate { > > > > __u32 scope; > > > > /* pasid-selective invalidation described by @pasid */ > > > > #define IOMMU_INVALIDATE_PASID (1 << 0) > > > > /* address-selevtive invalidation described by (@vaddr, @size) */ > > > > #define IOMMU_INVALIDATE_VADDR (1 << 1) > > > > > > Aga
RE: [PATCH 1/1] iommu/vtd: Fix NULL pointer dereference in prq_event_thread()
Hi Baolu, > From: iommu-boun...@lists.linux-foundation.org [mailto:iommu- > boun...@lists.linux-foundation.org] On Behalf Of Lu Baolu > Sent: Monday, November 5, 2018 10:19 AM > To: Joerg Roedel ; David Woodhouse > Cc: Raj, Ashok ; linux-ker...@vger.kernel.org; > iommu@lists.linux-foundation.org > Subject: [PATCH 1/1] iommu/vtd: Fix NULL pointer dereference in > prq_event_thread() > > When handling page request without pasid event, go to "no_pasid" > branch instead of "bad_req". Otherwise, a NULL pointer deference will happen > there. > > Cc: Ashok Raj > Cc: Jacob Pan > Cc: Sohil Mehta > Signed-off-by: Lu Baolu > --- > drivers/iommu/intel-svm.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c index > db301efe126d..887150907526 100644 > --- a/drivers/iommu/intel-svm.c > +++ b/drivers/iommu/intel-svm.c > @@ -595,7 +595,7 @@ static irqreturn_t prq_event_thread(int irq, void *d) > pr_err("%s: Page request without PASID: %08llx > %08llx\n", > iommu->name, ((unsigned long long *)req)[0], > ((unsigned long long *)req)[1]); > - goto bad_req; > + goto no_pasid; > } > > if (!svm || svm->pasid != req->pasid) { > -- I'm afraid it is still necessary to go to "bad_req". The code behind the "bad_req" label triggers the fault_cb callbacks registered by in-kernel drivers, and it is reasonable for a PRQ without a PASID to be handled by such callbacks. So I would suggest keeping the existing logic:

	if (sdev && sdev->ops && sdev->ops->fault_cb) {
		int rwxp = (req->rd_req << 3) | (req->wr_req << 2) |
			   (req->exe_req << 1) | (req->priv_req);
		sdev->ops->fault_cb(sdev->dev, req->pasid, req->addr,
				    req->private, rwxp, result);
	}

Thanks, Yi Liu
RE: [PATCH 1/1] iommu/vtd: Fix NULL pointer dereference in prq_event_thread()
> From: Lu Baolu [mailto:baolu...@linux.intel.com] > Sent: Monday, November 5, 2018 1:45 PM > To: Liu, Yi L ; Joerg Roedel ; David > Woodhouse > Cc: baolu...@linux.intel.com; Raj, Ashok ; linux- > ker...@vger.kernel.org; iommu@lists.linux-foundation.org > Subject: Re: [PATCH 1/1] iommu/vtd: Fix NULL pointer dereference in > prq_event_thread() > > Hi Yi, > > On 11/5/18 1:21 PM, Liu, Yi L wrote: > > Hi Baolu, > > > >> From: iommu-boun...@lists.linux-foundation.org [mailto:iommu- > >> boun...@lists.linux-foundation.org] On Behalf Of Lu Baolu > >> Sent: Monday, November 5, 2018 10:19 AM > >> To: Joerg Roedel ; David Woodhouse > >> > >> Cc: Raj, Ashok ; linux-ker...@vger.kernel.org; > >> iommu@lists.linux-foundation.org > >> Subject: [PATCH 1/1] iommu/vtd: Fix NULL pointer dereference in > >> prq_event_thread() > >> > >> When handling page request without pasid event, go to "no_pasid" > >> branch instead of "bad_req". Otherwise, a NULL pointer deference will > >> happen > there. > >> > >> Cc: Ashok Raj > >> Cc: Jacob Pan > >> Cc: Sohil Mehta > >> Signed-off-by: Lu Baolu > >> --- > >> drivers/iommu/intel-svm.c | 2 +- > >> 1 file changed, 1 insertion(+), 1 deletion(-) > >> > >> diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c > >> index > >> db301efe126d..887150907526 100644 > >> --- a/drivers/iommu/intel-svm.c > >> +++ b/drivers/iommu/intel-svm.c > >> @@ -595,7 +595,7 @@ static irqreturn_t prq_event_thread(int irq, void *d) > >>pr_err("%s: Page request without PASID: %08llx > >> %08llx\n", > >> iommu->name, ((unsigned long long *)req)[0], > >> ((unsigned long long *)req)[1]); > >> - goto bad_req; > >> + goto no_pasid; > >>} > >> > >>if (!svm || svm->pasid != req->pasid) { > >> -- > > > > I'm afraid it is still necessary to goto "bad_req". The following code > > behind "bad_req" will trigger fault_cb registered by in-kernel > > drivers. It is reasonable that PRQ without PASID can be handled by > > such callbacks. 
So I would suggest to keep the existing logic. > > A page fault without a pasid is triggered by a DMA transfer without PASID. It > doesn't > relate to the SVM functionality hence there's no @svm or @sdev related to it. > It's > unnecessary to report it to the drivers as far as I can see. Yeah, a PRQ without PASID has no corresponding svm or sdev structure. Regards to this fact, it's acceptable for this fix. In long term, it might be helpful to refine the PRQ event handler to cover PRQ without PASID. I guess Jacob's fault report framework may help. +Jacob Regards, Yi Liu ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
RE: [PATCH v4 04/12] iommu/vt-d: Add 256-bit invalidation descriptor support
Hi Baolu, > From: Lu Baolu [mailto:baolu...@linux.intel.com] > Sent: Monday, November 5, 2018 1:32 PM [...] > --- > drivers/iommu/dmar.c| 83 +++-- > drivers/iommu/intel-svm.c | 76 -- > drivers/iommu/intel_irq_remapping.c | 6 ++- > include/linux/intel-iommu.h | 9 +++- > 4 files changed, 115 insertions(+), 59 deletions(-) > > diff --git a/drivers/iommu/dmar.c b/drivers/iommu/dmar.c > index d9c748b6f9e4..ec10427b98ac 100644 > --- a/drivers/iommu/dmar.c > +++ b/drivers/iommu/dmar.c > @@ -1160,6 +1160,7 @@ static int qi_check_fault(struct intel_iommu *iommu, int > index) > int head, tail; > struct q_inval *qi = iommu->qi; > int wait_index = (index + 1) % QI_LENGTH; > + int shift = qi_shift(iommu); > > if (qi->desc_status[wait_index] == QI_ABORT) > return -EAGAIN; > @@ -1173,13 +1174,15 @@ static int qi_check_fault(struct intel_iommu *iommu, > int index) >*/ > if (fault & DMA_FSTS_IQE) { > head = readl(iommu->reg + DMAR_IQH_REG); > - if ((head >> DMAR_IQ_SHIFT) == index) { > + if ((head >> shift) == index) { > + struct qi_desc *desc = qi->desc + head; > + > pr_err("VT-d detected invalid descriptor: " > "low=%llx, high=%llx\n", > - (unsigned long long)qi->desc[index].low, > - (unsigned long long)qi->desc[index].high); > - memcpy(&qi->desc[index], &qi->desc[wait_index], > - sizeof(struct qi_desc)); > + (unsigned long long)desc->qw0, > + (unsigned long long)desc->qw1); Still missing qw2 and qw3. May make the print differ based on if smts is configed. > + memcpy(desc, qi->desc + (wait_index << shift), Would "memcpy(desc, (unsigned long long) (qi->desc + (wait_index << shift)," be more safe? 
> +1 << shift); > writel(DMA_FSTS_IQE, iommu->reg + DMAR_FSTS_REG); > return -EINVAL; > } > @@ -1191,10 +1194,10 @@ static int qi_check_fault(struct intel_iommu *iommu, > int index) >*/ > if (fault & DMA_FSTS_ITE) { > head = readl(iommu->reg + DMAR_IQH_REG); > - head = ((head >> DMAR_IQ_SHIFT) - 1 + QI_LENGTH) % QI_LENGTH; > + head = ((head >> shift) - 1 + QI_LENGTH) % QI_LENGTH; > head |= 1; > tail = readl(iommu->reg + DMAR_IQT_REG); > - tail = ((tail >> DMAR_IQ_SHIFT) - 1 + QI_LENGTH) % QI_LENGTH; > + tail = ((tail >> shift) - 1 + QI_LENGTH) % QI_LENGTH; > > writel(DMA_FSTS_ITE, iommu->reg + DMAR_FSTS_REG); > > @@ -1222,15 +1225,14 @@ int qi_submit_sync(struct qi_desc *desc, struct > intel_iommu *iommu) > { > int rc; > struct q_inval *qi = iommu->qi; > - struct qi_desc *hw, wait_desc; > + int offset, shift, length; > + struct qi_desc wait_desc; > int wait_index, index; > unsigned long flags; > > if (!qi) > return 0; > > - hw = qi->desc; > - > restart: > rc = 0; > > @@ -1243,16 +1245,21 @@ int qi_submit_sync(struct qi_desc *desc, struct > intel_iommu *iommu) > > index = qi->free_head; > wait_index = (index + 1) % QI_LENGTH; > + shift = qi_shift(iommu); > + length = 1 << shift; > > qi->desc_status[index] = qi->desc_status[wait_index] = QI_IN_USE; > > - hw[index] = *desc; > - > - wait_desc.low = QI_IWD_STATUS_DATA(QI_DONE) | > + offset = index << shift; > + memcpy(qi->desc + offset, desc, length); > + wait_desc.qw0 = QI_IWD_STATUS_DATA(QI_DONE) | > QI_IWD_STATUS_WRITE | QI_IWD_TYPE; > - wait_desc.high = virt_to_phys(&qi->desc_status[wait_index]); > + wait_desc.qw1 = virt_to_phys(&qi->desc_status[wait_index]); > + wait_desc.qw2 = 0; > + wait_desc.qw3 = 0; > > - hw[wait_index] = wait_desc; > + offset = wait_index << shift; > + memcpy(qi->desc + offset, &wait_desc, length); same question with above one. Thanks, Yi Liu ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
RE: [PATCH v4 05/12] iommu/vt-d: Reserve a domain id for FL and PT modes
Hi Baolu, > From: Lu Baolu [mailto:baolu...@linux.intel.com] > Sent: Monday, November 5, 2018 1:32 PM > Subject: [PATCH v4 05/12] iommu/vt-d: Reserve a domain id for FL and PT modes > > Vt-d spec rev3.0 (section 6.2.3.1) requires that each pasid > entry for first-level or pass-through translation should be > programmed with a domain id different from those used for > second-level or nested translation. It is recommended that > software could use a same domain id for all first-only and > pass-through translations. > > This reserves a domain id for first-level and pass-through > translations. > > Cc: Ashok Raj > Cc: Jacob Pan > Cc: Kevin Tian > Cc: Liu Yi L > Cc: Sanjay Kumar > Signed-off-by: Lu Baolu > --- > drivers/iommu/intel-iommu.c | 10 ++ > drivers/iommu/intel-pasid.h | 6 ++ > 2 files changed, 16 insertions(+) > > diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c > index 9331240c70b8..2f7455ee4e7a 100644 > --- a/drivers/iommu/intel-iommu.c > +++ b/drivers/iommu/intel-iommu.c > @@ -1618,6 +1618,16 @@ static int iommu_init_domains(struct intel_iommu > *iommu) >*/ > set_bit(0, iommu->domain_ids); > > + /* > + * Vt-d spec rev3.0 (section 6.2.3.1) requires that each pasid > + * entry for first-level or pass-through translation modes should > + * be programmed with a domain id different from those used for > + * second-level or nested translation. We reserve a domain id for > + * this purpose. > + */ > + if (sm_supported(iommu)) > + set_bit(FLPT_DEFAULT_DID, iommu->domain_ids); "FLPT_DEFAULT_DID" looks very likely for first level translation. How about "PT_FL_DEFAULT_DID"? 
> return 0; > } > > diff --git a/drivers/iommu/intel-pasid.h b/drivers/iommu/intel-pasid.h > index 12f480c2bb8b..03c1612d173c 100644 > --- a/drivers/iommu/intel-pasid.h > +++ b/drivers/iommu/intel-pasid.h > @@ -17,6 +17,12 @@ > #define PDE_PFN_MASK PAGE_MASK > #define PASID_PDE_SHIFT 6 > > +/* > + * Domain ID reserved for pasid entries programmed for first-level > + * only and pass-through transfer modes. > + */ > +#define FLPT_DEFAULT_DID 1 It would be helpful if the patch description elaborated on why DID 1 is selected. Regards, Yi Liu
RE: [PATCH v4 06/12] iommu/vt-d: Add second level page table interface
Hi Baolu, > From: Lu Baolu [mailto:baolu...@linux.intel.com] > Sent: Monday, November 5, 2018 1:32 PM > > This adds the interfaces to setup or tear down the structures > for second level page table translations. This includes types > of second level only translation and pass through. A slight refinement to the description :) "This patch adds interfaces to set up or tear down second-level translation at PASID granularity. The translation types include second-level-only and pass-through." > Cc: Ashok Raj > Cc: Jacob Pan > Cc: Kevin Tian > Cc: Liu Yi L > Signed-off-by: Sanjay Kumar [...] > + > +void intel_pasid_tear_down_entry(struct intel_iommu *iommu, > + struct device *dev, int pasid) > +{ > + struct pasid_entry *pte; "pte" is confusing here, as it resembles the pte used in paging structures; it may be better to use pt_entry or simply pasid_entry. This comment applies to the other "pte"s in this patch. Thanks, Yi Liu
RE: [PATCH v4 08/12] iommu/vt-d: Pass pasid table to context mapping
Hi Baolu, > From: Lu Baolu [mailto:baolu...@linux.intel.com] > Sent: Monday, November 5, 2018 1:32 PM > Subject: [PATCH v4 08/12] iommu/vt-d: Pass pasid table to context mapping > > So that the pasid related info, such as the pasid table and the > maximum of pasid could be used during setting up scalable mode > context. A slight refinement, hope it helps. :) "This patch passes the PASID-related info (e.g. the PASID table and the maximum PASID) to context mapping, so that the PASID-related fields can be set up accordingly in the scalable-mode context entry." Regards, Yi Liu
RE: [PATCH v4 10/12] iommu/vt-d: Add first level page table interface
Hi Baolu, > From: Lu Baolu [mailto:baolu...@linux.intel.com] > Sent: Monday, November 5, 2018 1:32 PM > Subject: [PATCH v4 10/12] iommu/vt-d: Add first level page table interface > > This adds an interface to setup the PASID entries for first This patch adds interface to setup the PASID entries for first. :) > level page table translation. > > Cc: Ashok Raj > Cc: Jacob Pan > Cc: Kevin Tian > Cc: Liu Yi L > Signed-off-by: Sanjay Kumar > Signed-off-by: Lu Baolu > Reviewed-by: Ashok Raj > --- > drivers/iommu/intel-pasid.c | 81 + > drivers/iommu/intel-pasid.h | 11 + > include/linux/intel-iommu.h | 1 + > 3 files changed, 93 insertions(+) > > diff --git a/drivers/iommu/intel-pasid.c b/drivers/iommu/intel-pasid.c > index 69530317c323..d8ca1e6a8e5e 100644 > --- a/drivers/iommu/intel-pasid.c > +++ b/drivers/iommu/intel-pasid.c > @@ -10,6 +10,7 @@ > #define pr_fmt(fmt) "DMAR: " fmt > > #include > +#include > #include > #include > #include > @@ -388,6 +389,26 @@ static inline void pasid_set_page_snoop(struct > pasid_entry > *pe, bool value) > pasid_set_bits(&pe->val[1], 1 << 23, value); > } > > +/* > + * Setup the First Level Page table Pointer field (Bit 140~191) > + * of a scalable mode PASID entry. > + */ > +static inline void > +pasid_set_flptr(struct pasid_entry *pe, u64 value) > +{ > + pasid_set_bits(&pe->val[2], VTD_PAGE_MASK, value); > +} > + > +/* > + * Setup the First Level Paging Mode field (Bit 130~131) of a > + * scalable mode PASID entry. > + */ > +static inline void > +pasid_set_flpm(struct pasid_entry *pe, u64 value) > +{ > + pasid_set_bits(&pe->val[2], GENMASK_ULL(3, 2), value << 2); > +} > + > static void > pasid_cache_invalidation_with_pasid(struct intel_iommu *iommu, > u16 did, int pasid) > @@ -458,6 +479,66 @@ void intel_pasid_tear_down_entry(struct intel_iommu > *iommu, > devtlb_invalidation_with_pasid(iommu, dev, pasid); > } > > +/* > + * Set up the scalable mode pasid table entry for first only > + * translation type. 
> + */ > +int intel_pasid_setup_first_level(struct intel_iommu *iommu, > + struct device *dev, pgd_t *pgd, > + int pasid, int flags) > +{ > + u16 did = FLPT_DEFAULT_DID; > + struct pasid_entry *pte; Aha, same comment as on the previous patch; it may be better to use pt_entry or pasid_entry. Regards, Yi Liu
RE: [PATCH v4 03/12] iommu/vt-d: Move page table helpers into header
Hi Baolu, > From: Lu Baolu [mailto:baolu...@linux.intel.com] > Sent: Monday, November 5, 2018 1:32 PM > Subject: [PATCH v4 03/12] iommu/vt-d: Move page table helpers into header > > So that they could also be used in other source files. Just a refinement. :) "This patch moves the page table helpers into a header file so that other source files can use them." Thanks, Yi Liu
RE: [PATCH v4 04/12] iommu/vt-d: Add 256-bit invalidation descriptor support
Hi, > From: Lu Baolu [mailto:baolu...@linux.intel.com] > Sent: Thursday, November 8, 2018 10:17 AM > Subject: Re: [PATCH v4 04/12] iommu/vt-d: Add 256-bit invalidation descriptor > support > > Hi Yi, > > On 11/7/18 2:07 PM, Liu, Yi L wrote: > > Hi Baolu, > > > >> From: Lu Baolu [mailto:baolu...@linux.intel.com] > >> Sent: Monday, November 5, 2018 1:32 PM > > > > [...] > > > >> --- > >> drivers/iommu/dmar.c| 83 +++-- > >> drivers/iommu/intel-svm.c | 76 -- > >> drivers/iommu/intel_irq_remapping.c | 6 ++- > >> include/linux/intel-iommu.h | 9 +++- > >> 4 files changed, 115 insertions(+), 59 deletions(-) > >> > >> diff --git a/drivers/iommu/dmar.c b/drivers/iommu/dmar.c index > >> d9c748b6f9e4..ec10427b98ac 100644 > >> --- a/drivers/iommu/dmar.c > >> +++ b/drivers/iommu/dmar.c > >> @@ -1160,6 +1160,7 @@ static int qi_check_fault(struct intel_iommu > >> *iommu, int > >> index) > >>int head, tail; > >>struct q_inval *qi = iommu->qi; > >>int wait_index = (index + 1) % QI_LENGTH; > >> + int shift = qi_shift(iommu); > >> > >>if (qi->desc_status[wait_index] == QI_ABORT) > >>return -EAGAIN; > >> @@ -1173,13 +1174,15 @@ static int qi_check_fault(struct intel_iommu > >> *iommu, int index) > >> */ > >>if (fault & DMA_FSTS_IQE) { > >>head = readl(iommu->reg + DMAR_IQH_REG); > >> - if ((head >> DMAR_IQ_SHIFT) == index) { > >> + if ((head >> shift) == index) { > >> + struct qi_desc *desc = qi->desc + head; > >> + > >>pr_err("VT-d detected invalid descriptor: " > >>"low=%llx, high=%llx\n", > >> - (unsigned long long)qi->desc[index].low, > >> - (unsigned long long)qi->desc[index].high); > >> - memcpy(&qi->desc[index], &qi->desc[wait_index], > >> - sizeof(struct qi_desc)); > >> + (unsigned long long)desc->qw0, > >> + (unsigned long long)desc->qw1); > > > > Still missing qw2 and qw3. May make the print differ based on if smts is > > configed. > > qw2 and qw3 are reserved from software point of view. We don't need to print > it for > information. 
But for Scalable mode, it should be valid? > > > > >> + memcpy(desc, qi->desc + (wait_index << shift), > > > > Would "memcpy(desc, (unsigned long long) (qi->desc + (wait_index << > > shift)," be more safe? > > Can that be compiled? memcpy() requires a "const void *" for the second > parameter. > By the way, why it's safer with this casting? This is just an example. My point is the possibility that "qi->desc + (wait_index << shift)" would be treated as "qi->desc plus (wait_index << shift)*sizeof(*qi->desc)". Is it possible for kernel build? Regards, Yi Liu ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
RE: [PATCH v4 06/12] iommu/vt-d: Add second level page table interface
Hi, > From: Lu Baolu [mailto:baolu...@linux.intel.com] > Sent: Thursday, November 8, 2018 10:28 AM > Subject: Re: [PATCH v4 06/12] iommu/vt-d: Add second level page table > interface > > Hi, > > On 11/7/18 3:13 PM, Liu, Yi L wrote: > > Hi Baolu, > > > >> From: Lu Baolu [mailto:baolu...@linux.intel.com] > >> Sent: Monday, November 5, 2018 1:32 PM > >> > >> This adds the interfaces to setup or tear down the structures for > >> second level page table translations. This includes types of second > >> level only translation and pass through. > > > > A little bit refining to the description:) "This patch adds interfaces > > for setup or tear down second level translation in PASID granularity. > > Translation type includes second level only type and pass-through > > type." > > > >> Cc: Ashok Raj > >> Cc: Jacob Pan > >> Cc: Kevin Tian > >> Cc: Liu Yi L > >> Signed-off-by: Sanjay Kumar > > > > [...] > > > >> + > >> +void intel_pasid_tear_down_entry(struct intel_iommu *iommu, > >> + struct device *dev, int pasid) > >> +{ > >> + struct pasid_entry *pte; > > > > pte is confusing as it is similar with pte in paging structures. may > > use pt_entry or just pasid_entry. This comment applies to other "pte"s > > in this patch. > > "pte" in this file means "pasid table entry", not "page table entry". > This file holds code to handle pasid table related staff. It has nothing to > do with > paging structure. I think there should be no confusion here. > :-) I see. Then up to you. :) It's just my feeling when reading the patch, it leads me to believe it is paging structure. Regards, Yi Liu ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
RE: [PATCH v4 04/12] iommu/vt-d: Add 256-bit invalidation descriptor support
> From: Lu Baolu [mailto:baolu...@linux.intel.com] > Sent: Thursday, November 8, 2018 1:25 PM > Subject: Re: [PATCH v4 04/12] iommu/vt-d: Add 256-bit invalidation descriptor > support > > Hi, > > On 11/8/18 11:49 AM, Liu, Yi L wrote: > > Hi, > > > >> From: Lu Baolu [mailto:baolu...@linux.intel.com] > >> Sent: Thursday, November 8, 2018 10:17 AM > >> Subject: Re: [PATCH v4 04/12] iommu/vt-d: Add 256-bit invalidation > >> descriptor support > >> > >> Hi Yi, > >> > >> On 11/7/18 2:07 PM, Liu, Yi L wrote: > >>> Hi Baolu, > >>> > >>>> From: Lu Baolu [mailto:baolu...@linux.intel.com] > >>>> Sent: Monday, November 5, 2018 1:32 PM > >>> > >>> [...] > >>> > >>>> --- > >>>>drivers/iommu/dmar.c| 83 +++-- > >>>>drivers/iommu/intel-svm.c | 76 -- > >>>>drivers/iommu/intel_irq_remapping.c | 6 ++- > >>>>include/linux/intel-iommu.h | 9 +++- > >>>>4 files changed, 115 insertions(+), 59 deletions(-) > >>>> > >>>> diff --git a/drivers/iommu/dmar.c b/drivers/iommu/dmar.c index > >>>> d9c748b6f9e4..ec10427b98ac 100644 > >>>> --- a/drivers/iommu/dmar.c > >>>> +++ b/drivers/iommu/dmar.c > >>>> @@ -1160,6 +1160,7 @@ static int qi_check_fault(struct intel_iommu > >>>> *iommu, int > >>>> index) > >>>> int head, tail; > >>>> struct q_inval *qi = iommu->qi; > >>>> int wait_index = (index + 1) % QI_LENGTH; > >>>> +int shift = qi_shift(iommu); > >>>> > >>>> if (qi->desc_status[wait_index] == QI_ABORT) > >>>> return -EAGAIN; > >>>> @@ -1173,13 +1174,15 @@ static int qi_check_fault(struct > >>>> intel_iommu *iommu, int index) > >>>> */ > >>>> if (fault & DMA_FSTS_IQE) { > >>>> head = readl(iommu->reg + DMAR_IQH_REG); > >>>> -if ((head >> DMAR_IQ_SHIFT) == index) { > >>>> +if ((head >> shift) == index) { > >>>> +struct qi_desc *desc = qi->desc + head; > >>>> + > >>>> pr_err("VT-d detected invalid descriptor: " > >>>> "low=%llx, high=%llx\n", > >>>> -(unsigned long long)qi->desc[index].low, > >>>> -(unsigned long > >>>> long)qi->desc[index].high); > >>>> -memcpy(&qi->desc[index], 
&qi->desc[wait_index], > >>>> -sizeof(struct qi_desc)); > >>>> +(unsigned long long)desc->qw0, > >>>> +(unsigned long long)desc->qw1); > >>> > >>> Still missing qw2 and qw3. May make the print differ based on if smts is > >>> configed. > >> > >> qw2 and qw3 are reserved from software point of view. We don't need > >> to print it for information. > > > > But for Scalable mode, it should be valid? > > No. It's reserved for software. No, I don’t think so. PRQ response would also be queued to hardware by QI. For such QI descriptors, the high bits are not reserved. > >> > >>> > >>>> +memcpy(desc, qi->desc + (wait_index << shift), > >>> > >>> Would "memcpy(desc, (unsigned long long) (qi->desc + (wait_index << > >>> shift)," be more safe? > >> > >> Can that be compiled? memcpy() requires a "const void *" for the second > parameter. > >> By the way, why it's safer with this casting? > > > > This is just an example. My point is the possibility that "qi->desc + > > (wait_index << > shift)" > > would be treated as "qi->desc plus (wait_index << > > shift)*sizeof(*qi->desc)". Is it possible for kernel build? > > qi->desc is of type of "void *". no, I don’t think so... Refer to the code below. Even it has no correctness issue her, It's not due to qi->desc is "void *" type... struct qi_desc { - u64 low, high; + u64 qw0; + u64 qw1; + u64 qw2; + u64 qw3; }; Regards, Yi Liu ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
RE: [PATCH v4 04/12] iommu/vt-d: Add 256-bit invalidation descriptor support
> From: Liu, Yi L > Sent: Thursday, November 8, 2018 1:45 PM > > >>>> + memcpy(desc, qi->desc + (wait_index << shift), > > >>> > > >>> Would "memcpy(desc, (unsigned long long) (qi->desc + (wait_index > > >>> << shift)," be more safe? > > >> > > >> Can that be compiled? memcpy() requires a "const void *" for the > > >> second > > parameter. > > >> By the way, why it's safer with this casting? > > > > > > This is just an example. My point is the possibility that "qi->desc > > > + (wait_index << > > shift)" > > > would be treated as "qi->desc plus (wait_index << > > > shift)*sizeof(*qi->desc)". Is it possible for kernel build? > > > > qi->desc is of type of "void *". > > no, I don’t think so... Refer to the code below. Even it has no correctness > issue her, > It's not due to qi->desc is "void *" type... > > struct qi_desc { > - u64 low, high; > + u64 qw0; > + u64 qw1; > + u64 qw2; > + u64 qw3; > }; Oops, I just saw that you changed it to "void *" in this patch. OK, then this is fair enough. Thanks, Yi Liu
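The pointer-arithmetic point settled in this exchange can be illustrated with a small user-space sketch. This is not the kernel code; the struct name, queue size, and helper names below are illustrative. It shows why a typed `struct qi_desc *` scales an index by the element size, while the driver's `void *` queue base needs the explicit `index << shift` byte offset (4 for 128-bit descriptors, 5 for 256-bit ones). Note that arithmetic on `void *` itself is a GCC extension the kernel relies on; the sketch uses `char *` casts to stay in standard C.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative stand-in for the 256-bit scalable-mode descriptor. */
struct qi_desc256 {
	uint64_t qw0, qw1, qw2, qw3;
};

static struct qi_desc256 queue[8];

/* Typed-pointer arithmetic: the index is scaled by sizeof(element),
 * so &queue[2] is 64 bytes past the base. */
static size_t typed_offset(int index)
{
	return (size_t)((char *)&queue[index] - (char *)queue);
}

/* Byte arithmetic, as the driver does with a void * base:
 * qi->desc + (index << shift).  shift = 4 for 16-byte (128-bit)
 * descriptors, 5 for 32-byte (256-bit) descriptors. */
static size_t byte_offset(int index, int shift)
{
	return (size_t)index << shift;
}
```

With `shift == 5` the two computations agree for 256-bit descriptors; keeping the base typed as `struct qi_desc *` while also shifting the index would double-scale the offset.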
RE: [PATCH v4 04/12] iommu/vt-d: Add 256-bit invalidation descriptor support
Hi, > From: Lu Baolu [mailto:baolu...@linux.intel.com] > Sent: Thursday, November 8, 2018 2:14 PM > > Hi, > > On 11/8/18 1:45 PM, Liu, Yi L wrote: > >> From: Lu Baolu [mailto:baolu...@linux.intel.com] > >> Sent: Thursday, November 8, 2018 1:25 PM > >> Subject: Re: [PATCH v4 04/12] iommu/vt-d: Add 256-bit invalidation > >> descriptor support > >> > >> Hi, > >> > >> On 11/8/18 11:49 AM, Liu, Yi L wrote: > >>> Hi, > >>> > >>>> From: Lu Baolu [mailto:baolu...@linux.intel.com] > >>>> Sent: Thursday, November 8, 2018 10:17 AM > >>>> Subject: Re: [PATCH v4 04/12] iommu/vt-d: Add 256-bit invalidation > >>>> descriptor support > >>>> > >>>> Hi Yi, > >>>> > >>>> On 11/7/18 2:07 PM, Liu, Yi L wrote: > >>>>> Hi Baolu, > >>>>> > >>>>>> From: Lu Baolu [mailto:baolu...@linux.intel.com] > >>>>>> Sent: Monday, November 5, 2018 1:32 PM > >>>>> [...] > >>>>> > >>>>>> --- > >>>>>> drivers/iommu/dmar.c| 83 > >>>>>> +++-- > >>>>>> drivers/iommu/intel-svm.c | 76 -- > >>>>>> drivers/iommu/intel_irq_remapping.c | 6 ++- > >>>>>> include/linux/intel-iommu.h | 9 +++- > >>>>>> 4 files changed, 115 insertions(+), 59 deletions(-) > >>>>>> > >>>>>> diff --git a/drivers/iommu/dmar.c b/drivers/iommu/dmar.c index > >>>>>> d9c748b6f9e4..ec10427b98ac 100644 > >>>>>> --- a/drivers/iommu/dmar.c > >>>>>> +++ b/drivers/iommu/dmar.c > >>>>>> @@ -1160,6 +1160,7 @@ static int qi_check_fault(struct > >>>>>> intel_iommu *iommu, int > >>>>>> index) > >>>>>>int head, tail; > >>>>>>struct q_inval *qi = iommu->qi; > >>>>>>int wait_index = (index + 1) % QI_LENGTH; > >>>>>> + int shift = qi_shift(iommu); > >>>>>> > >>>>>>if (qi->desc_status[wait_index] == QI_ABORT) > >>>>>>return -EAGAIN; > >>>>>> @@ -1173,13 +1174,15 @@ static int qi_check_fault(struct > >>>>>> intel_iommu *iommu, int index) > >>>>>> */ > >>>>>>if (fault & DMA_FSTS_IQE) { > >>>>>>head = readl(iommu->reg + DMAR_IQH_REG); > >>>>>> - if ((head >> DMAR_IQ_SHIFT) == index) { > >>>>>> + if ((head >> shift) == index) { > >>>>>> + struct qi_desc *desc = 
qi->desc + head; > >>>>>> + > >>>>>>pr_err("VT-d detected invalid descriptor: " > >>>>>>"low=%llx, high=%llx\n", > >>>>>> - (unsigned long long)qi->desc[index].low, > >>>>>> - (unsigned long > >>>>>> long)qi->desc[index].high); > >>>>>> - memcpy(&qi->desc[index], &qi->desc[wait_index], > >>>>>> - sizeof(struct qi_desc)); > >>>>>> + (unsigned long long)desc->qw0, > >>>>>> + (unsigned long long)desc->qw1); > >>>>> Still missing qw2 and qw3. May make the print differ based on if smts is > configed. > >>>> qw2 and qw3 are reserved from software point of view. We don't need > >>>> to print it for information. > >>> But for Scalable mode, it should be valid? > >> No. It's reserved for software. > > No, I don’t think so. PRQ response would also be queued to hardware by > > QI. For such QI descriptors, the high bits are not reserved. > > > > Do you mean the private data fields of a page request descriptor or a page > group > response descriptor? Those fields contains software defined private data > (might a > kernel pointer?). We should avoid leaking such information in the generic > kernel > message for security consideration. > Or anything I missed? yes, I'm not sure what kind of data it may be in the private data field. From software point of view, it may be helpful to show the full content of the QI descriptor for error triage. Personally, I'm fine if you keep it on this point. Regards, Yi Liu ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
RE: [PATCH v4 04/12] iommu/vt-d: Add 256-bit invalidation descriptor support
Hi, > From: Lu Baolu [mailto:baolu...@linux.intel.com] > Sent: Friday, November 9, 2018 9:40 AM > Subject: Re: [PATCH v4 04/12] iommu/vt-d: Add 256-bit invalidation descriptor > support > > Hi, > > On 11/8/18 3:20 PM, Liu, Yi L wrote: > > Hi, > > > >> From: Lu Baolu [mailto:baolu...@linux.intel.com] > >> Sent: Thursday, November 8, 2018 2:14 PM > >> > >> Hi, > >> > >> On 11/8/18 1:45 PM, Liu, Yi L wrote: > >>>> From: Lu Baolu [mailto:baolu...@linux.intel.com] > >>>> Sent: Thursday, November 8, 2018 1:25 PM > >>>> Subject: Re: [PATCH v4 04/12] iommu/vt-d: Add 256-bit invalidation > >>>> descriptor support > >>>> > >>>> Hi, > >>>> > >>>> On 11/8/18 11:49 AM, Liu, Yi L wrote: > >>>>> Hi, > >>>>> > >>>>>> From: Lu Baolu [mailto:baolu...@linux.intel.com] > >>>>>> Sent: Thursday, November 8, 2018 10:17 AM > >>>>>> Subject: Re: [PATCH v4 04/12] iommu/vt-d: Add 256-bit > >>>>>> invalidation descriptor support > >>>>>> > >>>>>> Hi Yi, > >>>>>> > >>>>>> On 11/7/18 2:07 PM, Liu, Yi L wrote: > >>>>>>> Hi Baolu, > >>>>>>> > >>>>>>>> From: Lu Baolu [mailto:baolu...@linux.intel.com] > >>>>>>>> Sent: Monday, November 5, 2018 1:32 PM > >>>>>>> [...] 
> >>>>>>> > >>>>>>>> --- > >>>>>>>> drivers/iommu/dmar.c| 83 > >>>>>>>> +++-- > >>>>>>>> drivers/iommu/intel-svm.c | 76 > >>>>>>>> -- > >>>>>>>> drivers/iommu/intel_irq_remapping.c | 6 ++- > >>>>>>>> include/linux/intel-iommu.h | 9 +++- > >>>>>>>> 4 files changed, 115 insertions(+), 59 deletions(-) > >>>>>>>> > >>>>>>>> diff --git a/drivers/iommu/dmar.c b/drivers/iommu/dmar.c index > >>>>>>>> d9c748b6f9e4..ec10427b98ac 100644 > >>>>>>>> --- a/drivers/iommu/dmar.c > >>>>>>>> +++ b/drivers/iommu/dmar.c > >>>>>>>> @@ -1160,6 +1160,7 @@ static int qi_check_fault(struct > >>>>>>>> intel_iommu *iommu, int > >>>>>>>> index) > >>>>>>>> int head, tail; > >>>>>>>> struct q_inval *qi = iommu->qi; > >>>>>>>> int wait_index = (index + 1) % QI_LENGTH; > >>>>>>>> +int shift = qi_shift(iommu); > >>>>>>>> > >>>>>>>> if (qi->desc_status[wait_index] == QI_ABORT) > >>>>>>>> return -EAGAIN; > >>>>>>>> @@ -1173,13 +1174,15 @@ static int qi_check_fault(struct > >>>>>>>> intel_iommu *iommu, int index) > >>>>>>>> */ > >>>>>>>> if (fault & DMA_FSTS_IQE) { > >>>>>>>> head = readl(iommu->reg + DMAR_IQH_REG); > >>>>>>>> -if ((head >> DMAR_IQ_SHIFT) == index) { > >>>>>>>> +if ((head >> shift) == index) { > >>>>>>>> +struct qi_desc *desc = qi->desc + head; > >>>>>>>> + > >>>>>>>> pr_err("VT-d detected invalid > >>>>>>>> descriptor: " > >>>>>>>> "low=%llx, high=%llx\n", > >>>>>>>> -(unsigned long long)qi->desc[index].low, > >>>>>>>> -(unsigned long > >>>>>>>> long)qi->desc[index].high); > >>>>>>>> -memcpy(&qi->desc[index], &qi->desc[wait_index], > >>>>>>>> -sizeof(struct qi_desc)); > >>>>>>>> +(unsigned long long)desc->qw0, > >>>>>>>> +(unsigned long long)desc->qw1); > >>>>>>> Still missing qw2 and qw3. May make the print differ based on if > >>>>>>> smts is > >> configed. > >>>>>> qw2 and qw3 are reserved from software point of view. We don't > >>>>>> need to print it for information. > >>>>> But for Scalable mode, it should be valid? > >>>> No. It's reserved for software. 
> >>> No, I don’t think so. PRQ response would also be queued to hardware > >>> by QI. For such QI descriptors, the high bits are not reserved. > >>> > >> > >> Do you mean the private data fields of a page request descriptor or a > >> page group response descriptor? Those fields contains software > >> defined private data (might a kernel pointer?). We should avoid > >> leaking such information in the generic kernel message for security > >> consideration. > >> Or anything I missed? > > > > yes, I'm not sure what kind of data it may be in the private data > > field. From software point of view, it may be helpful to show the full > > content of the QI descriptor for error triage. Personally, I'm fine if you > > keep it on > this point. > > > > Okay, thanks. > > I think I need to put some comments there so that people could understand my > consideration. yeah, that would be helpful. :-) Regards, Yi Liu ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
RE: [RFC] iommu/vt-d: Group and domain relationship
Hi James, Regards to the relationship of iommu group and domain, the blog written by Alex may help you. The blog explained very well on how iommu group is determined and why. http://vfio.blogspot.com/2014/08/iommu-groups-inside-and-out.html > From: iommu-boun...@lists.linux-foundation.org [mailto:iommu- > boun...@lists.linux-foundation.org] On Behalf Of James Sewart via iommu > Sent: Thursday, November 8, 2018 7:30 PM > Subject: Re: [RFC] iommu/vt-d: Group and domain relationship > > Hey, > > > On 8 Nov 2018, at 01:42, Lu Baolu wrote: > > > > Hi, > > > > On 11/8/18 1:55 AM, James Sewart wrote: > >> Hey, > >>> On 7 Nov 2018, at 02:10, Lu Baolu wrote: > >>> > >>> Hi, > >>> > >>> On 11/6/18 6:40 PM, James Sewart wrote: > Hey Lu, > Would you be able to go into more detail about the issues with [...] > >>> > >>> Why do we want to open this door? Probably we want the generic iommu > >>> layer to handle these things (it's called default domain). > >> I’d like to allocate a domain and attach it to multiple devices in a > >> group/multiple groups so that they share address translation, but still > >> allow drivers for devices in those groups to use the dma_map_ops api. > > > > Just out of curiosity, why do you want to share a single domain across > > multiple groups? By default, the groups and DMA domains are normally > > 1-1 mapped, right? > > Currently we see each device in a group with their own domain. > find_or_alloc_domain looks at dma aliases to determine who shares domains > whereas pci_device_group in the generic iommu code determines groups using > a few other checks. We have observed internally that devices under a pcie > switch will be put in the same group but they do not share a domain within > that group. Really? iommu group is DMA isolation unit. You said they are not sharing a domain. Do you mean they have different IOVA address space by mentioning they are not sharing a domain? Is there any special things(e.g. 
special IOVA allocation to avoid unexpected P2P) done on your system? Normally, devices within an iommu group should share an IOVA address space. Thanks, Yi Liu
RE: [PATCH v2 1/1] iommu/vtd: Cleanup dma_remapping.h header
> boun...@lists.linux-foundation.org] On Behalf Of Lu Baolu > Sent: Monday, November 12, 2018 2:40 PM > Subject: [PATCH v2 1/1] iommu/vtd: Cleanup dma_remapping.h header > > Commit e61d98d8dad00 ("x64, x2apic/intr-remap: Intel vt-d, IOMMU > code reorganization") moved dma_remapping.h from drivers/pci/ to > current place. It is entirely VT-d specific, but uses a generic > name. This merges dma_remapping.h with include/linux/intel-iommu.h > and removes dma_remapping.h as the result. > > Cc: Ashok Raj > Cc: Jacob Pan > Cc: Sohil Mehta > Suggested-by: Christoph Hellwig > Signed-off-by: Lu Baolu > Reviewed-by: Christoph Hellwig > --- Reviewed-by: Liu, Yi L Just out of curiosity, did you consider renaming the file to an Intel-specific name instead? What makes you believe merging the file content into intel-iommu.h is better? Regards, Yi Liu
RE: [PATCH v5 04/12] iommu/vt-d: Add 256-bit invalidation descriptor support
Hi Joerg, > From: Joerg Roedel [mailto:j...@8bytes.org] > Sent: Monday, December 3, 2018 5:49 AM > To: Lu Baolu > Subject: Re: [PATCH v5 04/12] iommu/vt-d: Add 256-bit invalidation descriptor > support > > On Wed, Nov 28, 2018 at 11:54:41AM +0800, Lu Baolu wrote: > > - > > - desc_page = alloc_pages_node(iommu->node, GFP_ATOMIC | __GFP_ZERO, > 0); > > + /* > > +* Need two pages to accommodate 256 descriptors of 256 bits each > > +* if the remapping hardware supports scalable mode translation. > > +*/ > > + desc_page = alloc_pages_node(iommu->node, GFP_ATOMIC | __GFP_ZERO, > > +!!ecap_smts(iommu->ecap)); > > > Same here, does the allocation really need GFP_ATOMIC? I will still leave that to Baolu. > > > struct q_inval { > > raw_spinlock_t q_lock; > > - struct qi_desc *desc; /* invalidation queue */ > > + void*desc; /* invalidation queue */ > > int *desc_status; /* desc status */ > > int free_head; /* first free entry */ > > int free_tail; /* last free entry */ > > Why do you switch the pointer to void* ? In this patch, there is code like the snippet below. It calculates the destination address of a memcpy() from qi->desc. If qi->desc were still a struct qi_desc pointer, the calculated address would be wrong. + memcpy(desc, qi->desc + (wait_index << shift), + 1 << shift); The calculation is done this way so that both 128-bit and 256-bit invalidation descriptors are handled by the same code. Also, the conversation between Baolu and me may help. https://lore.kernel.org/patchwork/patch/1006756/ > > Joerg Thanks, Yi Liu
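The "two pages" comment in the quoted hunk (page order `!!ecap_smts(iommu->ecap)`) can be sanity-checked with a short sketch. The constant names below are illustrative stand-ins, assuming 4K pages and a 256-entry invalidation queue of 16-byte (legacy) or 32-byte (scalable-mode) descriptors, as described in this thread:

```c
#include <stddef.h>

#define QI_LEN		256	/* descriptors per invalidation queue */
#define PAGE_SZ		4096	/* assumed 4K pages */

/* Smallest page order whose allocation (PAGE_SZ << order bytes) fits
 * QI_LEN descriptors of 16 bytes (128-bit) or 32 bytes (256-bit). */
static unsigned int qi_queue_order(int scalable_mode)
{
	size_t bytes = QI_LEN * (scalable_mode ? 32 : 16);
	unsigned int order = 0;

	while (((size_t)PAGE_SZ << order) < bytes)
		order++;
	return order;
}
```

Legacy mode needs 256 * 16 = 4096 bytes (order 0, one page); scalable mode needs 256 * 32 = 8192 bytes (order 1, two pages), which is exactly what `!!ecap_smts()` selects.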
RE: [PATCH v5 02/12] iommu/vt-d: Manage scalalble mode PASID tables
Hi Joerg, > From: Joerg Roedel [mailto:j...@8bytes.org] > Sent: Monday, December 3, 2018 5:44 AM > To: Lu Baolu > Subject: Re: [PATCH v5 02/12] iommu/vt-d: Manage scalalble mode PASID tables > > Hi Baolu, > > On Wed, Nov 28, 2018 at 11:54:39AM +0800, Lu Baolu wrote: > > @@ -2482,12 +2482,13 @@ static struct dmar_domain > *dmar_insert_one_dev_info(struct intel_iommu *iommu, > > if (dev) > > dev->archdata.iommu = info; > > > > - if (dev && dev_is_pci(dev) && info->pasid_supported) { > > + /* PASID table is mandatory for a PCI device in scalable mode. */ > > + if (dev && dev_is_pci(dev) && sm_supported(iommu)) { > > This will also allocate a PASID table if the device does not support > PASIDs, right? Will the table not be used in that case or will the > device just use the fallback PASID? Isn't it better in that case to have > no PASID table? We need to allocate the PASID table in scalable mode, the reason is as below: In VT-d scalable mode, all address translation is done in PASID-granularity. For requests-with-PASID, the address translation would be subjected to the PASID entry specified by the PASID value in the DMA request. However, for requests-without-PASID, there is no PASID in the DMA request. To fulfil the translation logic, we've introduced RID2PASID field in sm-context-entry in VT-d 3.o spec. So that such DMA requests would be subjected to the pasid entry specified by the PASID value in the RID2PASID field of sm-context-entry. So for a device without PASID support, we need to at least to have a PASID entry so that its DMA request (without pasid) can be translated. Thus a PASID table is needed for such devices. 
> > > @@ -143,18 +143,20 @@ int intel_pasid_alloc_table(struct device *dev) > > return -ENOMEM; > > INIT_LIST_HEAD(&pasid_table->dev); > > > > - size = sizeof(struct pasid_entry); > > - count = min_t(int, pci_max_pasids(to_pci_dev(dev)), intel_pasid_max_id); > > - order = get_order(size * count); > > + if (info->pasid_supported) > > + max_pasid = min_t(int, pci_max_pasids(to_pci_dev(dev)), > > + intel_pasid_max_id); > > + > > + size = max_pasid >> (PASID_PDE_SHIFT - 3); > > + order = size ? get_order(size) : 0; > > pages = alloc_pages_node(info->iommu->node, > > -GFP_ATOMIC | __GFP_ZERO, > > -order); > > +GFP_ATOMIC | __GFP_ZERO, order); > > This is a simple data structure allocation path, does it need > GFP_ATOMIC? will leave it to Baolu. > > Joerg Thanks, Yi Liu ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
RE: [RFC PATCH 1/5] iommu: Add APIs for IOMMU PASID management
> From: Lu Baolu [mailto:baolu...@linux.intel.com] > Sent: Sunday, November 11, 2018 10:45 PM > Subject: [RFC PATCH 1/5] iommu: Add APIs for IOMMU PASID management > > This adds APIs for IOMMU drivers and device drivers to manage the PASIDs used > for > DMA transfer and translation. It bases on I/O ASID allocator for PASID > namespace > management and relies on vendor specific IOMMU drivers for paravirtual PASIDs. > > Below APIs are added: > > * iommu_pasid_init(pasid) > - Initialize a PASID consumer. The vendor specific IOMMU > drivers are able to set the PASID range imposed by IOMMU > hardware through a callback in iommu_ops. > > * iommu_pasid_exit(pasid) > - The PASID consumer stops consuming any PASID. > > * iommu_pasid_alloc(pasid, min, max, private, *ioasid) > - Allocate a PASID and associate a @private data with this > PASID. The PASID value is stored in @ioaisd if returning > success. > > * iommu_pasid_free(pasid, ioasid) > - Free a PASID to the pool so that it could be consumed by > others. > > This also adds below helpers to lookup or iterate PASID items associated with > a > consumer. > > * iommu_pasid_for_each(pasid, func, data) > - Iterate PASID items of the consumer identified by @pasid, > and call @func() against each item. An error returned from > @func() will break the iteration. > > * iommu_pasid_find(pasid, ioasid) > - Retrieve the private data associated with @ioasid. > > Cc: Ashok Raj > Cc: Jacob Pan > Cc: Kevin Tian > Cc: Jean-Philippe Brucker > Signed-off-by: Lu Baolu > --- > drivers/iommu/Kconfig | 1 + > drivers/iommu/iommu.c | 89 +++ > include/linux/iommu.h | 73 +++ > 3 files changed, 163 insertions(+) > > diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig index > d9a25715650e..39f2bb76c7b8 100644 > --- a/drivers/iommu/Kconfig > +++ b/drivers/iommu/Kconfig > @@ -1,6 +1,7 @@ > # IOMMU_API always gets selected by whoever wants it. 
> config IOMMU_API > bool > + select IOASID > > menuconfig IOMMU_SUPPORT > bool "IOMMU Hardware Support" > diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c index > 0b7c96d1425e..570b244897bb 100644 > --- a/drivers/iommu/iommu.c > +++ b/drivers/iommu/iommu.c > @@ -2082,3 +2082,92 @@ void iommu_detach_device_aux(struct iommu_domain > *domain, struct device *dev) > } > } > EXPORT_SYMBOL_GPL(iommu_detach_device_aux); > + > +/* > + * APIs for PASID used by IOMMU and the device drivers which depend > + * on IOMMU. > + */ > +struct iommu_pasid *iommu_pasid_init(struct bus_type *bus) { I'm thinking about if using struct iommu_domain here is better than struct bus_type. The major purpose is to pass iommu_ops in it and route into iommu-sublayer. iommu_domain may be better since some modules like vfio_iommu_type1 would use iommu_domain more than bus type. Thanks, Yi Liu ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
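To make the contract of the proposed iommu_pasid_alloc()/free()/find() interfaces concrete, here is a toy user-space model of the same semantics: allocate the lowest free PASID in [min, max], associate a private pointer with it, look it up, and free it. This is not the RFC's implementation (which sits on the I/O ASID allocator); every name and limit below is illustrative.

```c
#include <stddef.h>

#define TOY_MAX_PASID 64	/* illustrative pool size */

static void *toy_priv[TOY_MAX_PASID];	/* private data per PASID */

/* Allocate the lowest free PASID in [min, max] and bind @private to it.
 * Returns the PASID, or -1 if the range is exhausted. */
static int toy_pasid_alloc(int min, int max, void *private)
{
	for (int p = min; p <= max && p < TOY_MAX_PASID; p++) {
		if (!toy_priv[p]) {
			toy_priv[p] = private;
			return p;
		}
	}
	return -1;
}

/* Retrieve the private data bound to @pasid, or NULL if unallocated. */
static void *toy_pasid_find(int pasid)
{
	return (pasid >= 0 && pasid < TOY_MAX_PASID) ? toy_priv[pasid] : NULL;
}

/* Return @pasid to the pool so it can be reused. */
static void toy_pasid_free(int pasid)
{
	if (pasid >= 0 && pasid < TOY_MAX_PASID)
		toy_priv[pasid] = NULL;
}
```

The point raised in the reply, routing through struct iommu_domain instead of struct bus_type, does not change this contract; it only changes how the per-IOMMU min/max range and callbacks reach the allocator.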
RE: [PATCH v6 0/9] vfio/mdev: IOMMU aware mediated device
> From: Alex Williamson [mailto:alex.william...@redhat.com] > Sent: Friday, February 15, 2019 4:15 AM > To: Lu Baolu > Subject: Re: [PATCH v6 0/9] vfio/mdev: IOMMU aware mediated device > > On Wed, 13 Feb 2019 12:02:52 +0800 > Lu Baolu wrote: > > > Hi, > > > > The Mediate Device is a framework for fine-grained physical device > > sharing across the isolated domains. Currently the mdev framework is > > designed to be independent of the platform IOMMU support. As the > > result, the DMA isolation relies on the mdev parent device in a vendor > > specific way. > > > > There are several cases where a mediated device could be protected and > > isolated by the platform IOMMU. For example, Intel vt-d rev3.0 [1] > > introduces a new translation mode called 'scalable mode', which > > enables PASID-granular translations. The vt-d scalable mode is the key > > ingredient for Scalable I/O Virtualization [2] [3] which allows > > sharing a device in minimal possible granularity (ADI - Assignable > > Device Interface). > > > > A mediated device backed by an ADI could be protected and isolated by > > the IOMMU since 1) the parent device supports tagging an unique PASID > > to all DMA traffic out of the mediated device; and 2) the DMA > > translation unit (IOMMU) supports the PASID granular translation. > > We can apply IOMMU protection and isolation to this kind of devices > > just as what we are doing with an assignable PCI device. > > > > In order to distinguish the IOMMU-capable mediated devices from those > > which still need to rely on parent devices, this patch set adds one > > new member in struct mdev_device. > > > > * iommu_device > > - This, if set, indicates that the mediated device could > > be fully isolated and protected by IOMMU via attaching > > an iommu domain to this device. If empty, it indicates > > using vendor defined isolation. > > > > Below helpers are added to set and get above iommu device in mdev core > > implementation. 
> > > > * mdev_set/get_iommu_device(dev, iommu_device) > > - Set or get the iommu device which represents this mdev > > in IOMMU's device scope. Drivers don't need to set the > > iommu device if it uses vendor defined isolation. > > > > The mdev parent device driver could opt-in that the mdev could be > > fully isolated and protected by the IOMMU when the mdev is being > > created by invoking mdev_set_iommu_device() in its @create(). > > > > In the vfio_iommu_type1_attach_group(), a domain allocated through > > iommu_domain_alloc() will be attached to the mdev iommu device if an > > iommu device has been set. Otherwise, the dummy external domain will > > be used and all the DMA isolation and protection are routed to parent > > driver as the result. > > > > On IOMMU side, a basic requirement is allowing to attach multiple > > domains to a PCI device if the device advertises the capability and > > the IOMMU hardware supports finer granularity translations than the > > normal PCI Source ID based translation. > > > > As the result, a PCI device could work in two modes: normal mode and > > auxiliary mode. In the normal mode, a pci device could be isolated in > > the Source ID granularity; the pci device itself could be assigned to > > a user application by attaching a single domain to it. In the > > auxiliary mode, a pci device could be isolated in finer granularity, > > hence subsets of the device could be assigned to different user level > > application by attaching a different domain to each subset. > > > > Below APIs are introduced in iommu generic layer for aux-domain > > purpose: > > > > * iommu_dev_has_feature(dev, IOMMU_DEV_FEAT_AUX) > > - Check whether both IOMMU and device support IOMMU aux > > domain feature. Below aux-domain specific interfaces > > are available only after this returns true. > > > > * iommu_dev_enable/disable_feature(dev, IOMMU_DEV_FEAT_AUX) > > - Enable/disable device specific aux-domain feature. 
> > > > * iommu_dev_feature_enabled(dev, IOMMU_DEV_FEAT_AUX) > > - Check whether the aux domain specific feature enabled or > > not. > > > > * iommu_aux_attach_device(domain, dev) > > - Attaches @domain to @dev in the auxiliary mode. Multiple > > domains could be attached to a single device in the > > auxiliary mode with each domain representing an isolated > > address space for an assignable subset of the device. > > > > * iommu_aux_detach_device(domain, dev) > > - Detach @domain which has been attached to @dev in the > > auxiliary mode. > > > > * iommu_aux_get_pasid(domain, dev) > > - Return ID used for finer-granularity DMA translation. > > For the Intel Scalable IOV usage model, this will be > > a PASID. The device which supports Scalable IOV needs > > to write this ID to the device register so that DMA > > requests could be tagged with a right PASID prefix. > > > > In order for the ease of discussion, sometimes we call "a domain in > > auxiliary mode' or simply 'an auxiliary d
RE: [RFC v3 1/8] vfio: Add VFIO_IOMMU_PASID_REQUEST(alloc/free)
> From: Liu, Yi L > Sent: Friday, January 31, 2020 8:41 PM > To: Alex Williamson > Subject: RE: [RFC v3 1/8] vfio: Add VFIO_IOMMU_PASID_REQUEST(alloc/free) > > > +static int vfio_iommu_type1_pasid_free(struct vfio_iommu *iommu, > > > +unsigned int pasid) > > > +{ > > > + struct vfio_mm *vmm = iommu->vmm; > > > + int ret = 0; > > > + > > > + mutex_lock(&iommu->lock); > > > + if (!IS_IOMMU_CAP_DOMAIN_IN_CONTAINER(iommu)) { > > > > But we could have been IOMMU backed when the pasid was allocated, did we > just > > leak something? In fact, I didn't spot anything in this series that handles > > a container with pasids allocated losing iommu backing. > > I'd think we want to release all pasids when that happens since permission > > for > > the user to hold pasids goes along with having an iommu backed device. > > oh, yes. If a container lose iommu backend, then needs to reclaim the > allocated > PASIDs. right? I'll add it. :-) Hi Alex, I went through the flow again. Maybe the current series has already covered it. There is a vfio_mm which is used to track allocated PASIDs. Its life cycle spans type1 driver open and release. If I understand it correctly, type1 driver release happens when there are no more IOMMU-backed groups in a container. static void __vfio_group_unset_container(struct vfio_group *group) { [...] /* Detaching the last group deprivileges a container, remove iommu */ if (driver && list_empty(&container->group_list)) { driver->ops->release(container->iommu_data); module_put(driver->ops->owner); container->iommu_driver = NULL; container->iommu_data = NULL; } [...] } Regards, Yi Liu
RE: [PATCH V9 06/10] iommu/vt-d: Add svm/sva invalidate function
> From: Jacob Pan > Sent: Wednesday, January 29, 2020 2:02 PM > Subject: [PATCH V9 06/10] iommu/vt-d: Add svm/sva invalidate function > > When Shared Virtual Address (SVA) is enabled for a guest OS via vIOMMU, we > need to provide invalidation support at IOMMU API and driver level. This patch > adds Intel VT-d specific function to implement iommu passdown invalidate API > for shared virtual address. > > The use case is for supporting caching structure invalidation of assigned SVM > capable devices. Emulated IOMMU exposes queue invalidation capability and > passes down all descriptors from the guest to the physical IOMMU. > > The assumption is that guest to host device ID mapping should be resolved > prior > to calling IOMMU driver. Based on the device handle, host IOMMU driver can > replace certain fields before submit to the invalidation queue. > > Signed-off-by: Jacob Pan > Signed-off-by: Ashok Raj > Signed-off-by: Liu, Yi L May be update my email to Liu Yi L :-) @linux.intel.com account no more work for me. :-( > --- > drivers/iommu/intel-iommu.c | 173 > > 1 file changed, 173 insertions(+) > > diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c index > 8a4136e805ac..b8aa6479b87f 100644 > --- a/drivers/iommu/intel-iommu.c > +++ b/drivers/iommu/intel-iommu.c > @@ -5605,6 +5605,178 @@ static void intel_iommu_aux_detach_device(struct > iommu_domain *domain, > aux_domain_remove_dev(to_dmar_domain(domain), dev); } > > +/* > + * 2D array for converting and sanitizing IOMMU generic TLB granularity > +to > + * VT-d granularity. Invalidation is typically included in the unmap > +operation > + * as a result of DMA or VFIO unmap. However, for assigned device where > +guest > + * could own the first level page tables without being shadowed by > +QEMU. In > + * this case there is no pass down unmap to the host IOMMU as a result > +of unmap > + * in the guest. Only invalidations are trapped and passed down. 
> + * In all cases, only first level TLB invalidation (request with PASID) > +can be > + * passed down, therefore we do not include IOTLB granularity for > +request > + * without PASID (second level). > + * > + * For an example, to find the VT-d granularity encoding for IOTLB > + * type and page selective granularity within PASID: > + * X: indexed by iommu cache type > + * Y: indexed by enum iommu_inv_granularity > + * [IOMMU_CACHE_INV_TYPE_IOTLB][IOMMU_INV_GRANU_ADDR] > + * > + * Granu_map array indicates validity of the table. 1: valid, 0: > +invalid > + * > + */ > +const static int > inv_type_granu_map[IOMMU_CACHE_INV_TYPE_NR][IOMMU_INV_GRANU_N > R] = { > + /* PASID based IOTLB, support PASID selective and page selective */ > + {0, 1, 1}, > + /* PASID based dev TLBs, only support all PASIDs or single PASID */ > + {1, 1, 0}, > + /* PASID cache */ > + {1, 1, 0} > +}; > + > +const static u64 > inv_type_granu_table[IOMMU_CACHE_INV_TYPE_NR][IOMMU_INV_GRANU_ > NR] = { > + /* PASID based IOTLB */ > + {0, QI_GRAN_NONG_PASID, QI_GRAN_PSI_PASID}, > + /* PASID based dev TLBs */ > + {QI_DEV_IOTLB_GRAN_ALL, QI_DEV_IOTLB_GRAN_PASID_SEL, 0}, > + /* PASID cache */ > + {QI_PC_ALL_PASIDS, QI_PC_PASID_SEL, 0}, }; > + > +static inline int to_vtd_granularity(int type, int granu, u64 > +*vtd_granu) { > + if (type >= IOMMU_CACHE_INV_TYPE_NR || granu >= > IOMMU_INV_GRANU_NR || > + !inv_type_granu_map[type][granu]) > + return -EINVAL; > + > + *vtd_granu = inv_type_granu_table[type][granu]; > + > + return 0; > +} > + > +static inline u64 to_vtd_size(u64 granu_size, u64 nr_granules) { > + u64 nr_pages = (granu_size * nr_granules) >> VTD_PAGE_SHIFT; > + > + /* VT-d size is encoded as 2^size of 4K pages, 0 for 4k, 9 for 2MB, etc. > + * IOMMU cache invalidate API passes granu_size in bytes, and number > of > + * granu size in contiguous memory. 
> + */ > + return order_base_2(nr_pages); > +} > + > +#ifdef CONFIG_INTEL_IOMMU_SVM > +static int intel_iommu_sva_invalidate(struct iommu_domain *domain, > + struct device *dev, struct iommu_cache_invalidate_info > *inv_info) { > + struct dmar_domain *dmar_domain = to_dmar_domain(domain); > + struct device_domain_info *info; > + struct intel_iommu *iommu; > + unsigned long flags; > + int cache_type; > + u8 bus, devfn; > + u16 did, sid; > + int ret = 0; > + u64 size; > + > + if (!inv_info || !dmar_domain || > + inv_info->version != > IOMMU_CACHE_INVALIDATE_INFO_VE
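The size encoding done by to_vtd_size() in the patch above, the invalidation span expressed as 2^s contiguous 4K pages, can be checked with a small sketch. `order_base_2_()` below is a user-space stand-in for the kernel's order_base_2() helper; it is not the kernel code.

```c
#include <stdint.h>

#define VTD_PAGE_SHIFT 12	/* VT-d base page size is 4K */

/* Stand-in for the kernel's order_base_2(): smallest s with 2^s >= n. */
static unsigned int order_base_2_(uint64_t n)
{
	unsigned int s = 0;

	while ((1ULL << s) < n)
		s++;
	return s;
}

/* Mirrors to_vtd_size() from the patch: convert a granule size in bytes
 * times a granule count into the VT-d "2^s 4K pages" encoding
 * (0 for 4K, 9 for 2M, 18 for 1G). */
static uint64_t vtd_size_encoding(uint64_t granu_size, uint64_t nr_granules)
{
	uint64_t nr_pages = (granu_size * nr_granules) >> VTD_PAGE_SHIFT;

	return order_base_2_(nr_pages);
}
```

One consequence of this encoding, visible in the sketch: a request covering a non-power-of-two number of pages is rounded up to the next power of two, so the hardware may invalidate a somewhat larger span than the caller asked for, which is safe for invalidation.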
[PATCH v1 7/8] vfio/type1: Add VFIO_IOMMU_CACHE_INVALIDATE
From: Liu Yi L

For VFIO IOMMUs with the type VFIO_TYPE1_NESTING_IOMMU, the guest "owns" the
first-level/stage-1 translation structures, so the host IOMMU driver has no
knowledge of first-level/stage-1 cache updates unless the guest invalidation
requests are trapped and propagated to the host.

This patch adds a new IOCTL, VFIO_IOMMU_CACHE_INVALIDATE, to propagate guest
first-level/stage-1 IOMMU cache invalidations to the host to ensure IOMMU
cache correctness. With this patch, vSVA (Virtual Shared Virtual Addressing)
can be used safely, as host IOMMU IOTLB correctness is ensured.

Cc: Kevin Tian
CC: Jacob Pan
Cc: Alex Williamson
Cc: Eric Auger
Cc: Jean-Philippe Brucker
Signed-off-by: Liu Yi L
Signed-off-by: Eric Auger
Signed-off-by: Jacob Pan
---
 drivers/vfio/vfio_iommu_type1.c | 49 +
 include/uapi/linux/vfio.h       | 22 ++
 2 files changed, 71 insertions(+)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index a877747..937ec3f 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -2423,6 +2423,15 @@ static long vfio_iommu_type1_unbind_gpasid(struct vfio_iommu *iommu,
 	return ret;
 }
 
+static int vfio_cache_inv_fn(struct device *dev, void *data)
+{
+	struct domain_capsule *dc = (struct domain_capsule *)data;
+	struct iommu_cache_invalidate_info *cache_inv_info =
+		(struct iommu_cache_invalidate_info *) dc->data;
+
+	return iommu_cache_invalidate(dc->domain, dev, cache_inv_info);
+}
+
 static long vfio_iommu_type1_ioctl(void *iommu_data,
 				   unsigned int cmd, unsigned long arg)
 {
@@ -2629,6 +2638,46 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
 		}
 		kfree(gbind_data);
 		return ret;
+	} else if (cmd == VFIO_IOMMU_CACHE_INVALIDATE) {
+		struct vfio_iommu_type1_cache_invalidate cache_inv;
+		u32 version;
+		int info_size;
+		void *cache_info;
+		int ret;
+
+		minsz = offsetofend(struct vfio_iommu_type1_cache_invalidate,
+				    flags);
+
+		if (copy_from_user(&cache_inv, (void __user *)arg, minsz))
+			return -EFAULT;
+
+		if
+		    (cache_inv.argsz < minsz || cache_inv.flags)
+			return -EINVAL;
+
+		/* Get the version of struct iommu_cache_invalidate_info */
+		if (copy_from_user(&version,
+			(void __user *) (arg + minsz), sizeof(version)))
+			return -EFAULT;
+
+		info_size = iommu_uapi_get_data_size(
+					IOMMU_UAPI_CACHE_INVAL, version);
+
+		cache_info = kzalloc(info_size, GFP_KERNEL);
+		if (!cache_info)
+			return -ENOMEM;
+
+		if (copy_from_user(cache_info,
+			(void __user *) (arg + minsz), info_size)) {
+			kfree(cache_info);
+			return -EFAULT;
+		}
+
+		mutex_lock(&iommu->lock);
+		ret = vfio_iommu_for_each_dev(iommu, vfio_cache_inv_fn,
+					      cache_info);
+		mutex_unlock(&iommu->lock);
+		kfree(cache_info);
+		return ret;
 	}
 
 	return -ENOTTY;
diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index 2235bc6..62ca791 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -899,6 +899,28 @@ struct vfio_iommu_type1_bind {
  */
 #define VFIO_IOMMU_BIND		_IO(VFIO_TYPE, VFIO_BASE + 23)
 
+/**
+ * VFIO_IOMMU_CACHE_INVALIDATE - _IOW(VFIO_TYPE, VFIO_BASE + 24,
+ *			struct vfio_iommu_type1_cache_invalidate)
+ *
+ * Propagate guest IOMMU cache invalidation to the host. The cache
+ * invalidation information is conveyed by @cache_info; its content
+ * format is the structures defined in uapi/linux/iommu.h. Users
+ * should be aware that struct iommu_cache_invalidate_info has a
+ * @version field; vfio needs to parse this field before getting
+ * data from userspace.
+ *
+ * This IOCTL is available after VFIO_SET_IOMMU.
+ *
+ * returns: 0 on success, -errno on failure.
+ */
+struct vfio_iommu_type1_cache_invalidate {
+	__u32   argsz;
+	__u32   flags;
+	struct  iommu_cache_invalidate_info cache_info;
+};
+#define VFIO_IOMMU_CACHE_INVALIDATE	_IO(VFIO_TYPE, VFIO_BASE + 24)
+
 /* Additional API for SPAPR TCE (Server POWERPC) IOMMU */

-- 
2.7.4
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v1 4/8] vfio: Check nesting iommu uAPI version
From: Liu Yi L

In the Linux kernel, the IOMMU nesting translation capability (a.k.a. dual-
stage address translation) is abstracted in uapi/iommu.h, in which uAPIs
such as bind_gpasid/iommu_cache_invalidate/fault_report/pgreq_resp are
defined.

VFIO_TYPE1_NESTING_IOMMU stands for the vfio iommu type which is backed by
a hardware IOMMU with dual-stage translation capability. For such a vfio
iommu type, userspace is able to set up dual-stage DMA translation on the
host side via VFIO's ABI. However, such VFIO ABIs rely on the uAPIs defined
in uapi/iommu.h, so VFIO needs to provide an API for userspace to check the
uapi/iommu.h version to ensure iommu uAPI compatibility.

This patch reports the iommu uAPI version to userspace in the
VFIO_CHECK_EXTENSION IOCTL. Applications can do a version check before
setting up dual-stage translation in the host IOMMU.

Cc: Kevin Tian
CC: Jacob Pan
Cc: Alex Williamson
Cc: Eric Auger
Cc: Jean-Philippe Brucker
Signed-off-by: Liu Yi L
---
 drivers/vfio/vfio_iommu_type1.c | 2 ++
 include/uapi/linux/vfio.h       | 9 +
 2 files changed, 11 insertions(+)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index ddd1ffe..9aa2a67 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -2274,6 +2274,8 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
 		if (!iommu)
 			return 0;
 		return vfio_domains_have_iommu_cache(iommu);
+	case VFIO_NESTING_IOMMU_UAPI:
+		return iommu_get_uapi_version();
 	default:
 		return 0;
 	}
diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index 8837219..ed9881d 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -47,6 +47,15 @@
 #define VFIO_NOIOMMU_IOMMU		8
 
 /*
+ * Hardware IOMMUs with two-stage translation capability give userspace
+ * the ownership of stage-1 translation structures (e.g. page tables).
+ * VFIO exposes the two-stage IOMMU programming capability to userspace
+ * based on the IOMMU UAPIs.
+ * Therefore users of VFIO_TYPE1_NESTING should
+ * check IOMMU UAPI version compatibility.
+ */
+#define VFIO_NESTING_IOMMU_UAPI	9
+
+/*
  * The IOCTL interface is designed for extensibility by embedding the
  * structure length (argsz) and flags into structures passed between
  * kernel and userspace. We therefore use the _IO() macro for these
-- 
2.7.4
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v1 5/8] vfio/type1: Report 1st-level/stage-1 format to userspace
From: Liu Yi L

VFIO exposes the IOMMU nesting translation (a.k.a. dual-stage translation)
capability to userspace. Thus applications like QEMU can support a vIOMMU
with the hardware's nesting translation capability for pass-through
devices. Before setting up nesting translation for pass-through devices,
QEMU and similar applications need to learn the supported 1st-lvl/stage-1
translation structure format, i.e. the page table format.

Take vSVA (virtual Shared Virtual Addressing) as an example: to support
vSVA for pass-through devices, QEMU sets up nesting translation for those
devices. The guest page tables are configured on the host as the
1st-lvl/stage-1 page tables, so the guest format must be compatible with
the host side.

This patch reports the supported 1st-lvl/stage-1 page table format on the
current platform to userspace. QEMU and similar applications should use
this format info when setting up IOMMU nesting translation on the host
IOMMU.

Cc: Kevin Tian
CC: Jacob Pan
Cc: Alex Williamson
Cc: Eric Auger
Cc: Jean-Philippe Brucker
Signed-off-by: Liu Yi L
---
 drivers/vfio/vfio_iommu_type1.c | 56 +
 include/uapi/linux/vfio.h       | 1 +
 2 files changed, 57 insertions(+)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 9aa2a67..82a9e0b 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -2234,11 +2234,66 @@ static int vfio_iommu_type1_pasid_free(struct vfio_iommu *iommu,
 	return ret;
 }
 
+static int vfio_iommu_get_stage1_format(struct vfio_iommu *iommu,
+					 u32 *stage1_format)
+{
+	struct vfio_domain *domain;
+	u32 format = 0, tmp_format = 0;
+	int ret;
+
+	mutex_lock(&iommu->lock);
+	if (list_empty(&iommu->domain_list)) {
+		mutex_unlock(&iommu->lock);
+		return -EINVAL;
+	}
+
+	list_for_each_entry(domain, &iommu->domain_list, next) {
+		if (iommu_domain_get_attr(domain->domain,
+			DOMAIN_ATTR_PASID_FORMAT, &format)) {
+			ret = -EINVAL;
+			format = 0;
+			goto out_unlock;
+		}
+		/*
+		 * format is always non-zero (the first format
+		 * is IOMMU_PASID_FORMAT_INTEL_VTD which is 1). Since
+		 * different backing IOMMUs could potentially report
+		 * different formats, we expect identical formats in
+		 * the domain list; mixed formats are not supported.
+		 * Return -EINVAL to fail the attempt to set up
+		 * VFIO_TYPE1_NESTING_IOMMU if non-identical formats
+		 * are detected.
+		 */
+		if (tmp_format && tmp_format != format) {
+			ret = -EINVAL;
+			format = 0;
+			goto out_unlock;
+		}
+
+		tmp_format = format;
+	}
+	ret = 0;
+
+out_unlock:
+	if (format)
+		*stage1_format = format;
+	mutex_unlock(&iommu->lock);
+	return ret;
+}
+
 static int vfio_iommu_info_add_nesting_cap(struct vfio_iommu *iommu,
 					   struct vfio_info_cap *caps)
 {
 	struct vfio_info_cap_header *header;
 	struct vfio_iommu_type1_info_cap_nesting *nesting_cap;
+	u32 formats = 0;
+	int ret;
+
+	ret = vfio_iommu_get_stage1_format(iommu, &formats);
+	if (ret) {
+		pr_warn("Failed to get stage-1 format\n");
+		return ret;
+	}
 
 	header = vfio_info_cap_add(caps, sizeof(*nesting_cap),
 				   VFIO_IOMMU_TYPE1_INFO_CAP_NESTING, 1);
@@ -2254,6 +2309,7 @@ static int vfio_iommu_info_add_nesting_cap(struct vfio_iommu *iommu,
 		/* nesting iommu type supports PASID requests (alloc/free) */
 		nesting_cap->nesting_capabilities |= VFIO_IOMMU_PASID_REQS;
 	}
+	nesting_cap->stage1_formats = formats;
 
 	return 0;
 }
diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index ed9881d..ebeaf3e 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -763,6 +763,7 @@ struct vfio_iommu_type1_info_cap_nesting {
 	struct	vfio_info_cap_header header;
 #define VFIO_IOMMU_PASID_REQS	(1 << 0)
 	__u32	nesting_capabilities;
+	__u32	stage1_formats;
 };
 
 #define VFIO_IOMMU_GET_INFO _IO(VFIO_TYPE, VFIO_BASE + 12)
-- 
2.7.4
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v1 8/8] vfio/type1: Add vSVA support for IOMMU-backed mdevs
From: Liu Yi L

In recent years, mediated device pass-through frameworks (e.g. vfio-mdev)
have been used to achieve flexible device sharing across domains (e.g.
VMs). There are also hardware-assisted mediated pass-through solutions
from platform vendors, e.g. Intel VT-d scalable mode, which supports the
Intel Scalable I/O Virtualization technology. Such mdevs are called
IOMMU-backed mdevs, as there is IOMMU-enforced DMA isolation for them. In
the kernel, IOMMU-backed mdevs are exposed to the IOMMU layer via the
aux-domain concept, which means mdevs are protected by an iommu domain
which is an aux-domain of their physical device. Details can be found in
the KVM presentation from Kevin Tian; IOMMU-backed equals IOMMU-capable.

https://events19.linuxfoundation.org/wp-content/uploads/2017/12/\
Hardware-Assisted-Mediated-Pass-Through-with-VFIO-Kevin-Tian-Intel.pdf

This patch supports NESTING IOMMU for IOMMU-backed mdevs by figuring out
the physical device of an IOMMU-backed mdev and then invoking IOMMU
requests to the IOMMU layer with the physical device and the mdev's aux
domain info. With this patch, vSVA (Virtual Shared Virtual Addressing)
can be used on IOMMU-backed mdevs.
Cc: Kevin Tian CC: Jacob Pan CC: Jun Tian Cc: Alex Williamson Cc: Eric Auger Cc: Jean-Philippe Brucker Signed-off-by: Liu Yi L --- drivers/vfio/vfio_iommu_type1.c | 23 --- 1 file changed, 20 insertions(+), 3 deletions(-) diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c index 937ec3f..d473665 100644 --- a/drivers/vfio/vfio_iommu_type1.c +++ b/drivers/vfio/vfio_iommu_type1.c @@ -132,6 +132,7 @@ struct vfio_regions { struct domain_capsule { struct iommu_domain *domain; + struct vfio_group *group; void *data; }; @@ -148,6 +149,7 @@ static int vfio_iommu_for_each_dev(struct vfio_iommu *iommu, list_for_each_entry(d, &iommu->domain_list, next) { dc.domain = d->domain; list_for_each_entry(g, &d->group_list, next) { + dc.group = g; ret = iommu_group_for_each_dev(g->iommu_group, &dc, fn); if (ret) @@ -2347,7 +2349,12 @@ static int vfio_bind_gpasid_fn(struct device *dev, void *data) struct iommu_gpasid_bind_data *gbind_data = (struct iommu_gpasid_bind_data *) dc->data; - return iommu_sva_bind_gpasid(dc->domain, dev, gbind_data); + if (dc->group->mdev_group) + return iommu_sva_bind_gpasid(dc->domain, + vfio_mdev_get_iommu_device(dev), gbind_data); + else + return iommu_sva_bind_gpasid(dc->domain, + dev, gbind_data); } static int vfio_unbind_gpasid_fn(struct device *dev, void *data) @@ -2356,8 +2363,13 @@ static int vfio_unbind_gpasid_fn(struct device *dev, void *data) struct iommu_gpasid_bind_data *gbind_data = (struct iommu_gpasid_bind_data *) dc->data; - return iommu_sva_unbind_gpasid(dc->domain, dev, + if (dc->group->mdev_group) + return iommu_sva_unbind_gpasid(dc->domain, + vfio_mdev_get_iommu_device(dev), gbind_data->hpasid); + else + return iommu_sva_unbind_gpasid(dc->domain, dev, + gbind_data->hpasid); } /** @@ -2429,7 +2441,12 @@ static int vfio_cache_inv_fn(struct device *dev, void *data) struct iommu_cache_invalidate_info *cache_inv_info = (struct iommu_cache_invalidate_info *) dc->data; - return iommu_cache_invalidate(dc->domain, dev, 
cache_inv_info); + if (dc->group->mdev_group) + return iommu_cache_invalidate(dc->domain, + vfio_mdev_get_iommu_device(dev), cache_inv_info); + else + return iommu_cache_invalidate(dc->domain, + dev, cache_inv_info); } static long vfio_iommu_type1_ioctl(void *iommu_data, -- 2.7.4 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v1 1/8] vfio: Add VFIO_IOMMU_PASID_REQUEST(alloc/free)
From: Liu Yi L

For a long time, devices have had only one DMA address space from the
platform IOMMU's point of view. This is true for both bare metal and
directed access in a virtualization environment. The reason is that the
source ID of DMA in PCIe is the BDF (bus/dev/fn ID), which results in
only device-granularity DMA isolation. However, this is changing with the
latest advancements in the I/O technology area. More and more platform
vendors are utilizing the PCIe PASID TLP prefix in DMA requests, thus
giving devices multiple DMA address spaces as identified by their
individual PASIDs. For example, Shared Virtual Addressing (SVA, a.k.a.
Shared Virtual Memory) is able to let a device access multiple process
virtual address spaces by binding each virtual address space to a PASID.
The PASID is allocated in software and programmed to the device in a
device-specific manner. Devices which support the PASID capability are
called PASID-capable devices. If such devices are passed through to VMs,
guest software is also able to bind guest process virtual address spaces
on such devices. Therefore, the guest software could reuse the bare metal
software programming model, which means guest software would also
allocate a PASID and program it to the device directly. This is a
dangerous situation since it has potential PASID conflicts and
unauthorized address space access. It is safer to let the host intercept
the guest software's PASID allocation; thus PASIDs are managed
system-wide.

This patch adds the VFIO_IOMMU_PASID_REQUEST ioctl which aims to pass
down PASID allocation/free requests from the virtual IOMMU. Additionally,
such requests are intended to be invoked by QEMU or other applications
which run in userspace, so it is necessary to have a mechanism to prevent
a single application from abusing the available PASIDs in the system.
With such consideration, this patch tracks VFIO PASID allocation per-VM.
There was a discussion about making the quota per assigned device, e.g.
if a VM has many assigned devices, it should have a larger quota.
However, it is not clear how many PASIDs an assigned device will use;
it is possible that a VM with multiple assigned devices requests fewer
PASIDs. Therefore a per-VM quota is better.

This patch uses a struct mm pointer as the per-VM token. We also
considered using the task structure pointer and the vfio_iommu structure
pointer. However, the task structure is per-thread, which means it cannot
serve the per-VM PASID allocation tracking purpose, while the vfio_iommu
structure is visible only within vfio. Therefore, the struct mm pointer
was selected.

This patch adds a structure vfio_mm. A vfio_mm is created when the first
vfio container is opened by a VM; in the reverse order, the vfio_mm is
freed when the last vfio container is released. Each VM is assigned a
PASID quota, so that it is not able to request PASIDs beyond its quota.
This patch adds a default quota of 1000. The quota can be tuned by the
administrator; making the PASID quota tunable is added in another patch
in this series.
Previous discussions: https://patchwork.kernel.org/patch/11209429/ Cc: Kevin Tian CC: Jacob Pan Cc: Alex Williamson Cc: Eric Auger Cc: Jean-Philippe Brucker Signed-off-by: Liu Yi L Signed-off-by: Yi Sun Signed-off-by: Jacob Pan --- drivers/vfio/vfio.c | 130 drivers/vfio/vfio_iommu_type1.c | 104 include/linux/vfio.h| 20 +++ include/uapi/linux/vfio.h | 41 + 4 files changed, 295 insertions(+) diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c index c848262..d13b483 100644 --- a/drivers/vfio/vfio.c +++ b/drivers/vfio/vfio.c @@ -32,6 +32,7 @@ #include #include #include +#include #define DRIVER_VERSION "0.3" #define DRIVER_AUTHOR "Alex Williamson " @@ -46,6 +47,8 @@ static struct vfio { struct mutexgroup_lock; struct cdev group_cdev; dev_t group_devt; + struct list_headvfio_mm_list; + struct mutexvfio_mm_lock; wait_queue_head_t release_q; } vfio; @@ -2129,6 +2132,131 @@ int vfio_unregister_notifier(struct device *dev, enum vfio_notify_type type, EXPORT_SYMBOL(vfio_unregister_notifier); /** + * VFIO_MM objects - create, release, get, put, search + * Caller of the function should have held vfio.vfio_mm_lock. + */ +static struct vfio_mm *vfio_create_mm(struct mm_struct *mm) +{ + struct vfio_mm *vmm; + struct vfio_mm_token *token; + int ret = 0; + + vmm = kzalloc(sizeof(*vmm), GFP_KERNEL); + if (!vmm) + return ERR_PTR(-ENOMEM); + + /* Per mm IOASID set used for quota control and group operations */ + ret = ioasid_alloc_set((struct ioasid_set *) mm, + VFIO_DEFAULT_PASID_QUOTA, &vmm->ioasid_sid); +
[PATCH v1 6/8] vfio/type1: Bind guest page tables to host
From: Liu Yi L

VFIO_TYPE1_NESTING_IOMMU is an IOMMU type which is backed by hardware
IOMMUs that have nesting DMA translation (a.k.a. dual-stage address
translation). For such hardware IOMMUs, there are two stages/levels of
address translation, and software may let userspace/a VM own the
first-level/stage-1 translation structures. An example of such usage is
vSVA (virtual Shared Virtual Addressing): the VM owns the first-
level/stage-1 translation structures and binds the structures to the
host, and the hardware IOMMU then uses nesting translation when doing DMA
translation for the devices behind such a hardware IOMMU.

This patch adds vfio support for binding the guest translation (a.k.a.
stage-1) structures to the host iommu. For VFIO_TYPE1_NESTING_IOMMU, not
only is binding the guest page table needed; an interface must also be
exposed to the guest for iommu cache invalidation when the guest modifies
the first-level/stage-1 translation structures, since the hardware needs
to be notified to flush stale iotlbs. That is introduced in the next
patch.

In this patch, guest page table bind and unbind are done using the flags
VFIO_IOMMU_BIND_GUEST_PGTBL and VFIO_IOMMU_UNBIND_GUEST_PGTBL under the
IOCTL VFIO_IOMMU_BIND; the bind/unbind data are conveyed by struct
iommu_gpasid_bind_data. Before binding a guest page table to the host,
the VM should have gotten a PASID allocated by the host via
VFIO_IOMMU_PASID_REQUEST.

Binding guest translation structures (here, guest page tables) to the
host is the first step to set up vSVA (Virtual Shared Virtual
Addressing).
Cc: Kevin Tian CC: Jacob Pan Cc: Alex Williamson Cc: Eric Auger Cc: Jean-Philippe Brucker Signed-off-by: Jean-Philippe Brucker Signed-off-by: Liu Yi L Signed-off-by: Jacob Pan --- drivers/vfio/vfio_iommu_type1.c | 158 include/uapi/linux/vfio.h | 46 2 files changed, 204 insertions(+) diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c index 82a9e0b..a877747 100644 --- a/drivers/vfio/vfio_iommu_type1.c +++ b/drivers/vfio/vfio_iommu_type1.c @@ -130,6 +130,33 @@ struct vfio_regions { #define IS_IOMMU_CAP_DOMAIN_IN_CONTAINER(iommu)\ (!list_empty(&iommu->domain_list)) +struct domain_capsule { + struct iommu_domain *domain; + void *data; +}; + +/* iommu->lock must be held */ +static int vfio_iommu_for_each_dev(struct vfio_iommu *iommu, + int (*fn)(struct device *dev, void *data), + void *data) +{ + struct domain_capsule dc = {.data = data}; + struct vfio_domain *d; + struct vfio_group *g; + int ret = 0; + + list_for_each_entry(d, &iommu->domain_list, next) { + dc.domain = d->domain; + list_for_each_entry(g, &d->group_list, next) { + ret = iommu_group_for_each_dev(g->iommu_group, + &dc, fn); + if (ret) + break; + } + } + return ret; +} + static int put_pfn(unsigned long pfn, int prot); /* @@ -2314,6 +2341,88 @@ static int vfio_iommu_info_add_nesting_cap(struct vfio_iommu *iommu, return 0; } +static int vfio_bind_gpasid_fn(struct device *dev, void *data) +{ + struct domain_capsule *dc = (struct domain_capsule *)data; + struct iommu_gpasid_bind_data *gbind_data = + (struct iommu_gpasid_bind_data *) dc->data; + + return iommu_sva_bind_gpasid(dc->domain, dev, gbind_data); +} + +static int vfio_unbind_gpasid_fn(struct device *dev, void *data) +{ + struct domain_capsule *dc = (struct domain_capsule *)data; + struct iommu_gpasid_bind_data *gbind_data = + (struct iommu_gpasid_bind_data *) dc->data; + + return iommu_sva_unbind_gpasid(dc->domain, dev, + gbind_data->hpasid); +} + +/** + * Unbind specific gpasid, caller of this function requires hold + * 
vfio_iommu->lock + */ +static long vfio_iommu_type1_do_guest_unbind(struct vfio_iommu *iommu, + struct iommu_gpasid_bind_data *gbind_data) +{ + return vfio_iommu_for_each_dev(iommu, + vfio_unbind_gpasid_fn, gbind_data); +} + +static long vfio_iommu_type1_bind_gpasid(struct vfio_iommu *iommu, + struct iommu_gpasid_bind_data *gbind_data) +{ + int ret = 0; + + mutex_lock(&iommu->lock); + if (!IS_IOMMU_CAP_DOMAIN_IN_CONTAINER(iommu)) { + ret = -EINVAL; + goto out_unlock; + } + + ret = vfio_iommu_for_each_dev(iommu, + vfio_bind_gpasid_fn, gbind_data); + /* +* If bind failed, it may not be a total failure. Some devices +* within the iommu group may have bind successfully. Although +* we don't enabl
[PATCH v1 0/8] vfio: expose virtual Shared Virtual Addressing to VMs
From: Liu Yi L

Shared Virtual Addressing (SVA), a.k.a. Shared Virtual Memory (SVM) on
Intel platforms, allows address space sharing between device DMA and
applications. SVA can reduce programming complexity and enhance security.

This VFIO series is intended to expose SVA usage to VMs, i.e. sharing a
guest application's address space with passthru devices. This is called
vSVA in this series. The whole vSVA enabling requires QEMU/VFIO/IOMMU
changes; the IOMMU and QEMU changes are in separate series (listed in the
"Related series").

The high-level architecture for SVA virtualization is as below; the key
design of vSVA support is to utilize the dual-stage IOMMU translation
(also known as IOMMU nesting translation) capability in the host IOMMU:

    .-------------.  .---------------------------.
    |   vIOMMU    |  | Guest process CR3, FL only|
    |             |  '---------------------------'
    .----------------/
    | PASID Entry |--- PASID cache flush -
    '-------------'                       |
    |             |                       V
    |             |                CR3 in GPA
    '-------------'
Guest
------| Shadow |--------------------------|-----------
      v        v                          v
Host
    .-------------.  .----------------------.
    |   pIOMMU    |  | Bind FL for GVA-GPA  |
    |             |  '----------------------'
    .----------------/  |
    | PASID Entry |     V (Nested xlate)
    '----------------\.------------------------------.
    |             |   |SL for GPA-HPA, default domain|
    |             |   '------------------------------'
    '-------------'
Where:
 - FL = First level/stage one page tables
 - SL = Second level/stage two page tables

There are roughly four parts in this patchset, corresponding to the basic
vSVA support for PCI device assignment:
 1. vfio support for PASID allocation and free for VMs
 2. vfio support for guest page table binding requests from VMs
 3. vfio support for IOMMU cache invalidation from VMs
 4. vfio support for vSVA usage on IOMMU-backed mdevs

The complete vSVA kernel upstream patches are divided into three phases:
 1. Common APIs and PCI device direct assignment
 2. IOMMU-backed Mediated Device assignment
 3. Page Request Services (PRS) support

This patchset is aiming for phases 1 and 2, and is based on Jacob's
series below:
[PATCH V10 00/11] Nested Shared Virtual Address (SVA) VT-d support:
https://lkml.org/lkml/2020/3/20/1172

The complete set for the current vSVA can be found in the branch below:
https://github.com/luxis1999/linux-vsva.git: vsva-linux-5.6-rc6

The corresponding QEMU patch series is as below; the complete QEMU set
can be found in the following branch:
[PATCH v1 00/22] intel_iommu: expose Shared Virtual Addressing to VMs
https://github.com/luxis1999/qemu.git: sva_vtd_v10_v1

Regards,
Yi Liu

Changelog:
	- RFC v3 -> Patch v1:
	  a) Address comments on the PASID request (alloc/free) path
	  b) Report PASID alloc/free availability to user-space
	  c) Add a vfio_iommu_type1 parameter to support pasid quota tuning
	  d) Adjusted to the latest ioasid code implementation, e.g. removed
	     the code for tracking the allocated PASIDs, as the latest ioasid
	     code tracks them; VFIO can use ioasid_free_set() to free all
	     PASIDs.

	- RFC v2 -> v3:
	  a) Refine the whole patchset to fit the rough parts in this series
	  b) Adds a complete vfio PASID management framework, e.g. pasid
	     alloc, free, reclaim on VM crash/down, and a per-VM PASID quota
	     to prevent PASID abuse.
	  c) Adds IOMMU uAPI version check and page table format check to
	     ensure version compatibility and hardware compatibility.
	  d) Adds vSVA vfio support for IOMMU-backed mdevs.

	- RFC v1 -> v2:
	  Dropped vfio: VFIO_IOMMU_ATTACH/DETACH_PASID_TABLE.

Liu Yi L (8):
  vfio: Add VFIO_IOMMU_PASID_REQUEST(alloc/free)
  vfio/type1: Add vfio_iommu_type1 parameter for quota tuning
  vfio/type1: Report PASID alloc/free support to userspace
  vfio: Check nesting iommu uAPI version
  vfio/type1: Report 1st-level/stage-1 format to userspace
  vfio/type1: Bind guest page tables to host
  vfio/type1: Add VFIO_IOMMU_CACHE_INVALIDATE
  vfio/type1: Add vSVA support for IOMMU-backed mdevs

 drivers/vfio/vfio.c             | 136 +
 drivers/vfio/vfio_iommu_type1.c | 419 
 include/linux/vfio.h            |  21 ++
 include/uapi/linux/vfio.h       | 127 
 4 files changed, 703 insertions(+)

-- 
2.7.4
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v1 2/8] vfio/type1: Add vfio_iommu_type1 parameter for quota tuning
From: Liu Yi L This patch adds a module option to make the PASID quota tunable by administrator. TODO: needs to think more on how to make the tuning to be per-process. Previous discussions: https://patchwork.kernel.org/patch/11209429/ Cc: Kevin Tian CC: Jacob Pan Cc: Alex Williamson Cc: Eric Auger Cc: Jean-Philippe Brucker Signed-off-by: Liu Yi L --- drivers/vfio/vfio.c | 8 +++- drivers/vfio/vfio_iommu_type1.c | 7 ++- include/linux/vfio.h| 3 ++- 3 files changed, 15 insertions(+), 3 deletions(-) diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c index d13b483..020a792 100644 --- a/drivers/vfio/vfio.c +++ b/drivers/vfio/vfio.c @@ -2217,13 +2217,19 @@ struct vfio_mm *vfio_mm_get_from_task(struct task_struct *task) } EXPORT_SYMBOL_GPL(vfio_mm_get_from_task); -int vfio_mm_pasid_alloc(struct vfio_mm *vmm, int min, int max) +int vfio_mm_pasid_alloc(struct vfio_mm *vmm, int quota, int min, int max) { ioasid_t pasid; int ret = -ENOSPC; mutex_lock(&vmm->pasid_lock); + /* update quota as it is tunable by admin */ + if (vmm->pasid_quota != quota) { + vmm->pasid_quota = quota; + ioasid_adjust_set(vmm->ioasid_sid, quota); + } + pasid = ioasid_alloc(vmm->ioasid_sid, min, max, NULL); if (pasid == INVALID_IOASID) { ret = -ENOSPC; diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c index 331ceee..e40afc0 100644 --- a/drivers/vfio/vfio_iommu_type1.c +++ b/drivers/vfio/vfio_iommu_type1.c @@ -60,6 +60,11 @@ module_param_named(dma_entry_limit, dma_entry_limit, uint, 0644); MODULE_PARM_DESC(dma_entry_limit, "Maximum number of user DMA mappings per container (65535)."); +static int pasid_quota = VFIO_DEFAULT_PASID_QUOTA; +module_param_named(pasid_quota, pasid_quota, uint, 0644); +MODULE_PARM_DESC(pasid_quota, +"Quota of user owned PASIDs per vfio-based application (1000)."); + struct vfio_iommu { struct list_headdomain_list; struct list_headiova_list; @@ -2200,7 +2205,7 @@ static int vfio_iommu_type1_pasid_alloc(struct vfio_iommu *iommu, goto out_unlock; } 
if (vmm) - ret = vfio_mm_pasid_alloc(vmm, min, max); + ret = vfio_mm_pasid_alloc(vmm, pasid_quota, min, max); else ret = -EINVAL; out_unlock: diff --git a/include/linux/vfio.h b/include/linux/vfio.h index 75f9f7f1..af2ef78 100644 --- a/include/linux/vfio.h +++ b/include/linux/vfio.h @@ -106,7 +106,8 @@ struct vfio_mm { extern struct vfio_mm *vfio_mm_get_from_task(struct task_struct *task); extern void vfio_mm_put(struct vfio_mm *vmm); -extern int vfio_mm_pasid_alloc(struct vfio_mm *vmm, int min, int max); +extern int vfio_mm_pasid_alloc(struct vfio_mm *vmm, + int quota, int min, int max); extern int vfio_mm_pasid_free(struct vfio_mm *vmm, ioasid_t pasid); /* -- 2.7.4 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v1 3/8] vfio/type1: Report PASID alloc/free support to userspace
From: Liu Yi L This patch reports PASID alloc/free availability to userspace (e.g. QEMU) thus userspace could do a pre-check before utilizing this feature. Cc: Kevin Tian CC: Jacob Pan Cc: Alex Williamson Cc: Eric Auger Cc: Jean-Philippe Brucker Signed-off-by: Liu Yi L --- drivers/vfio/vfio_iommu_type1.c | 28 include/uapi/linux/vfio.h | 8 2 files changed, 36 insertions(+) diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c index e40afc0..ddd1ffe 100644 --- a/drivers/vfio/vfio_iommu_type1.c +++ b/drivers/vfio/vfio_iommu_type1.c @@ -2234,6 +2234,30 @@ static int vfio_iommu_type1_pasid_free(struct vfio_iommu *iommu, return ret; } +static int vfio_iommu_info_add_nesting_cap(struct vfio_iommu *iommu, +struct vfio_info_cap *caps) +{ + struct vfio_info_cap_header *header; + struct vfio_iommu_type1_info_cap_nesting *nesting_cap; + + header = vfio_info_cap_add(caps, sizeof(*nesting_cap), + VFIO_IOMMU_TYPE1_INFO_CAP_NESTING, 1); + if (IS_ERR(header)) + return PTR_ERR(header); + + nesting_cap = container_of(header, + struct vfio_iommu_type1_info_cap_nesting, + header); + + nesting_cap->nesting_capabilities = 0; + if (iommu->nesting) { + /* nesting iommu type supports PASID requests (alloc/free) */ + nesting_cap->nesting_capabilities |= VFIO_IOMMU_PASID_REQS; + } + + return 0; +} + static long vfio_iommu_type1_ioctl(void *iommu_data, unsigned int cmd, unsigned long arg) { @@ -2283,6 +2307,10 @@ static long vfio_iommu_type1_ioctl(void *iommu_data, if (ret) return ret; + ret = vfio_iommu_info_add_nesting_cap(iommu, &caps); + if (ret) + return ret; + if (caps.size) { info.flags |= VFIO_IOMMU_INFO_CAPS; diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h index 298ac80..8837219 100644 --- a/include/uapi/linux/vfio.h +++ b/include/uapi/linux/vfio.h @@ -748,6 +748,14 @@ struct vfio_iommu_type1_info_cap_iova_range { struct vfio_iova_range iova_ranges[]; }; +#define VFIO_IOMMU_TYPE1_INFO_CAP_NESTING 2 + +struct vfio_iommu_type1_info_cap_nesting 
{ + struct vfio_info_cap_header header; +#define VFIO_IOMMU_PASID_REQS (1 << 0) + __u32 nesting_capabilities; +}; + #define VFIO_IOMMU_GET_INFO _IO(VFIO_TYPE, VFIO_BASE + 12) /** -- 2.7.4 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v1 0/2] vfio/pci: expose device's PASID capability to VMs
From: Liu Yi L

Shared Virtual Addressing (SVA), a.k.a. Shared Virtual Memory (SVM) on Intel
platforms, allows address space sharing between device DMA and applications.
SVA can reduce programming complexity and enhance security.

Enabling SVA requires the device to expose the PASID capability. This
patchset exposes the device's PASID capability to the guest instead of
hiding it.

The second patch emulates the PASID capability for VFs (Virtual Functions),
since per the PCIe spec VFs do not implement this capability themselves. The
emulated capability is exposed to the VM if the capability is enabled in the
PF (Physical Function).

However, there is an open issue with the emulation: if the PF driver
disables the PASID capability at runtime, guests may break. The PF should
not disable the PASID capability while any guest is using it on a VF related
to that PF. Solving this may require a generic communication framework
between the vfio-pci driver and PF drivers. Please feel free to give your
suggestions on it.

Regards,
Yi Liu

Changelog:
	- RFC v1 -> Patch v1:
	  Add CONFIG_PCI_ATS #ifdef control to avoid a compile error.

Liu Yi L (2):
  vfio/pci: Expose PCIe PASID capability to guest
  vfio/pci: Emulate PASID/PRI capability for VFs

 drivers/vfio/pci/vfio_pci_config.c | 327 ++++++++++++++++++++++++++++++++++-
 1 file changed, 324 insertions(+), 3 deletions(-)
--
2.7.4
[PATCH v1 2/2] vfio/pci: Emulate PASID/PRI capability for VFs
From: Liu Yi L

Per PCIe r5.0, sec 9.3.7.14, if a PF implements the PASID Capability, the
PF PASID configuration is shared by its VFs. VFs must not implement their
own PASID Capability.

Per PCIe r5.0, sec 9.3.7.11, VFs must not implement the PRI Capability. If
the PF implements PRI, it is shared by the VFs.

On bare metal, this has been fixed by the efforts below:
https://lkml.org/lkml/2019/9/5/996
https://lkml.org/lkml/2019/9/5/995

This patch adds emulated PASID/PRI capabilities for VFs when assigned to
VMs via the vfio-pci driver. This is required for enabling vSVA on
pass-through VFs, as VFs have no PASID/PRI capability structure in their
config space.

Cc: Kevin Tian
Cc: Jacob Pan
Cc: Alex Williamson
Cc: Eric Auger
Cc: Jean-Philippe Brucker
Signed-off-by: Liu Yi L
---
 drivers/vfio/pci/vfio_pci_config.c | 325 ++++++++++++++++++++++++++++++++++-
 1 file changed, 323 insertions(+), 2 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci_config.c b/drivers/vfio/pci/vfio_pci_config.c
index 4b9af99..84b4ea0 100644
--- a/drivers/vfio/pci/vfio_pci_config.c
+++ b/drivers/vfio/pci/vfio_pci_config.c
@@ -1509,11 +1509,304 @@ static int vfio_cap_init(struct vfio_pci_device *vdev)
 	return 0;
 }
 
+static int vfio_fill_custom_vconfig_bytes(struct vfio_pci_device *vdev,
+					  int offset, uint8_t *data, int size)
+{
+	int ret = 0, data_offset = 0;
+
+	while (size) {
+		int filled;
+
+		if (size >= 4 && !(offset % 4)) {
+			__le32 *dwordp = (__le32 *)&vdev->vconfig[offset];
+			u32 dword;
+
+			memcpy(&dword, data + data_offset, 4);
+			*dwordp = cpu_to_le32(dword);
+			filled = 4;
+		} else if (size >= 2 && !(offset % 2)) {
+			__le16 *wordp = (__le16 *)&vdev->vconfig[offset];
+			u16 word;
+
+			memcpy(&word, data + data_offset, 2);
+			*wordp = cpu_to_le16(word);
+			filled = 2;
+		} else {
+			u8 *byte = &vdev->vconfig[offset];
+
+			memcpy(byte, data + data_offset, 1);
+			filled = 1;
+		}
+
+		offset += filled;
+		data_offset += filled;
+		size -= filled;
+	}
+
+	return ret;
+}
+
+static int vfio_pci_get_ecap_content(struct pci_dev *pdev,
+				     int cap, int cap_len, u8 *content)
+{
+	int pos, offset, len = cap_len, ret = 0;
+
+	pos = pci_find_ext_capability(pdev, cap);
+	if (!pos)
+		return -EINVAL;
+
+	offset = 0;
+	while (len) {
+		int fetched;
+
+		if (len >= 4 && !(pos % 4)) {
+			u32 *dwordp = (u32 *) (content + offset);
+			u32 dword;
+			__le32 *dwptr = (__le32 *) &dword;
+
+			ret = pci_read_config_dword(pdev, pos, &dword);
+			if (ret)
+				return ret;
+			*dwordp = le32_to_cpu(*dwptr);
+			fetched = 4;
+		} else if (len >= 2 && !(pos % 2)) {
+			u16 *wordp = (u16 *) (content + offset);
+			u16 word;
+			__le16 *wptr = (__le16 *) &word;
+
+			ret = pci_read_config_word(pdev, pos, &word);
+			if (ret)
+				return ret;
+			*wordp = le16_to_cpu(*wptr);
+			fetched = 2;
+		} else {
+			u8 *byte = (u8 *) (content + offset);
+
+			ret = pci_read_config_byte(pdev, pos, byte);
+			if (ret)
+				return ret;
+			fetched = 1;
+		}
+
+		pos += fetched;
+		offset += fetched;
+		len -= fetched;
+	}
+
+	return ret;
+}
+
+struct vfio_pci_pasid_cap_data {
+	u32 id:16;
+	u32 version:4;
+	u32 next:12;
+	union {
+		u16 cap_reg_val;
+		struct {
+			u16 rsv1:1;
+			u16 execs:1;
+			u16 prvs:1;
+			u16 rsv2:5;
+			u16 pasid_bits:5;
+			u16 rsv3:3;
+		};
+	} cap_reg;
+	union {
+		u16 control_reg_val;
+		struct {
+			u16 paside:1;
+			u16 exece:1;
+			u16 prve:1;
+			u16 rsv4:13;
+		};
+	} control_reg;
+};
+
+static int vfio_pci_add_pasid_c
[PATCH v1 1/2] vfio/pci: Expose PCIe PASID capability to guest
From: Liu Yi L

This patch exposes the PCIe PASID capability to the guest. The existing
vfio-pci driver hides it from the guest by setting the capability length
to 0 in pci_ext_cap_length[]. This capability is required for vSVA
enabling on pass-through PCIe devices.

Cc: Kevin Tian
Cc: Jacob Pan
Cc: Alex Williamson
Cc: Eric Auger
Cc: Jean-Philippe Brucker
Signed-off-by: Liu Yi L
---
 drivers/vfio/pci/vfio_pci_config.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/vfio/pci/vfio_pci_config.c b/drivers/vfio/pci/vfio_pci_config.c
index 90c0b80..4b9af99 100644
--- a/drivers/vfio/pci/vfio_pci_config.c
+++ b/drivers/vfio/pci/vfio_pci_config.c
@@ -95,7 +95,7 @@ static const u16 pci_ext_cap_length[PCI_EXT_CAP_ID_MAX + 1] = {
 	[PCI_EXT_CAP_ID_LTR]	=	PCI_EXT_CAP_LTR_SIZEOF,
 	[PCI_EXT_CAP_ID_SECPCI]	=	0,	/* not yet */
 	[PCI_EXT_CAP_ID_PMUX]	=	0,	/* not yet */
-	[PCI_EXT_CAP_ID_PASID]	=	0,	/* not yet */
+	[PCI_EXT_CAP_ID_PASID]	=	PCI_EXT_CAP_PASID_SIZEOF,
 };

 /*
--
2.7.4
RE: [PATCH v1 0/8] vfio: expose virtual Shared Virtual Addressing to VMs
> From: Liu, Yi L
> Sent: Sunday, March 22, 2020 8:32 PM
> To: alex.william...@redhat.com; eric.au...@redhat.com
> Subject: [PATCH v1 0/8] vfio: expose virtual Shared Virtual Addressing to VMs
>
> From: Liu Yi L
>
> Shared Virtual Addressing (SVA), a.k.a. Shared Virtual Memory (SVM) on
> Intel platforms, allows address space sharing between device DMA and
> applications. SVA can reduce programming complexity and enhance security.
>
> This VFIO series is intended to expose SVA usage to VMs, i.e. sharing
> guest application address space with passthru devices. This is called
> vSVA in this series. The whole vSVA enabling requires QEMU/VFIO/IOMMU
> changes. The IOMMU and QEMU changes are in separate series (listed in
> the "Related series").
>
> The high-level architecture for SVA virtualization is as below; the key
> design of vSVA support is to utilize the dual-stage IOMMU translation
> (also known as IOMMU nesting translation) capability of the host IOMMU.
>
>     .-------------.  .---------------------------.
>     |   vIOMMU    |  | Guest process CR3, FL only|
>     |             |  '---------------------------'
>     .----------------/
>     | PASID Entry |--- PASID cache flush --+
>     '-------------'                        |
>     |             |                        V
>     |             |                CR3 in GPA
>     '-------------'
> Guest
> ------| Shadow |--------------------------|------------
>       v        v                          v
> Host
>     .-------------.  .----------------------.
>     |   pIOMMU    |  | Bind FL for GVA-GPA  |
>     |             |  '----------------------'
>     .----------------/  |
>     | PASID Entry |     V (Nested xlate)
>     '----------------\.------------------------------.
>     |             |   |SL for GPA-HPA, default domain|
>     |             |   '------------------------------'
>     '-------------'
> Where:
>  - FL = First level/stage one page tables
>  - SL = Second level/stage two page tables
>
> There are roughly four parts in this patchset, corresponding to the
> basic vSVA support for PCI device assignment:
>  1. vfio support for PASID allocation and free for VMs
>  2. vfio support for guest page table binding requests from VMs
>  3. vfio support for IOMMU cache invalidation from VMs
>  4. vfio support for vSVA usage on IOMMU-backed mdevs
>
> The complete vSVA kernel upstream patches are divided into three phases:
>  1. Common APIs and PCI device direct assignment
>  2. IOMMU-backed Mediated Device assignment
>  3. Page Request Services (PRS) support
>
> This patchset aims at phase 1 and phase 2, and is based on Jacob's
> series below.
> [PATCH V10 00/11] Nested Shared Virtual Address (SVA) VT-d support:
> https://lkml.org/lkml/2020/3/20/1172
>
> The complete set for current vSVA can be found in the branch below.
> https://github.com/luxis1999/linux-vsva.git: vsva-linux-5.6-rc6
>
> The corresponding QEMU patch series is as below.
> [PATCH v1 00/22] intel_iommu: expose Shared Virtual Addressing to VMs
> The complete QEMU set can be found in the branch below.
> https://github.com/luxis1999/qemu.git: sva_vtd_v10_v1

The ioasid extension is in the link below.
https://lkml.org/lkml/2020/3/25/874

Regards,
Yi Liu
RE: [PATCH v1 2/8] vfio/type1: Add vfio_iommu_type1 parameter for quota tuning
> From: Tian, Kevin
> Sent: Monday, March 30, 2020 4:41 PM
> To: Liu, Yi L; alex.william...@redhat.com
> Subject: RE: [PATCH v1 2/8] vfio/type1: Add vfio_iommu_type1 parameter
> for quota tuning
>
> > From: Liu, Yi L
> > Sent: Sunday, March 22, 2020 8:32 PM
> >
> > From: Liu Yi L
> >
> > This patch adds a module option to make the PASID quota tunable by
> > the administrator.
> >
> > TODO: think more about how to make the tuning per-process.
> >
> > Previous discussions:
> > https://patchwork.kernel.org/patch/11209429/
> >
> > Cc: Kevin Tian
> > Cc: Jacob Pan
> > Cc: Alex Williamson
> > Cc: Eric Auger
> > Cc: Jean-Philippe Brucker
> > Signed-off-by: Liu Yi L
> > ---
> >  drivers/vfio/vfio.c             | 8 +++++++-
> >  drivers/vfio/vfio_iommu_type1.c | 7 ++++++-
> >  include/linux/vfio.h            | 3 ++-
> >  3 files changed, 15 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
> > index d13b483..020a792 100644
> > --- a/drivers/vfio/vfio.c
> > +++ b/drivers/vfio/vfio.c
> > @@ -2217,13 +2217,19 @@ struct vfio_mm *vfio_mm_get_from_task(struct
> > task_struct *task)
> >  }
> >  EXPORT_SYMBOL_GPL(vfio_mm_get_from_task);
> >
> > -int vfio_mm_pasid_alloc(struct vfio_mm *vmm, int min, int max)
> > +int vfio_mm_pasid_alloc(struct vfio_mm *vmm, int quota, int min, int max)
> >  {
> >  	ioasid_t pasid;
> >  	int ret = -ENOSPC;
> >
> >  	mutex_lock(&vmm->pasid_lock);
> >
> > +	/* update quota as it is tunable by admin */
> > +	if (vmm->pasid_quota != quota) {
> > +		vmm->pasid_quota = quota;
> > +		ioasid_adjust_set(vmm->ioasid_sid, quota);
> > +	}
> > +
>
> It's a bit weird to have the quota adjusted in the alloc path, since the
> latter might be initiated by non-privileged users. Why not do the simple
> math in vfio_create_mm to set the quota when the ioasid set is created?
> Even if in the future you allow per-process quota setting, that should
> come from a separate privileged path instead of through alloc.

The reason is that modifying the kernel parameter generates no event that
could be used to adjust the quota, so I chose to adjust it in the
pasid_alloc path. If that's not good, how about adding one more ioctl to
let userspace trigger a quota-adjustment event? Then even though a
non-privileged user could trigger the adjustment, the quota itself would
still be controlled by the privileged user. What do you think?

Regards,
Yi Liu
RE: [PATCH v1 2/8] vfio/type1: Add vfio_iommu_type1 parameter for quota tuning
> From: Tian, Kevin
> Sent: Monday, March 30, 2020 5:20 PM
> To: Liu, Yi L; alex.william...@redhat.com
> Subject: RE: [PATCH v1 2/8] vfio/type1: Add vfio_iommu_type1 parameter
> for quota tuning
>
> > From: Liu, Yi L
> > Sent: Monday, March 30, 2020 4:53 PM
> >
> > > From: Tian, Kevin
> > > Sent: Monday, March 30, 2020 4:41 PM
> > > To: Liu, Yi L; alex.william...@redhat.com
> > > Subject: RE: [PATCH v1 2/8] vfio/type1: Add vfio_iommu_type1
> > > parameter for quota tuning
> > >
> > > > From: Liu, Yi L
> > > > Sent: Sunday, March 22, 2020 8:32 PM
> > > >
> > > > From: Liu Yi L
> > > >
> > > > This patch adds a module option to make the PASID quota tunable by
> > > > the administrator.
> > > >
> > > > TODO: think more about how to make the tuning per-process.
> > > >
> > > > Previous discussions:
> > > > https://patchwork.kernel.org/patch/11209429/
> > > >
> > > > Cc: Kevin Tian
> > > > Cc: Jacob Pan
> > > > Cc: Alex Williamson
> > > > Cc: Eric Auger
> > > > Cc: Jean-Philippe Brucker
> > > > Signed-off-by: Liu Yi L
> > > > ---
> > > >  drivers/vfio/vfio.c             | 8 +++++++-
> > > >  drivers/vfio/vfio_iommu_type1.c | 7 ++++++-
> > > >  include/linux/vfio.h            | 3 ++-
> > > >  3 files changed, 15 insertions(+), 3 deletions(-)
> > > >
> > > > diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
> > > > index d13b483..020a792 100644
> > > > --- a/drivers/vfio/vfio.c
> > > > +++ b/drivers/vfio/vfio.c
> > > > @@ -2217,13 +2217,19 @@ struct vfio_mm *vfio_mm_get_from_task(struct
> > > > task_struct *task)
> > > >  }
> > > >  EXPORT_SYMBOL_GPL(vfio_mm_get_from_task);
> > > >
> > > > -int vfio_mm_pasid_alloc(struct vfio_mm *vmm, int min, int max)
> > > > +int vfio_mm_pasid_alloc(struct vfio_mm *vmm, int quota, int min,
> > > > +int max)
> > > >  {
> > > >  	ioasid_t pasid;
> > > >  	int ret = -ENOSPC;
> > > >
> > > >  	mutex_lock(&vmm->pasid_lock);
> > > >
> > > > +	/* update quota as it is tunable by admin */
> > > > +	if (vmm->pasid_quota != quota) {
> > > > +		vmm->pasid_quota = quota;
> > > > +		ioasid_adjust_set(vmm->ioasid_sid, quota);
> > > > +	}
> > > > +
> > >
> > > It's a bit weird to have the quota adjusted in the alloc path, since
> > > the latter might be initiated by non-privileged users. Why not do the
> > > simple math in vfio_create_mm to set the quota when the ioasid set is
> > > created? Even if in the future you allow per-process quota setting,
> > > that should come from a separate privileged path instead of through
> > > alloc.
> >
> > The reason is that modifying the kernel parameter generates no event
> > that could be used to adjust the quota, so I chose to adjust it in the
> > pasid_alloc path. If that's not good, how about adding one more ioctl
> > to let userspace trigger a quota-adjustment event? Then even though a
> > non-privileged user could trigger the adjustment, the quota itself
> > would still be controlled by the privileged user. What do you think?
>
> Why do you need an event to adjust? As I said, you can set the quota
> when the set is created in vfio_create_mm...

Oh, it's to support runtime adjustments. I guess it may be helpful to keep
the per-VM quota tunable even while the VM is running. If the quota is only
set in vfio_create_mm(), it cannot be adjusted at runtime.

Regards,
Yi Liu