On Mon, Mar 14, 2022 at 03:44:33PM -0400, Matthew Rosato wrote:
> s390x will introduce an additional domain type that is used for
> managing an IOMMU owned by KVM. Define the type here and add an
> interface for allocating a specified type vs the default type.
>
> Signed-off-by: Matthew Rosato
> --
On Mon, Mar 14, 2022 at 03:44:34PM -0400, Matthew Rosato wrote:
> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> index 9394aa9444c1..0bec97077d61 100644
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -77,6 +77,7 @@ struct vfio_iommu {
> bool
On Mon, Mar 14, 2022 at 03:44:41PM -0400, Matthew Rosato wrote:
> +int kvm_s390_pci_zpci_start(struct kvm *kvm, struct zpci_dev *zdev)
> +{
> + struct vfio_device *vdev;
> + struct pci_dev *pdev;
> + int rc;
> +
> + rc = kvm_s390_pci_dev_open(zdev);
> + if (rc)
> + r
On Mon, Mar 14, 2022 at 03:44:48PM -0400, Matthew Rosato wrote:
> The DTSM, or designation type supported mask, indicates what IOAT formats
> are available to the guest. For an interpreted device, userspace will not
> know what format(s) the IOAT assist supports, so pass it via the
> capability ch
On Mon, Mar 14, 2022 at 04:50:33PM -0600, Alex Williamson wrote:
> > +/*
> > + * The KVM_IOMMU type implies that the hypervisor will control the mappings
> > + * rather than userspace
> > + */
> > +#define VFIO_KVM_IOMMU 11
>
> Then why is this hosted in the type1 code that ex
On Tue, Mar 08, 2022 at 01:44:10PM +0800, Lu Baolu wrote:
> Hi folks,
>
> The iommu group is the minimal isolation boundary for DMA. Devices in
> a group can access each other's MMIO registers via peer to peer DMA
> and also need to share the same I/O address space.
Joerg, are we good for the coming
On Tue, Mar 15, 2022 at 11:16:41AM +0000, Robin Murphy wrote:
> On 2022-03-15 05:07, Jacob Pan wrote:
> > The DMA mapping API is the de facto standard for in-kernel DMA. It operates
> > on a per-device/RID basis, which is not PASID-aware.
> >
> > Some modern devices such as Intel Data Streaming Acceler
On Mon, Mar 14, 2022 at 10:07:07PM -0700, Jacob Pan wrote:
> + /*
> + * Each domain could have multiple devices attached with shared or per
> + * device PASIDs. At the domain level, we keep track of unique PASIDs and
> + * device user count.
> + * E.g. If a domain has two
On Mon, Mar 14, 2022 at 10:07:09PM -0700, Jacob Pan wrote:
> The DMA mapping API is the de facto standard for in-kernel DMA. It operates
> on a per-device/RID basis, which is not PASID-aware.
>
> For some modern devices such as the Intel Data Streaming Accelerator, a PASID is
> required for certain work submissi
On Tue, Mar 15, 2022 at 09:49:01AM -0400, Matthew Rosato wrote:
> The rationale for splitting steps 1 and 2 is that VFIO_SET_IOMMU doesn't
> have a mechanism for specifying more than the type as an arg, no? Otherwise
> yes, you could specify a kvm fd at this point and it would have some other
>
On Tue, Mar 15, 2022 at 09:36:08AM -0400, Matthew Rosato wrote:
> > If we do try to stick this into VFIO it should probably use the
> > VFIO_TYPE1_NESTING_IOMMU instead - however, we would like to delete
> > that flag entirely as it was never fully implemented, was never used,
> > and isn't part of
On Tue, Mar 15, 2022 at 10:39:18AM -0400, Matthew Rosato wrote:
> > That is something that should be modeled as a nested iommu domain.
> >
> > Querying the formats and any control logic for this should be on the
> > iommu side not built into VFIO.
>
> I agree that the DTSM is really controlled by
On Tue, Mar 15, 2022 at 09:31:35AM -0700, Jacob Pan wrote:
> > IMHO it is a device mis-design of IDXD to require all DMA be PASID
> > tagged. Devices should be able to do DMA on their RID when the PCI
> IDXD can do DMA w/ RID, the PASID requirement is only for shared WQ where
> ENQCMDS is used. E
On Tue, Mar 15, 2022 at 12:04:35PM -0400, Matthew Rosato wrote:
> > You can't pin/unpin in this path, there is no real way to handle error
> > and ulimit stuff here, plus it is really slow. I didn't notice any of
> > this in your patches, so what do you mean by 'pin' above?
>
> patch 18 does some
On Tue, Mar 15, 2022 at 12:29:02PM -0400, Matthew Rosato wrote:
> On 3/15/22 10:38 AM, Jason Gunthorpe wrote:
> > On Tue, Mar 15, 2022 at 09:49:01AM -0400, Matthew Rosato wrote:
> >
> > > The rationale for splitting steps 1 and 2 is that VFIO_SET_IOMMU doesn't
> > > have a mechanism for specifyin
On Tue, Mar 15, 2022 at 03:36:20PM -0700, Jacob Pan wrote:
> Hi Jason,
>
> On Tue, 15 Mar 2022 11:33:22 -0300, Jason Gunthorpe wrote:
>
> > On Mon, Mar 14, 2022 at 10:07:07PM -0700, Jacob Pan wrote:
> > > + /*
> > > + * Each domain could have multiple devices attached with shared or per
>
On Tue, Mar 15, 2022 at 09:38:10AM -0700, Jacob Pan wrote:
> > > +int iommu_enable_pasid_dma(struct device *dev, ioasid_t *pasid)
> > > +{
> > > + struct iommu_domain *dom;
> > > + ioasid_t id, max;
> > > + int ret;
> > > +
> > > + dom = iommu_get_domain_for_dev(dev);
> > > + if (!dom || !dom->ops
On Wed, Mar 16, 2022 at 08:41:27AM +0000, Tian, Kevin wrote:
> 1) When the kernel wants a more scalable way of using IDXD e.g. having
> multiple CPUs simultaneously submitting work in a lockless way to a
> shared work queue via a new instruction (ENQCMD) which carries
> PASID.
IMHO the misdesig
On Wed, Mar 16, 2022 at 01:50:04PM -0700, Jacob Pan wrote:
> I guess a list of (device, pasid) tuples as you suggested could work but it
> will have duplicated device entries since each device could have multiple
> PASIDs. right?
Is assigning the same iommu_domain to multiple PASIDs of the same
d
On Wed, Mar 16, 2022 at 10:23:26PM +0000, Luck, Tony wrote:
> Kernel users (ring0) can supply any PASID when they use
> the ENQCMDS instruction. Is that what you mean when you
> say "real applications"?
I'm not talking about ENQCMD at all.
I'm saying I don't see much utility to have two PASIDs as
On Wed, Mar 16, 2022 at 05:49:59PM -0700, Jacob Pan wrote:
> > I would expect real applications will try to use the same PASID for
> > the same IOVA map to optimize IOTLB caching.
> >
> > Is there a use case for that I'm missing?
> >
> Yes. it would be more efficient for PASID selective domain T
On Thu, Mar 17, 2022 at 05:47:36AM +0000, Tian, Kevin wrote:
> > From: Robin Murphy
> > Sent: Tuesday, March 15, 2022 6:49 PM
> >
> > On 2022-03-14 19:44, Matthew Rosato wrote:
> > > s390x will introduce an additional domain type that is used for
> > > managing an IOMMU owned by KVM. Define the type
On Fri, Mar 18, 2022 at 07:01:19AM +0000, Tian, Kevin wrote:
> > From: Jason Gunthorpe
> > Sent: Tuesday, March 15, 2022 10:55 PM
> >
> > The first level iommu_domain has the 'type1' map and unmap and pins
> > the pages. This is the 1:1 map with the GPA and ends up pinning all
> > guest memory be
On Fri, Mar 18, 2022 at 05:47:23AM +0000, Tian, Kevin wrote:
> may remember more detail here) and we didn't hear strong interest
> from customer on it. So this is just FYI for theoretical possibility and
> I'm fine with even disallowing it before those issues are resolved.
I didn't mean disallow,
On Fri, Mar 18, 2022 at 07:52:33PM +0800, Lu Baolu wrote:
> On 2022/3/15 13:07, Jacob Pan wrote:
> > From: Lu Baolu
> >
> > An IOMMU domain represents an address space which can be attached by
> > devices that perform DMA within a domain. However, for platforms with
> > PASID capability the domai
On Fri, Mar 18, 2022 at 08:01:04PM +0800, Lu Baolu wrote:
> On 2022/3/15 19:49, Tian, Kevin wrote:
> > > From: Jean-Philippe Brucker
> > > Sent: Tuesday, March 15, 2022 7:27 PM
> > >
> > > On Mon, Mar 14, 2022 at 10:07:06PM -0700, Jacob Pan wrote:
> > > > From: Lu Baolu
> > > >
> > > > An IOMMU d
On Fri, Mar 18, 2022 at 02:23:57AM +0000, Tian, Kevin wrote:
> Yes, that is another major part work besides the iommufd work. And
> it is not compatible with KVM features which rely on the dynamic
> manner of EPT. Though It is a bit questionable whether it's worthy of
> doing so just for saving me
From: Kevin Tian
Add iommufd to the documentation tree.
Signed-off-by: Kevin Tian
Signed-off-by: Jason Gunthorpe
---
Documentation/userspace-api/index.rst | 1 +
Documentation/userspace-api/iommufd.rst | 224
2 files changed, 225 insertions(+)
create mode 100644
Connect the IOAS to its IOCTL interface. This exposes most of the
functionality in the io_pagetable to userspace.
This is intended to be the core of the generic interface that IOMMUFD will
provide. Every IOMMU driver should be able to implement an iommu_domain
that is compatible with this generic
The iopt_pages which represents a logical linear list of PFNs held in
different storage tiers. Each area points to a slice of exactly one
iopt_pages, and each iopt_pages can have multiple areas and users.
The three storage tiers are managed to meet these objectives:
- If no iommu_domain or user
The hw_pagetable object exposes the internal struct iommu_domain objects to
userspace. An iommu_domain is required when any DMA device attaches to an
IOAS to control the io page table through the iommu driver.
For compatibility with VFIO the hw_pagetable is automatically created when
a DMA device is att
The top of the data structure provides an IO Address Space (IOAS) that is
similar to a VFIO container. The IOAS allows map/unmap of memory into
ranges of IOVA called iopt_areas. Domains and in-kernel users (like VFIO
mdevs) can be attached to the IOAS to access the PFNs that those IOVA
areas cover.
Following the pattern of io_uring, perf, skb, and bpf, iommufd will use
user->locked_vm for accounting pinned pages. Ensure the value is included
in the struct and export free_uid() as iommufd is modular.
user->locked_vm is the correct accounting to use for ulimit because it is
per-user, and the uli
This is the remainder of the IOAS data structure. Provide an object called
an io_pagetable that is composed of iopt_areas pointing at iopt_pages,
along with a list of iommu_domains that mirror the IOVA to PFN map.
At the top this is a simple interval tree of iopt_areas indicating the map
of IOVA t
iommufd is the user API to control the IOMMU subsystem as it relates to
managing IO page tables that point at user space memory.
It takes over from drivers/vfio/vfio_iommu_type1.c (aka the VFIO
container) which is the VFIO specific interface for a similar idea.
We see a broad need for extended fe
Add the four functions external drivers need to connect physical DMA to
the IOMMUFD:
iommufd_bind_pci_device() / iommufd_unbind_device()
Register the device with iommufd and establish security isolation.
iommufd_device_attach() / iommufd_device_detach()
Connect a bound device to a page table
The span iterator travels over the indexes of the interval_tree, not the
nodes, and classifies spans of indexes as either 'used' or 'hole'.
'used' spans are fully covered by nodes in the tree and 'hole' spans have
no node intersecting the span.
This is done greedily such that spans are maximally
This is the basic infrastructure of a new miscdevice to hold the iommufd
IOCTL API.
It provides:
- A miscdevice to create file descriptors to run the IOCTL interface over
- A table based ioctl dispatch and centralized extendable pre-validation
step
- An xarray mapping user IDs to kernel o
Cover the essential functionality of the iommufd with a directed
test. This aims to achieve reasonable functional coverage using the
in-kernel self test framework.
It provides a mock for the iommu_domain that allows it to run without any
HW and the mocking provides a way to directly validate that
iommufd can directly implement the /dev/vfio/vfio container IOCTLs by
mapping them into io_pagetable operations. Doing so allows the use of
iommufd by symlinking /dev/vfio/vfio to /dev/iommufd. Allowing VFIO to
SET_CONTAINER using an iommufd instead of a container fd is a followup
series.
Internally
On Sun, Mar 20, 2022 at 02:40:21PM +0800, Lu Baolu wrote:
> Add a new iommu domain type IOMMU_DOMAIN_SVA to represent an I/O page
> table which is shared from CPU host VA. Add a sva_cookie field in the
> iommu_domain structure to save the mm_struct which represent the CPU
> memory page table.
>
>
On Mon, Mar 21, 2022 at 07:13:49AM +0000, Tian, Kevin wrote:
> /*
>* To keep things simple, SVA currently doesn't support IOMMU groups
>* with more than one device. Existing SVA-capable systems are not
>* affected by the problems that required IOMMU groups (lack of AC
On Sun, Mar 20, 2022 at 02:40:23PM +0800, Lu Baolu wrote:
> Add support for SVA domain allocation and provide an SVA-specific
> iommu_domain_ops.
>
> Signed-off-by: Lu Baolu
> include/linux/intel-iommu.h | 1 +
> drivers/iommu/intel/iommu.c | 12
> drivers/iommu/intel/svm.c | 34
On Mon, Mar 21, 2022 at 11:31:19AM +0000, Jean-Philippe Brucker wrote:
> For now we could just return a naked struct iommu_domain. Sanity checks in
> arm_smmu_attach_dev() would be good too, a SVA domain can't be attached as
> a parent domain.
Now that we have per-domain ops the 'standard' arm_smm
On Mon, Mar 21, 2022 at 07:01:45PM +0800, Lu Baolu wrote:
> > one domain can be attached by multiple devices, so this should not be
> > a blind alloc.
>
> Indeed. Perhaps we could associate the SVA domain with the mm->pasid and
> add a user counter inside the domain.
This has the same problem as
On Sun, Mar 20, 2022 at 02:40:25PM +0800, Lu Baolu wrote:
> +/**
> + * iommu_sva_bind_device() - Bind a process address space to a device
> + * @dev: the device
> + * @mm: the mm to bind, caller must hold a reference to it
> + * @drvdata: opaque data pointer to pass to bind callback
> + *
> + * Cr
On Sun, Mar 20, 2022 at 02:40:28PM +0800, Lu Baolu wrote:
> @@ -3098,7 +3101,16 @@ int iommu_attach_device_pasid(struct iommu_domain *domain,
> if (iommu_group_device_count(group) != 1)
> goto out_unlock;
>
> + xa_lock(&group->pasid_array);
> + curr = __xa_cmpxchg(&
On Mon, Mar 21, 2022 at 11:42:16AM +0000, Jean-Philippe Brucker wrote:
> I tend to disagree with that last part. The fault is caused by a specific
> device accessing shared page tables. We should keep that device
> information throughout the fault handling, so that we can report it to the
> driver
On Sun, Mar 20, 2022 at 02:40:29PM +0800, Lu Baolu wrote:
> +static enum iommu_page_response_code
> +iommu_sva_handle_iopf(struct iommu_fault *fault, void *data)
> +{
> + vm_fault_t ret;
> + struct mm_struct *mm;
> + struct vm_area_struct *vma;
> + unsigned int access_flags = 0;
>
On Sat, Mar 19, 2022 at 07:51:31AM +0000, Tian, Kevin wrote:
> > From: Jason Gunthorpe
> > Sent: Friday, March 18, 2022 10:13 PM
> >
> > On Fri, Mar 18, 2022 at 02:23:57AM +0000, Tian, Kevin wrote:
> >
> > > Yes, that is another major part work besides the iommufd work. And
> > > it is not compa
On Tue, Mar 22, 2022 at 01:03:14PM +0800, Lu Baolu wrote:
> On 2022/3/21 20:43, Jason Gunthorpe wrote:
> > On Mon, Mar 21, 2022 at 11:42:16AM +0000, Jean-Philippe Brucker wrote:
> >
> > > I tend to disagree with that last part. The fault is caused by a specific
> > > device accessing shared page t
On Tue, Mar 22, 2022 at 03:18:51PM +0100, Niklas Schnelle wrote:
> On Fri, 2022-03-18 at 14:27 -0300, Jason Gunthorpe wrote:
> > This is the basic infrastructure of a new miscdevice to hold the iommufd
> > IOCTL API.
> >
> > It provides:
> > - A miscdevice to create file descriptors to run the IO
On Tue, Mar 22, 2022 at 03:28:22PM +0100, Niklas Schnelle wrote:
> On Fri, 2022-03-18 at 14:27 -0300, Jason Gunthorpe wrote:
> > Following the pattern of io_uring, perf, skb, and bpf iommfd will use
> iommufd ^
> > user->locked_vm for accounting p
On Tue, Mar 22, 2022 at 09:29:23AM -0600, Alex Williamson wrote:
> I'm still picking my way through the series, but the later compat
> interface doesn't mention this difference as an outstanding issue.
> Doesn't this difference need to be accounted in how libvirt manages VM
> resource limits?
A
On Tue, Mar 22, 2022 at 05:31:26PM +0100, Niklas Schnelle wrote:
> > > In fact I stumbled over that because the wrong accounting in
> > > io_uring exhausted the limit applied to vfio (I was using a QEMU utilizing
> > > io_uring itself).
> >
> > I'm pretty interested in this as well, do you have anythin
On Wed, Mar 23, 2022 at 04:37:50PM +0100, Niklas Schnelle wrote:
> > +/*
> > + * This holds a pinned page list for multiple areas of IO address space. The
> > + * pages always originate from a linear chunk of userspace VA. Multiple
> > + * io_pagetable's, through their iopt_area's, can share a
On Wed, Mar 23, 2022 at 12:10:01PM -0600, Alex Williamson wrote:
> > +EXPORT_SYMBOL_GPL(iommufd_bind_pci_device);
>
> I'm stumped why this needs to be PCI specific. Anything beyond the RID
> comment? Please enlighten. Thanks,
The way it turned out in the end it is not for a good reason any
mor
On Tue, Mar 22, 2022 at 04:15:44PM -0600, Alex Williamson wrote:
> > +static struct iopt_area *
> > +iopt_alloc_area(struct io_pagetable *iopt, struct iopt_pages *pages,
> > + unsigned long iova, unsigned long start_byte,
> > + unsigned long length, int iommu_prot, unsigned int
On Wed, Mar 23, 2022 at 01:10:38PM -0600, Alex Williamson wrote:
> On Fri, 18 Mar 2022 14:27:33 -0300
> Jason Gunthorpe wrote:
>
> > +static int conv_iommu_prot(u32 map_flags)
> > +{
> > + int iommu_prot;
> > +
> > + /*
> > +* We provide no manual cache coherency ioctls to userspace and m
On Wed, Mar 23, 2022 at 02:04:46PM -0600, Alex Williamson wrote:
> On Wed, 23 Mar 2022 16:34:39 -0300
> Jason Gunthorpe wrote:
>
> > On Wed, Mar 23, 2022 at 01:10:38PM -0600, Alex Williamson wrote:
> > > On Fri, 18 Mar 2022 14:27:33 -0300
> > > Jason Gunthorpe wrote:
> > >
> > > > +static int
On Wed, Mar 23, 2022 at 05:34:18PM -0300, Jason Gunthorpe wrote:
> Stated another way, any platform that wires dev_is_dma_coherent() to
> true, like all x86 does, must support IOMMU_CACHE and report
> IOMMU_CAP_CACHE_COHERENCY for every iommu_domain the platform
> supports. The platform obviously
On Wed, Mar 23, 2022 at 04:51:25PM -0600, Alex Williamson wrote:
> My overall question here would be whether we can actually achieve a
> compatibility interface that has sufficient feature transparency that we
> can dump vfio code in favor of this interface, or will there be enough
> niche use cas
On Thu, Mar 24, 2022 at 11:50:47AM +0800, Jason Wang wrote:
> It's simply because we don't want to break existing userspace. [1]
I'm still waiting to hear what exactly breaks in real systems.
As I explained this is not a significant change, but it could break
something in a few special scenarios
On Thu, Mar 24, 2022 at 03:09:46AM +0000, Tian, Kevin wrote:
> + /*
> + * Can't cross areas that are not aligned to the system page
> + * size with this API.
> + */
> + if (cur_iova % PAGE_SIZE) {
> + rc = -EINVAL;
>
On Thu, Mar 24, 2022 at 07:25:03AM +0000, Tian, Kevin wrote:
> Based on that here is a quick tweak of the force-snoop part (not compiled).
I liked your previous idea better, that IOMMU_CAP_CACHE_COHERENCY
started out OK but got weird. So lets fix it back to the way it was.
How about this:
https
On Thu, Mar 24, 2022 at 02:40:15PM -0600, Alex Williamson wrote:
> On Tue, 22 Mar 2022 13:15:21 -0300
> Jason Gunthorpe via iommu wrote:
>
> > On Tue, Mar 22, 2022 at 09:29:23AM -0600, Alex Williamson wrote:
> >
> > > I'm still picking my way throu
On Thu, Mar 24, 2022 at 04:04:03PM -0600, Alex Williamson wrote:
> On Wed, 23 Mar 2022 21:33:42 -0300
> Jason Gunthorpe wrote:
>
> > On Wed, Mar 23, 2022 at 04:51:25PM -0600, Alex Williamson wrote:
> >
> > > My overall question here would be whether we can actually achieve a
> > > compatibility
On Fri, Mar 25, 2022 at 09:34:08PM +0800, zhangfei@foxmail.com wrote:
> > + iopt->iova_alignment = new_iova_alignment;
> > + xa_store(&iopt->domains, iopt->next_domain_id, domain, GFP_KERNEL);
> > + iopt->next_domain_id++;
> Not understand here.
>
> Do we get the domain = xa_load(&iopt-
On Sun, Mar 27, 2022 at 02:32:23AM +0000, Tian, Kevin wrote:
> > this looks good to me except that the 2nd patch (eab4b381) should be
> > the last one otherwise it affects bisect. and in that case the subject
> > would be simply about removing the capability instead of
> > restoring...
Oh because
On Mon, Mar 28, 2022 at 09:53:27AM +0800, Jason Wang wrote:
> To me, it looks easier to not answer this question by letting
> userspace know about the change,
That is not backwards compatible, so I don't think it helps unless we
say if you open /dev/vfio/vfio you get old behavior and if you o
On Mon, Mar 28, 2022 at 02:14:26PM +0100, Sean Mooney wrote:
> On Mon, 2022-03-28 at 09:53 +0800, Jason Wang wrote:
> > On Thu, Mar 24, 2022 at 7:46 PM Jason Gunthorpe wrote:
> > >
> > > On Thu, Mar 24, 2022 at 11:50:47AM +0800, Jason Wang wrote:
> > >
> > > > It's simply because we don't want t
On Mon, Mar 28, 2022 at 11:17:23AM -0600, Alex Williamson wrote:
> On Thu, 24 Mar 2022 10:46:22 -0300
> Jason Gunthorpe wrote:
>
> > On Thu, Mar 24, 2022 at 07:25:03AM +0000, Tian, Kevin wrote:
> >
> > > Based on that here is a quick tweak of the force-snoop part (not
> > > compiled).
> >
>
On Mon, Mar 28, 2022 at 03:57:53PM -0300, Jason Gunthorpe wrote:
> So, currently AMD and Intel have exactly the same HW feature with a
> different kAPI..
I fixed it like below and made the ordering changes Kevin pointed
to. Will send next week after the merge window:
527e438a974a06 iommu: Delete
On Tue, Mar 29, 2022 at 08:42:13AM +0000, Tian, Kevin wrote:
> btw I'm not sure whether this is what SVA requires. IIRC the problem with
> SVA is because PASID TLP prefix is not counted in PCI packet routing thus
> a DMA target address with PASID might be treated as P2P if the address
> falls into
On Tue, Mar 29, 2022 at 12:59:52PM +0800, Jason Wang wrote:
> vDPA has a backend feature negotiation, then actually, userspace can
> tell vDPA to go with the new accounting approach. Not sure RDMA can do
> the same.
A security feature userspace can ask to turn off is not really a
security feature
On Wed, Mar 30, 2022 at 06:50:11AM +0000, Tian, Kevin wrote:
> One thing that I'm not very sure is about DMA alias. Even when physically
> there is only a single device within the group the aliasing could lead
> to multiple RIDs in the group making it non-singleton. But probably we
> don't need su
On Wed, Mar 30, 2022 at 05:00:39PM +0300, Tony Lindgren wrote:
> Hi,
>
> * Lu Baolu [700101 02:00]:
> > The is_attach_deferred iommu_ops callback is a device op. The domain
> > argument is unnecessary and never used. Remove it to make code clean.
>
> Looks like this causes a regression for at le
On Wed, Mar 30, 2022 at 02:12:57PM +0000, Tian, Kevin wrote:
> > From: Jason Gunthorpe
> > Sent: Wednesday, March 30, 2022 7:58 PM
> >
> > On Wed, Mar 30, 2022 at 06:50:11AM +0000, Tian, Kevin wrote:
> >
> > > One thing that I'm not very sure is about DMA alias. Even when physically
> > > there i
On Wed, Mar 30, 2022 at 08:19:37PM +0300, Tony Lindgren wrote:
> > > __iommu_probe_device from probe_iommu_group+0x2c/0x38
> > > probe_iommu_group from bus_for_each_dev+0x74/0xbc
> > > bus_for_each_dev from bus_iommu_probe+0x34/0x2e8
> > > bus_iommu_probe from bus_set_iommu+0x80/0xc8
> > > bus_set
On Tue, Mar 29, 2022 at 01:37:52PM +0800, Lu Baolu wrote:
> @@ -95,6 +101,7 @@ struct iommu_domain {
> void *handler_token;
> struct iommu_domain_geometry geometry;
> struct iommu_dma_cookie *iova_cookie;
> + struct iommu_sva_cookie *sva_cookie;
Cookie is still the wrong word
On Tue, Mar 29, 2022 at 01:37:53PM +0800, Lu Baolu wrote:
> Attaching an IOMMU domain to a PASID of a device is a generic operation
> for modern IOMMU drivers which support PASID-granular DMA address
> translation. Currently visible usage scenarios include (but are not limited to):
>
> - SVA (Shared Vir
On Tue, Mar 29, 2022 at 01:37:55PM +0800, Lu Baolu wrote:
> Add support for SVA domain allocation and provide an SVA-specific
> iommu_domain_ops.
>
> Signed-off-by: Lu Baolu
> include/linux/intel-iommu.h | 1 +
> drivers/iommu/intel/iommu.c | 10 ++
> drivers/iommu/intel/svm.c | 37 ++
On Thu, Mar 31, 2022 at 03:36:29PM +1100, David Gibson wrote:
> > +/**
> > + * struct iommu_ioas_iova_ranges - ioctl(IOMMU_IOAS_IOVA_RANGES)
> > + * @size: sizeof(struct iommu_ioas_iova_ranges)
> > + * @ioas_id: IOAS ID to read ranges from
> > + * @out_num_iovas: Output total number of ranges in t
On Wed, Mar 30, 2022 at 09:35:52PM +0800, Yi Liu wrote:
> > +/**
> > + * struct iommu_ioas_copy - ioctl(IOMMU_IOAS_COPY)
> > + * @size: sizeof(struct iommu_ioas_copy)
> > + * @flags: Combination of enum iommufd_ioas_map_flags
> > + * @dst_ioas_id: IOAS ID to change the mapping of
> > + * @src_ioas
On Thu, Mar 31, 2022 at 01:59:22PM -0700, Jacob Pan wrote:
> > + handle->dev = dev;
> > + handle->domain = domain;
> > + handle->pasid = mm->pasid;
> why do we need to store pasid here? Conceptually, pasid is per sva domain
> not per bind. You can get it from handle->domain->sva_cookie.
Th
On Fri, Apr 01, 2022 at 02:20:23PM +0800, Yi Liu wrote:
>
>
> On 2022/3/29 19:42, Jason Gunthorpe wrote:
> > On Tue, Mar 29, 2022 at 08:42:13AM +0000, Tian, Kevin wrote:
> >
> > > btw I'm not sure whether this is what SVA requires. IIRC the problem with
> > > SVA is because PASID TLP prefix is n
On Sat, Apr 02, 2022 at 07:12:12AM +0000, Tian, Kevin wrote:
> That is one scenario of dma aliasing. Another is like Alex replied where
> one device has an alias requestor ID due to PCI quirks. The alias RID
> may or may not map to a real device but probably what we really care
> here regarding to
On Sat, Apr 02, 2022 at 08:43:16AM +0000, Tian, Kevin wrote:
> > This assumes any domain is interchangeable with any device, which is
> > not the iommu model. We need a domain op to check if a device is
> > compatible with the domain for vfio and iommufd, this should do the
> > same.
>
> This sug
On Mon, Apr 04, 2022 at 01:43:49PM +0800, Lu Baolu wrote:
> On 2022/3/30 19:58, Jason Gunthorpe wrote:
> > > > Testing the group size is inherently the wrong test to make.
> > > What is your suggestion then?
> > Add a flag to the group that positively indicates the group can never
> > have more tha
On Tue, Apr 05, 2022 at 02:12:42PM +0800, Lu Baolu wrote:
> On 2022/4/5 1:24, Jason Gunthorpe wrote:
> > On Mon, Apr 04, 2022 at 01:43:49PM +0800, Lu Baolu wrote:
> > > On 2022/3/30 19:58, Jason Gunthorpe wrote:
> > > > > > Testing the group size is inherently the wrong test to make.
> > > > > What
vdpa and usnic are trying to test if IOMMU_CACHE is supported. The correct
way to do this is via dev_is_dma_coherent() like the DMA API does. If
IOMMU_CACHE is not supported then these drivers won't work as they don't
call any coherency-restoring routines around their DMAs.
Signed-off-by: Jason Gu
dev_is_dma_coherent() is the control to determine if IOMMU_CACHE can be
supported.
IOMMU_CACHE means that normal DMAs do not require any additional coherency
mechanism and is the basic uAPI that VFIO exposes to userspace. For
instance VFIO applications like DPDK will not work if additional coheren
PCIe defines a 'no-snoop' bit in each TLP which is usually implemented
by a platform as bypassing elements in the DMA coherent CPU cache
hierarchy. A driver can command a device to set this bit on some of its
transactions as a micro-optimization.
However, the driver is now responsible to synch
Nothing reads this value anymore, remove it.
Signed-off-by: Jason Gunthorpe
---
drivers/iommu/amd/iommu.c | 2 --
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 2 --
drivers/iommu/arm/arm-smmu/arm-smmu.c | 6 --
drivers/iommu/arm/arm-smmu/qcom_iommu.c | 6 --
IOMMU_CACHE means "normal DMA to this iommu_domain's IOVA should be cache
coherent" and is used by the DMA API. The definition allows for special
non-coherent DMA to exist - ie processing of the no-snoop flag in PCIe
TLPs - so long as this behavior is opt-in by the device driver.
The flag is used
This new mechanism will replace using IOMMU_CAP_CACHE_COHERENCY and
IOMMU_CACHE to control the no-snoop blocking behavior of the IOMMU.
Currently only Intel and AMD IOMMUs are known to support this
feature. They both implement it as an IOPTE bit, that when set, will cause
PCIe TLPs to that IOVA wi
On Tue, Apr 05, 2022 at 01:10:44PM -0600, Alex Williamson wrote:
> On Tue, 5 Apr 2022 13:16:01 -0300
> Jason Gunthorpe wrote:
>
> > dev_is_dma_coherent() is the control to determine if IOMMU_CACHE can be
> > supported.
> >
> > IOMMU_CACHE means that normal DMAs do not require any additional coh
On Tue, Apr 05, 2022 at 01:50:36PM -0600, Alex Williamson wrote:
> >
> > +static bool intel_iommu_enforce_cache_coherency(struct iommu_domain *domain)
> > +{
> > + struct dmar_domain *dmar_domain = to_dmar_domain(domain);
> > +
> > + if (!dmar_domain->iommu_snooping)
> > + retu
On Wed, Apr 06, 2022 at 01:00:13AM +0000, Tian, Kevin wrote:
> > Because domains wrap more than just the IOPTE format, they have
> > additional data related to the IOMMU HW block itself. Imagine a SOC
> > with two IOMMU HW blocks that can both process the CPU IOPTE format,
> > but have different c
On Wed, Apr 06, 2022 at 07:30:39AM +0200, Christoph Hellwig wrote:
> On Tue, Apr 05, 2022 at 01:16:00PM -0300, Jason Gunthorpe wrote:
> > diff --git a/drivers/infiniband/hw/usnic/usnic_uiom.c b/drivers/infiniband/hw/usnic/usnic_uiom.c
> > index 760b254ba42d6b..24d118198ac756 100644
> > +++ b/d