Hi Joerg,

On Wed, Jul 24, 2019 at 09:19:59AM +0200, Joerg Roedel wrote:
> On Thu, Jul 11, 2019 at 06:19:12PM +0100, Will Deacon wrote:
> >  static void __iommu_dma_unmap(struct iommu_domain *domain, dma_addr_t 
> > dma_addr,
> >             size_t size)
> >  {
> > +   struct iommu_iotlb_gather iotlb_gather;
> >     struct iommu_dma_cookie *cookie = domain->iova_cookie;
> >     struct iova_domain *iovad = &cookie->iovad;
> >     size_t iova_off = iova_offset(iovad, dma_addr);
> > +   size_t unmapped;
> >  
> >     dma_addr -= iova_off;
> >     size = iova_align(iovad, size + iova_off);
> > +   iommu_iotlb_gather_init(&iotlb_gather);
> > +
> > +   unmapped = iommu_unmap_fast(domain, dma_addr, size, &iotlb_gather);
> > +   WARN_ON(unmapped != size);
> >  
> > -   WARN_ON(iommu_unmap_fast(domain, dma_addr, size) != size);
> >     if (!cookie->fq_domain)
> > -           iommu_tlb_sync(domain);
> > +           iommu_tlb_sync(domain, &iotlb_gather);
> >     iommu_dma_free_iova(cookie, dma_addr, size);
> 
> I looked through your patches and was wondering if we can't make the
> 'struct iotlb_gather' a member of 'struct iommu_domain' and update it
> transparently for the user? That would make things easier for users of
> the iommu-api.

Thanks for having a look.

My first (private) attempt at this series did exactly what you suggest, but
the problem there is that you can have concurrent, disjoint map()/unmap()
operations on the same 'struct iommu_domain', so you end up needing either
multiple 'iotlb_gather' structures to support these callers or you end up
massively over-invalidating. Worse, you can't just make things per-cpu
without disabling preemption, so whatever you do you end up re-introducing
synchronization to maintain these things. The /huge/ advantage of getting
the caller to allocate the 'iotlb_gather' structure on their stack is that
you don't have to worry about object lifetime at all, so all of the
synchronization disappears. There is also precedent for this when unmapping
things on the CPU side -- see the use of 'struct mmu_gather' and
tlb_gather_mmu() in unmap_region() [mm/mmap.c], for example.

There aren't tonnes of (direct) users of the iommu-api, and the additional
complexity introduced by the 'struct iotlb_gather' only applies to users of
iommu_unmap_fast(), which I think is a reasonable trade-off in return for
the potential for improved performance.

Will
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Reply via email to