Hi Joerg, On Wed, Jul 24, 2019 at 09:19:59AM +0200, Joerg Roedel wrote: > On Thu, Jul 11, 2019 at 06:19:12PM +0100, Will Deacon wrote: > > static void __iommu_dma_unmap(struct iommu_domain *domain, dma_addr_t > > dma_addr, > > size_t size) > > { > > + struct iommu_iotlb_gather iotlb_gather; > > struct iommu_dma_cookie *cookie = domain->iova_cookie; > > struct iova_domain *iovad = &cookie->iovad; > > size_t iova_off = iova_offset(iovad, dma_addr); > > + size_t unmapped; > > > > dma_addr -= iova_off; > > size = iova_align(iovad, size + iova_off); > > + iommu_iotlb_gather_init(&iotlb_gather); > > + > > + unmapped = iommu_unmap_fast(domain, dma_addr, size, &iotlb_gather); > > + WARN_ON(unmapped != size); > > > > - WARN_ON(iommu_unmap_fast(domain, dma_addr, size) != size); > > if (!cookie->fq_domain) > > - iommu_tlb_sync(domain); > > + iommu_tlb_sync(domain, &iotlb_gather); > > iommu_dma_free_iova(cookie, dma_addr, size); > > I looked through your patches and was wondering if we can't make the > 'struct iotlb_gather' a member of 'struct iommu_domain' and update it > transparently for the user? That would make things easier for users of > the iommu-api.
Thanks for having a look. My first (private) attempt at this series did exactly what you suggest, but the problem there is that you can have concurrent, disjoint map()/unmap() operations on the same 'struct iommu_domain', so you end up needing either multiple 'iotlb_gather' structures to support these callers or you end up massively over-invalidating. Worse, you can't just make things per-cpu without disabling preemption, so whatever you do you end up re-introducing synchronization to maintain these things. The /huge/ advantage of getting the caller to allocate the 'iotlb_gather' structure on their stack is that you don't have to worry about object lifetime at all, so all of the synchronization disappears. There is also precedent for this when unmapping things on the CPU side -- see the use of 'struct mmu_gather' and tlb_gather_mmu() in unmap_region() [mm/mmap.c], for example. There aren't tonnes of (direct) users of the iommu-api, and the additional complexity introduced by the 'struct iotlb_gather' only applies to users of iommu_unmap_fast(), which I think is a reasonable trade-off in return for the potential for improved performance. Will _______________________________________________ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu