On Wed, Apr 13, 2016 at 09:52:33PM +0300, Adam Morrison wrote:
> From: Omer Peleg <o...@cs.technion.ac.il>
>
> IOVA allocation has two problems that impede high-throughput I/O.
> First, it can do a linear search over the allocated IOVA ranges.
> Second, the rbtree spinlock that serializes IOVA allocations becomes
> contended.
>
> Address these problems by creating an API for caching allocated IOVA
> ranges, so that the IOVA allocator isn't accessed frequently.  This
> patch adds a per-CPU cache, from which CPUs can alloc/free IOVAs
> without taking the rbtree spinlock.  The per-CPU caches are backed by
> a global cache, to avoid invoking the (linear-time) IOVA allocator
> without needing to make the per-CPU cache size excessive.  This
> design is based on magazines, as described in "Magazines and Vmem:
> Extending the Slab Allocator to Many CPUs and Arbitrary Resources"
> (currently available at
> https://www.usenix.org/legacy/event/usenix01/bonwick.html)
>
> Adding caching on top of the existing rbtree allocator maintains the
> property that IOVAs are densely packed in the IO virtual address
> space, which is important for keeping IOMMU page table usage low.
>
> To keep the cache size reasonable, we limit caching to ranges of
> size <= 128 KB.  Overall, a CPU can cache at most 32 MB and the
> global cache is bounded by 4 MB.
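For concreteness, here is my rough reading of the magazine scheme
described above, as a minimal standalone C sketch.  MAG_SIZE, the
struct names, and the swap policy are simplifications of mine, not
code from the patch; the kernel version would be per-CPU data with
preemption disabled, and the global depot of magazines is elided:

#include <stdbool.h>
#include <stddef.h>

#define MAG_SIZE 128                    /* IOVAs per magazine (assumed) */

struct iova_magazine {
        unsigned long pfns[MAG_SIZE];
        size_t size;
};

/* One of these per CPU; "loaded" serves requests, "prev" is a spare
 * kept around so a CPU bouncing across a magazine boundary does not
 * hit the global cache on every operation. */
struct cpu_cache {
        struct iova_magazine *loaded;
        struct iova_magazine *prev;
};

/* Fast-path alloc: pop from the loaded magazine, swapping in prev if
 * loaded is empty.  Returns false when both magazines are empty and
 * the caller must refill from the global cache or the rbtree. */
static bool cache_alloc(struct cpu_cache *cc, unsigned long *pfn)
{
        if (cc->loaded->size == 0) {
                struct iova_magazine *tmp;

                if (cc->prev->size == 0)
                        return false;
                tmp = cc->loaded;
                cc->loaded = cc->prev;
                cc->prev = tmp;
        }
        *pfn = cc->loaded->pfns[--cc->loaded->size];
        return true;
}

/* Fast-path free: push onto the loaded magazine, swapping in prev if
 * loaded is full.  Returns false when both magazines are full and a
 * full magazine should be handed to the global cache instead. */
static bool cache_free(struct cpu_cache *cc, unsigned long pfn)
{
        if (cc->loaded->size == MAG_SIZE) {
                struct iova_magazine *tmp;

                if (cc->prev->size == MAG_SIZE)
                        return false;
                tmp = cc->loaded;
                cc->loaded = cc->prev;
                cc->prev = tmp;
        }
        cc->loaded->pfns[cc->loaded->size++] = pfn;
        return true;
}

The point of the two-magazine arrangement (per the Bonwick paper) is
that every CPU can satisfy at least MAG_SIZE allocs and MAG_SIZE frees
between trips to the shared global cache.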
So the cached case still ignores limit_pfn, as I pointed out before.
This can break drivers if a cached pfn is outside the DMA range the
device can handle, or hurt performance because an IOVA above 4 GB
forces DAC (dual-address cycle) addressing on the device.

I think we could have two caches, one for DMA32 and the other for
DMA64, and choose the corresponding cache according to limit_pfn.
Devices with a special DMA mask, for example DMA24, would just fall
back to the slow path.  (A rough sketch of what I mean is below the
sign-off.)

Thanks,
Shaohua
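A minimal sketch of the selection logic I have in mind, assuming a
64-bit unsigned long and 4 KB pages; pick_cache(), the enum, and the
limit macros are hypothetical names, not anything from the patch:

#define PAGE_SHIFT      12
#define PFN_MAX         (~0UL >> PAGE_SHIFT)   /* full 64-bit mask   */
#define DMA32_MAX_PFN   ((1UL << (32 - PAGE_SHIFT)) - 1) /* < 4 GB   */

enum iova_cache_id { CACHE_DMA32, CACHE_DMA64, CACHE_NONE };

/*
 * The DMA64 cache may hold any pfn, so only a device with an
 * unrestricted mask can use it safely.  The DMA32 cache only ever
 * holds pfns below 4 GB, so any device whose limit_pfn covers that
 * range can use it.  Anything smaller (e.g. a DMA24 mask) skips the
 * caches and takes the slow rbtree path.
 */
static enum iova_cache_id pick_cache(unsigned long limit_pfn)
{
        if (limit_pfn >= PFN_MAX)
                return CACHE_DMA64;
        if (limit_pfn >= DMA32_MAX_PFN)
                return CACHE_DMA32;
        return CACHE_NONE;
}

Note this also covers odd intermediate masks (say 40-bit): such a
device cannot use the DMA64 cache, since that cache may hold pfns
above its limit, but it can still use the DMA32 cache safely.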