On Wed, Apr 13, 2016 at 09:52:33PM +0300, Adam Morrison wrote:
> From: Omer Peleg <o...@cs.technion.ac.il>
> 
> IOVA allocation has two problems that impede high-throughput I/O.
> First, it can do a linear search over the allocated IOVA ranges.
> Second, the rbtree spinlock that serializes IOVA allocations becomes
> contended.
> 
> Address these problems by creating an API for caching allocated IOVA
> ranges, so that the IOVA allocator isn't accessed frequently.  This
> patch adds a per-CPU cache, from which CPUs can alloc/free IOVAs
> without taking the rbtree spinlock.  The per-CPU caches are backed by
> a global cache, which keeps invocations of the (linear-time) IOVA
> allocator rare without requiring excessively large per-CPU caches.
> This design is based on magazines, as described in "Magazines and
> Vmem: Extending the Slab Allocator to Many CPUs and Arbitrary
> Resources" (currently available at
> https://www.usenix.org/legacy/event/usenix01/bonwick.html).
> 
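(For reference, a minimal sketch of the data layout the magazine scheme
implies; the identifiers and constants below are illustrative, not
necessarily what the patch uses:)

#define IOVA_MAG_SIZE	128	/* PFNs per magazine; illustrative */
#define IOVA_DEPOT_SIZE	32	/* magazines in the depot; illustrative */

/* A magazine: a fixed-size stack of cached IOVA PFNs. */
struct iova_magazine {
	unsigned long size;			/* number of cached PFNs */
	unsigned long pfns[IOVA_MAG_SIZE];	/* the cached PFNs */
};

/* Per-CPU cache: two magazines, swapped when one runs empty or full. */
struct iova_cpu_cache {
	spinlock_t lock;		/* for cross-CPU frees */
	struct iova_magazine *loaded;	/* magazine served first */
	struct iova_magazine *prev;	/* backup magazine */
};

/* Global cache: a depot of full magazines backing the per-CPU caches. */
struct iova_global_cache {
	spinlock_t lock;
	unsigned long depot_size;
	struct iova_magazine *depot[IOVA_DEPOT_SIZE];
	struct iova_cpu_cache __percpu *cpu_caches;
};
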
> Adding caching on top of the existing rbtree allocator maintains the
> property that IOVAs are densely packed in the IO virtual address space,
> which is important for keeping IOMMU page table usage low.
> 
> To keep the cache size reasonable, we limit caching to ranges of
> size <= 128 KB.  Overall, a CPU can cache at most 32 MB, and the
> global cache is bounded by 4 MB.
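
(The allocation fast path this yields looks roughly as follows; a
sketch with hypothetical helper names, not the patch's actual code:)

/* Pop a cached PFN, refilling from the global depot when the per-CPU
 * magazines run dry.  Returns 0 when the caches are exhausted and the
 * caller must fall back to the rbtree allocator.
 */
static unsigned long cache_alloc_pfn(struct iova_cpu_cache *cc,
				     struct iova_global_cache *gc)
{
	/* 1. Fast path: pop from the loaded magazine, no rbtree lock. */
	if (cc->loaded->size)
		return cc->loaded->pfns[--cc->loaded->size];

	/* 2. Loaded magazine is empty: swap in the backup magazine. */
	if (cc->prev->size) {
		swap(cc->loaded, cc->prev);
		return cc->loaded->pfns[--cc->loaded->size];
	}

	/* 3. Both empty: trade the empty magazine for a full one from
	 *    the global depot.
	 */
	spin_lock(&gc->lock);
	if (gc->depot_size) {
		kfree(cc->loaded);
		cc->loaded = gc->depot[--gc->depot_size];
		spin_unlock(&gc->lock);
		return cc->loaded->pfns[--cc->loaded->size];
	}
	spin_unlock(&gc->lock);

	/* 4. Caches exhausted: caller takes the slow path. */
	return 0;
}

The free path would be symmetric: PFNs are pushed onto the loaded
magazine, and full magazines migrate to the depot.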

So the cached case still ignores limit_pfn, as I pointed out before.
This can break drivers if the cached PFN is outside the DMA range the
device can handle, and it can hurt performance because IOVAs above
4 GB force the device to use DAC (dual address cycle) addressing.
I think we could have two caches, one for DMA32 and one for DMA64,
and choose the corresponding cache according to limit_pfn.  For
devices with a special DMA mask, for example DMA24, just fall back
to the slow path.
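
Something along these lines, perhaps (a sketch; the cache names and
the PFN_LIMIT_* macros are made up):

#define PFN_LIMIT_DMA32	(DMA_BIT_MASK(32) >> PAGE_SHIFT)
#define PFN_LIMIT_DMA64	(DMA_BIT_MASK(64) >> PAGE_SHIFT)

/* Pick a cache that is guaranteed to satisfy limit_pfn, or NULL to
 * force the rbtree slow path, which honours limit_pfn exactly.
 */
static struct iova_cache *iova_pick_cache(struct iova_caches *caches,
					  unsigned long limit_pfn)
{
	if (limit_pfn >= PFN_LIMIT_DMA64)
		return &caches->dma64;	/* device addresses anything */

	if (limit_pfn >= PFN_LIMIT_DMA32)
		return &caches->dma32;	/* entries all below 4 GB */

	return NULL;			/* e.g. DMA24: slow path */
}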

Thanks,
Shaohua
