On Mon, Jan 11, 2016 at 12:56:12AM +0200, Adam Morrison wrote:
> Hi,
>
> > iova alloc/free causes heavy lock contention, which can easily be
> > demonstrated with an iperf workload. Previously I posted a patchset:
> > http://lists.linuxfoundation.org/pipermail/iommu/2015-November/014984.html
> >
> > The concern was that it's not generic. This is another attempt at the
> > issue. This version implements a per-cpu iova cache for small DMA
> > allocations (<= 64k), which should be generic enough, and lets us do
> > batch allocation. iova free can easily be batched too. With batched
> > alloc/free, the iova lock contention disappears. Performance with this
> > patchset is nearly the same as with the previous one in the same test.
> >
> > After this patchset, async_umap_flush_lock becomes the hottest lock in
> > intel-iommu, though it's not too bad. That is something we'll need to
> > work on in the future.
>
> There can still be significant spinlock contention with this patchset.
> For example, here's the throughput obtained when accessing 16 memcached
> instances running on a 16-core Sandy Bridge with an Intel XL710 NIC.
> The client machine has iommu=off and runs memslap with the default
> config (64-byte keys, 1024-byte values, and 10%/90% SET/GET ops):
>
> stock (4.4.0-rc5), iommu=off:
>   1,088,996 memcached transactions/sec (=100%, median of 10 runs).
>
> this patch, iommu=on:
>   313,161 memcached transactions/sec (=29%).
>   perf:  21.87%  0.57%  memcached  [kernel.kallsyms]  [k] _raw_spin_lock_irqsave
>            |
>            ---_raw_spin_lock_irqsave
>               |
>               |--67.84%-- intel_unmap (async_umap_flush_lock)
>               |--17.54%-- alloc_iova
>               |--12.85%-- free_iova_array
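For reference, the per-cpu caching described above boils down to roughly
the sketch below. This is illustrative only, not code from either
patchset: iova_cpu_cache, IOVA_CACHE_SIZE and iova_cache_refill() are
made-up names, and the pfn_limit handling is the point discussed further
down.

#include <linux/iova.h>
#include <linux/percpu.h>
#include <linux/spinlock.h>

#define IOVA_CACHE_SIZE	16	/* iovas handed out per refill */

struct iova_cpu_cache {
	spinlock_t lock;	/* uncontended in the common case */
	unsigned int count;
	struct iova *iovas[IOVA_CACHE_SIZE];
};

static DEFINE_PER_CPU(struct iova_cpu_cache, iova_cache);

static void iova_cache_refill(struct iova_domain *iovad,
			      struct iova_cpu_cache *cache,
			      unsigned long limit_pfn)
{
	/*
	 * Simplified: a real implementation would take the global
	 * iovad->iova_rbtree_lock once and carve out all the ranges in
	 * a single critical section, so the global lock is touched only
	 * once per IOVA_CACHE_SIZE allocations.
	 */
	while (cache->count < IOVA_CACHE_SIZE) {
		struct iova *iova = alloc_iova(iovad, 1, limit_pfn, true);

		if (!iova)
			break;
		cache->iovas[cache->count++] = iova;
	}
}

static struct iova *iova_cache_alloc(struct iova_domain *iovad,
				     unsigned long limit_pfn)
{
	struct iova_cpu_cache *cache;
	struct iova *iova = NULL;
	unsigned long flags;

	cache = raw_cpu_ptr(&iova_cache);
	spin_lock_irqsave(&cache->lock, flags);
	if (!cache->count)
		iova_cache_refill(iovad, cache, limit_pfn);
	if (cache->count) {
		iova = cache->iovas[cache->count - 1];
		/*
		 * A cached iova is only usable if it satisfies this
		 * caller's limit_pfn, since devices can have different
		 * DMA masks -- see the pfn_limit point below.
		 */
		if (iova->pfn_hi <= limit_pfn)
			cache->count--;
		else
			iova = NULL;
	}
	spin_unlock_irqrestore(&cache->lock, flags);
	return iova;
}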
Yes, mine doesn't remove the async_umap_flush_lock contention yet. That
should be easy to fix with per-cpu structures or atomic-based ring
management (see the sketch in the PS below).

> For reference, the patchset I posted two weeks ago gets almost the same
> throughput as with iommu=off:
>
> http://lists.linuxfoundation.org/pipermail/iommu/2015-December/015271.html
>
> 1,067,586 memcached transactions/sec (=98%).
> perf:  0.75%  0.75%  memcached  [kernel.kallsyms]  [k] _raw_spin_lock_irqsave

I didn't know you had already posted one. I've roughly looked at the
patches; we are using exactly the same idea, so I'm happy to pursue your
patches. At first look, the per-cpu allocation in your patch doesn't
check pfn_limit, which could be wrong, but that should be easy to fix.
I'll take a closer look tomorrow.

Thanks,
Shaohua
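PS: by "per-cpu structures or atomic-based ring management" above I mean
something like the untested sketch below for the deferred-unmap entries
that async_umap_flush_lock currently protects. The names here
(deferred_flush_queue, queue_unmap, flush_queue) are hypothetical, not
from any posted patch:

#include <linux/percpu.h>
#include <linux/spinlock.h>

#define FLUSH_QUEUE_SIZE	256

struct deferred_flush_entry {
	struct dmar_domain *domain;
	unsigned long iova_pfn;
	unsigned long nrpages;
	struct page *freelist;
};

struct deferred_flush_queue {
	spinlock_t lock;	/* this cpu vs. the flush timer only */
	unsigned int next;
	struct deferred_flush_entry entries[FLUSH_QUEUE_SIZE];
};

static DEFINE_PER_CPU(struct deferred_flush_queue, flush_queues);

/* Hypothetical drain helper: does the IOTLB invalidation, frees the
 * page-table freelists, and resets fq->next to 0. */
static void flush_queue(struct deferred_flush_queue *fq);

static void queue_unmap(struct dmar_domain *domain, unsigned long iova_pfn,
			unsigned long nrpages, struct page *freelist)
{
	struct deferred_flush_queue *fq = raw_cpu_ptr(&flush_queues);
	struct deferred_flush_entry *entry;
	unsigned long flags;

	spin_lock_irqsave(&fq->lock, flags);
	if (fq->next == FLUSH_QUEUE_SIZE)
		flush_queue(fq);	/* queue full: drain it now */
	entry = &fq->entries[fq->next++];
	entry->domain = domain;
	entry->iova_pfn = iova_pfn;
	entry->nrpages = nrpages;
	entry->freelist = freelist;
	spin_unlock_irqrestore(&fq->lock, flags);
}

The per-cpu lock stays because the flush timer can drain the queue from
another cpu, but in the common unmap path it is uncontended.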