Hi,

> iova alloc/free causes big lock contention, which can easily be
> demonstrated with an iperf workload. Previously I posted a patchset:
> http://lists.linuxfoundation.org/pipermail/iommu/2015-November/014984.html
>
> The concern was that it isn't generic. This is another attempt at the
> issue. This version implements a per-cpu iova cache for small DMA
> allocations (<= 64k), which should be generic enough, and it lets us do
> batch allocation. iova frees can easily be batched too. With batched
> alloc/free, the iova lock contention disappears. The performance result
> with this patchset is nearly the same as with the previous one in the
> same test.
>
> After this patchset, async_umap_flush_lock becomes the hottest lock in
> intel-iommu, but it is not very bad. That is something we will need to
> work on in the future.
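My reading of the approach above is a magazine-style allocator: each CPU keeps
a small stash of pre-allocated IOVAs for the common small sizes and only takes
the global allocator lock when the stash needs a batch refill or drain. The toy
userspace sketch below only models that batching idea; it is not the actual
patch, and the names (iova_cpu_cache, BATCH, __alloc_iova_locked, ...) are made
up for illustration:

/*
 * Illustrative sketch only -- not the actual patch.  Models a per-CPU
 * "magazine" of IOVAs that is refilled and drained in batches, so the
 * global allocator lock is taken once per BATCH operations instead of
 * once per alloc/free.
 */
#include <pthread.h>

#define BATCH 16                        /* hypothetical batch size */

static pthread_mutex_t iova_lock = PTHREAD_MUTEX_INITIALIZER;
static unsigned long next_pfn = 0x10000; /* toy stand-in for the rbtree allocator */

/* Stand-in for the real allocator; must be called with iova_lock held. */
static unsigned long __alloc_iova_locked(void)
{
        return next_pfn++;
}

static void __free_iova_locked(unsigned long pfn)
{
        (void)pfn;                      /* the toy stand-in simply drops it */
}

struct iova_cpu_cache {
        unsigned long pfns[BATCH];      /* cached IOVA page frame numbers */
        int nr;                         /* entries currently cached */
};

/* Fast path for small (<= 64k) mappings: take from this CPU's cache. */
static unsigned long iova_cache_alloc(struct iova_cpu_cache *cache)
{
        if (cache->nr == 0) {
                /* Refill a whole batch under one lock acquisition. */
                pthread_mutex_lock(&iova_lock);
                while (cache->nr < BATCH)
                        cache->pfns[cache->nr++] = __alloc_iova_locked();
                pthread_mutex_unlock(&iova_lock);
        }
        return cache->pfns[--cache->nr];
}

/* Fast path for free: push back into the cache; drain a batch when full. */
static void iova_cache_free(struct iova_cpu_cache *cache, unsigned long pfn)
{
        if (cache->nr == BATCH) {
                /* Drain the whole batch under one lock acquisition. */
                pthread_mutex_lock(&iova_lock);
                while (cache->nr)
                        __free_iova_locked(cache->pfns[--cache->nr]);
                pthread_mutex_unlock(&iova_lock);
        }
        cache->pfns[cache->nr++] = pfn;
}

int main(void)
{
        struct iova_cpu_cache cache = { .nr = 0 };
        unsigned long pfn = iova_cache_alloc(&cache);

        iova_cache_free(&cache, pfn);
        return 0;
}

The point is that the fast path only takes the global lock once per BATCH
operations; the real patch obviously also has to handle per-size caching,
draining on CPU hotplug, and falling back to the rbtree for large allocations,
all of which the toy skips.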
There can still be significant spinlock contention with this patchset. For
example, here's the throughput obtained when accessing 16 memcached instances
running on a 16-core Sandy Bridge with an Intel XL710 NIC. The client machine
has iommu=off and runs memslap with the default config (64-byte keys,
1024-byte values, and 10%/90% SET/GET ops):

stock (4.4.0-rc5), iommu=off: 1,088,996 memcached transactions/sec
                              (=100%, median of 10 runs).

this patch, iommu=on:           313,161 memcached transactions/sec (=29%).

perf:

  21.87%   0.57%  memcached  [kernel.kallsyms]  [k] _raw_spin_lock_irqsave
           |
           ---_raw_spin_lock_irqsave
              |
              |--67.84%-- intel_unmap (async_umap_flush_lock)
              |--17.54%-- alloc_iova
              |--12.85%-- free_iova_array

For reference, the patchset I posted two weeks ago
(http://lists.linuxfoundation.org/pipermail/iommu/2015-December/015271.html)
gets almost the same throughput as with iommu=off:
1,067,586 memcached transactions/sec (=98%).

perf:

   0.75%   0.75%  memcached  [kernel.kallsyms]  [k] _raw_spin_lock_irqsave
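The intel_unmap entry in that profile is what you would expect if every unmap
funnels through a single deferred-flush queue guarded by one global lock. The
sketch below is only a simplified model of that pattern, not the actual
intel-iommu code; deferred_unmap_queue, queue_deferred_unmap and the
HIGH_WATER_MARK value are illustrative names/numbers:

/*
 * Simplified model of a single deferred-unmap queue guarded by one
 * global lock -- not the actual intel-iommu code.  It only illustrates
 * why every CPU's unmap ends up contending on the same spinlock.
 */
#include <pthread.h>

#define HIGH_WATER_MARK 250             /* hypothetical drain threshold */

struct deferred_unmap_queue {
        pthread_mutex_t lock;
        unsigned long pfns[HIGH_WATER_MARK];
        int nr;
};

static struct deferred_unmap_queue flush_queue = {
        .lock = PTHREAD_MUTEX_INITIALIZER,
};

/* Stand-in for issuing the batched IOTLB flush and releasing the IOVAs. */
static void flush_deferred(struct deferred_unmap_queue *q)
{
        q->nr = 0;
}

/*
 * Every unmap, from every CPU, takes the same global lock just to queue
 * its IOVA for a later batched flush -- which is why this lock is what
 * remains once the alloc/free contention is gone.
 */
static void queue_deferred_unmap(unsigned long pfn)
{
        pthread_mutex_lock(&flush_queue.lock);
        flush_queue.pfns[flush_queue.nr++] = pfn;
        if (flush_queue.nr == HIGH_WATER_MARK)
                flush_deferred(&flush_queue);
        pthread_mutex_unlock(&flush_queue.lock);
}

int main(void)
{
        queue_deferred_unmap(0x10000);
        return 0;
}

Once per-CPU caching takes alloc_iova/free_iova out of the picture, this single
queue lock is what is left, which matches the 67.84% share in the profile above.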