On Thu, Sep 16, 2021 at 11:49:39AM -0400, Konrad Rzeszutek Wilk wrote:
>On Wed, Sep 01, 2021 at 12:21:35PM +0800, Chao Gao wrote:
>> Currently, swiotlb uses a global index to indicate the starting point
>> of next search. The index increases from 0 to the number of slots - 1
>> and then wraps around. It is straightforward but not cache-friendly
>> because the "oldest" slot in swiotlb tends to be used first.
>>
>> Freed slots are probably accessed right before being freed, especially
>> in VM's case (device backends access them in DMA_TO_DEVICE mode; guest
>> accesses them in other DMA modes). Thus those just freed slots may
>> reside in cache. Then reusing those just freed slots can reduce cache
>> misses.
>>
>> To that end, maintain a free list for free slots and insert freed slots
>> from the head and searching for free slots always starts from the head.
>>
>> With this optimization, network throughput of sending data from host to
>> guest, measured by iperf3, increases by 7%.
>
>Wow, that is pretty awesome!
>
>Are there any other benchmarks that you ran that showed a negative
>performance?
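
For anyone following the thread, the free-list idea quoted above can be sketched roughly as below. This is only an illustration, not the code from the patch: the names (slot_pool, pool_init, pool_alloc, pool_free, NSLOTS, NO_SLOT) are hypothetical, it handles single-slot allocations only (no search for contiguous runs of slots), and it has no locking.

```c
#include <assert.h>

#define NSLOTS  8     /* hypothetical pool size, for illustration only */
#define NO_SLOT (-1)  /* marks the end of the free list */

/* Hypothetical structure; not the actual swiotlb data layout. */
struct slot_pool {
    int next[NSLOTS]; /* next[i]: free slot that follows i, or NO_SLOT */
    int head;         /* most recently freed slot, or NO_SLOT if none */
};

/* Chain all slots together so that slot 0 is at the head. */
static void pool_init(struct slot_pool *p)
{
    for (int i = 0; i < NSLOTS; i++)
        p->next[i] = (i + 1 < NSLOTS) ? i + 1 : NO_SLOT;
    p->head = 0;
}

/* Take the slot at the head; returns NO_SLOT if the pool is empty. */
static int pool_alloc(struct slot_pool *p)
{
    int i = p->head;

    if (i != NO_SLOT)
        p->head = p->next[i];
    return i;
}

/* Push a freed slot at the head so that it is the next one reused. */
static void pool_free(struct slot_pool *p, int i)
{
    assert(i >= 0 && i < NSLOTS);
    p->next[i] = p->head;
    p->head = i;
}
```

With this layout, allocating two slots, freeing the first, and allocating again hands back the slot that was just freed, which is the reuse pattern the commit message expects to be cache-warm. A global index that only moves forward would instead hand out the next untouched slot.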

TBH, yes. I recently ran fio tests with this patch. The impact of this
patch is as follows (+ means performance improvement; - means
performance regression):

1-job fio:
	randread:	+6.7%
	randwrite:	-1.6%
	read:		+8.2%
	write:		+7.4%

8-job fio:
	randread:	-5.5%
	randwrite:	-12.6%
	read:		-24.8%
	write:		-45.5%

I haven't figured out why the multi-job fio tests suffer. Will post v2
once the issue gets resolved.

Thanks
Chao