On Fri, Aug 25, 2023 at 08:45:12AM +0200, Morten Brørup wrote:
> Bruce,
> 
> With this patch [1], it is noted that the ring producer and consumer data 
> should not be on adjacent cache lines, for performance reasons.
> 
> [1]: 
> https://git.dpdk.org/dpdk/commit/lib/librte_ring/rte_ring.h?id=d9f0d3a1ffd4b66e75485cc8b63b9aedfbdfe8b0
> 
> (It's obvious that they cannot share the same cache line, because they are 
> accessed by two different threads.)
> 
> Intuitively, I would think that having them on different cache lines would 
> suffice. Why does having an empty cache line between them make a difference?
> 
> And does it need to be an empty cache line? Or does it suffice having the 
> second structure start at two cache lines after the start of the first 
> structure (e.g. if the size of the first structure is two cache lines)?
> 
> I'm asking because the same principle might apply to other code too.
> 
Hi Morten,

This was something we discovered when working on the distributor library.
If each core has its own cachelines that see heavy access, leaving some
unused cachelines as a gap between those content cachelines can help
performance. We believe this helps by avoiding issues with the HW
prefetchers (e.g. the adjacent cacheline prefetcher) speculatively bringing
in the second cacheline when an operation is done on the first line.
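
As a rough illustration of the layout described above, here is a minimal
sketch in plain C11. It is not the actual rte_ring definition: the names
(CACHE_LINE, struct headtail, struct padded_ring, prod_to_cons_distance)
are hypothetical, and a 64-byte cache line is assumed. DPDK code would
normally use RTE_CACHE_LINE_SIZE rather than a hard-coded value.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 64-byte cache lines. */
#define CACHE_LINE 64

/* Hypothetical per-thread index pair, standing in for ring metadata. */
struct headtail {
    uint32_t head;
    uint32_t tail;
};

struct padded_ring {
    /* Producer data starts on its own cache line. */
    alignas(CACHE_LINE) struct headtail prod;

    /* One unused cache line, so that an adjacent-line prefetch
     * triggered by a producer access lands here, not on consumer data. */
    alignas(CACHE_LINE) char gap[CACHE_LINE];

    /* Consumer data starts two cache lines after the producer data. */
    alignas(CACHE_LINE) struct headtail cons;
};

/* Compile-time check that the gap really is in place. */
static_assert(offsetof(struct padded_ring, cons) -
              offsetof(struct padded_ring, prod) == 2 * CACHE_LINE,
              "cons must start two cache lines after prod");

/* Distance in bytes from the producer data to the consumer data. */
static size_t prod_to_cons_distance(void)
{
    return offsetof(struct padded_ring, cons) -
           offsetof(struct padded_ring, prod);
}
```

Whether a gap like this pays off depends on the CPU and its prefetcher
settings, so it is worth measuring on the target platform.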

/Bruce
