> From: Mattias Rönnblom [mailto:hof...@lysator.liu.se] > > On 2024-08-28 23:04, Morten Brørup wrote: > > Jakub, > > > > While browsing virtual interfaces in DPDK, I noticed a possible performance > issue in the memif driver: > > > > If "head" and "tail" are accessed by different lcores, they are not > sufficiently far away from each other (and other hot fields) to prevent false > sharing-like effects on systems with a next-N-lines hardware prefetcher, which > will prefetch "tail" when fetching "head", and prefetch "head" when fetching > "flags". > > > > I suggest updating the structure somewhat like this: > > > > -#define MEMIF_CACHELINE_ALIGN_MARK(mark) \ > > - alignas(RTE_CACHE_LINE_SIZE) RTE_MARKER mark; > > - > > -typedef struct { > > - MEMIF_CACHELINE_ALIGN_MARK(cacheline0); > > +typedef struct __rte_cache_aligned { > > uint32_t cookie; /**< MEMIF_COOKIE */ > > uint16_t flags; /**< flags */ > > #define MEMIF_RING_FLAG_MASK_INT 1 /**< disable interrupt mode */ > > + RTE_CACHE_GUARD; /* isolate head from flags */ > > Wouldn't it be better to cache align the 'head' (or cache-aligned 'head' > *and* add a RTE_CACHE_GUARD)? In other words, isn't the purpose of > RTE_CACHE_GUARD to provide zero or more cache line of extra padding, > rather than a mechanism to avoid same-cache line false sharing?
IMO the general purpose of RTE_CACHE_GUARD is to prevent false cache line sharing; both sharing of the same cache line (on systems with or without speculative prefetching) and sharing of the next cache lines (on systems with speculative prefetching). RTE_CACHE_GUARD provides two things: 1. Zero or more bytes of padding up to cache alignment, which prevents same-cache line sharing. This effectively cache aligns the field that follows the RTE_CACHE_GUARD, here the "head". 2. Zero or more cache lines of extra padding (configured by RTE_CACHE_GUARD_LINES in rte_config.h), which prevents sharing of the next cache lines on systems with speculative prefetching. My description failed to mention the reason for the RTE_CACHE_GUARD between "flags" and "head": The lcore updating "tail" also reads "flags", and if reading "flags" causes that lcore to prefetch the next cache line, it will thereby read the cache line holding "head", causing false cache line sharing with the other lcore updating "head". > > > RTE_ATOMIC(uint16_t) head; /**< pointer to ring > > buffer head > */ > > - MEMIF_CACHELINE_ALIGN_MARK(cacheline1); > > + RTE_CACHE_GUARD; /* isolate tail from head */ > > RTE_ATOMIC(uint16_t) tail; /**< pointer to ring > > buffer tail > */ > > - MEMIF_CACHELINE_ALIGN_MARK(cacheline2); > > + RTE_CACHE_GUARD; /* isolate descriptors from tail */ > > - memif_desc_t desc[0]; /**< buffer descriptors */ > > + memif_desc_t desc[]; /**< buffer descriptors */ > > } memif_ring_t; > > > > > > Med venlig hilsen / Kind regards, > > -Morten Brørup > >