On 2024-08-28 23:04, Morten Brørup wrote:
Jakub,
While browsing virtual interfaces in DPDK, I noticed a possible performance
issue in the memif driver:
If "head" and "tail" are accessed by different lcores, they are not
sufficiently far away from each other (and other hot fields) to prevent
false sharing-like effects on systems with a next-N-lines hardware
prefetcher, which will prefetch "tail" when fetching "head", and prefetch
"head" when fetching "flags".
I suggest updating the structure somewhat like this:
-#define MEMIF_CACHELINE_ALIGN_MARK(mark) \
- alignas(RTE_CACHE_LINE_SIZE) RTE_MARKER mark;
-
-typedef struct {
- MEMIF_CACHELINE_ALIGN_MARK(cacheline0);
+typedef struct __rte_cache_aligned {
uint32_t cookie; /**< MEMIF_COOKIE */
uint16_t flags; /**< flags */
#define MEMIF_RING_FLAG_MASK_INT 1 /**< disable interrupt mode */
+ RTE_CACHE_GUARD; /* isolate head from flags */

Wouldn't it be better to cache-align 'head' (or cache-align 'head'
*and* add an RTE_CACHE_GUARD)? In other words, isn't the purpose of
RTE_CACHE_GUARD to provide zero or more cache lines of extra padding,
rather than to serve as a mechanism for avoiding same-cache-line false
sharing?

RTE_ATOMIC(uint16_t) head; /**< pointer to ring
buffer head */
- MEMIF_CACHELINE_ALIGN_MARK(cacheline1);
+ RTE_CACHE_GUARD; /* isolate tail from head */
RTE_ATOMIC(uint16_t) tail; /**< pointer to ring
buffer tail */
- MEMIF_CACHELINE_ALIGN_MARK(cacheline2);
+ RTE_CACHE_GUARD; /* isolate descriptors from tail */
- memif_desc_t desc[0]; /**< buffer descriptors */
+ memif_desc_t desc[]; /**< buffer descriptors */
} memif_ring_t;
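
To make the question above concrete, here is a minimal standalone sketch of
the layout I have in mind: each hot field is explicitly cache-aligned, and
a guard region is placed in front of it as well. This is not the memif
code. CACHE_LINE, GUARD_LINES, sketch_ring_t and ring_head_tail_gap() are
hypothetical names used only for illustration, the 64-byte line size and
the single guard line are assumptions (a real build would take both from
the DPDK configuration), and plain uint16_t/uint64_t stand in for
RTE_ATOMIC(uint16_t) and memif_desc_t.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE  64	/* assumed cache line size in bytes */
#define GUARD_LINES 1	/* assumed number of guard lines per gap */

/* Hypothetical stand-in for memif_ring_t, not the real definition. */
typedef struct {
	uint32_t cookie;
	uint16_t flags;
	/* guard: full cache line(s) between flags and head */
	alignas(CACHE_LINE) char guard0[CACHE_LINE * GUARD_LINES];
	/* head explicitly starts on its own cache line */
	alignas(CACHE_LINE) uint16_t head;
	/* guard: full cache line(s) between head and tail */
	alignas(CACHE_LINE) char guard1[CACHE_LINE * GUARD_LINES];
	/* tail explicitly starts on its own cache line */
	alignas(CACHE_LINE) uint16_t tail;
	/* guard: full cache line(s) between tail and the descriptors */
	alignas(CACHE_LINE) char guard2[CACHE_LINE * GUARD_LINES];
	uint64_t desc[];	/* stand-in for the descriptor array */
} sketch_ring_t;

/* Compile-time checks of the properties discussed in this thread. */
static_assert(offsetof(sketch_ring_t, flags) < CACHE_LINE,
	"cookie and flags share the first cache line");
static_assert(offsetof(sketch_ring_t, head) % CACHE_LINE == 0,
	"head starts on a cache line boundary");
static_assert(offsetof(sketch_ring_t, tail) % CACHE_LINE == 0,
	"tail starts on a cache line boundary");
static_assert(offsetof(sketch_ring_t, desc) % CACHE_LINE == 0,
	"descriptors start on a cache line boundary");

/* Distance in bytes from the start of head to the start of tail. */
static inline size_t
ring_head_tail_gap(void)
{
	return offsetof(sketch_ring_t, tail) - offsetof(sketch_ring_t, head);
}
```

Compiling this with a C11 compiler is enough to run the checks, since they
are static assertions. With the assumed values, head and tail sit on
separate cache lines with one full guard line between them, so fetching
one line plus the next line does not pull in the other index.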
Med venlig hilsen / Kind regards,
-Morten Brørup