On 2024-08-28 23:04, Morten Brørup wrote:
Jakub,

While browsing virtual interfaces in DPDK, I noticed a possible performance 
issue in the memif driver:

If "head" and "tail" are accessed by different lcores, they are not
sufficiently far away from each other (and other hot fields) to prevent
false sharing-like effects on systems with a next-N-lines hardware
prefetcher, which will prefetch "tail" when fetching "head", and
prefetch "head" when fetching "flags".

I suggest updating the structure somewhat like this:

-#define MEMIF_CACHELINE_ALIGN_MARK(mark) \
-       alignas(RTE_CACHE_LINE_SIZE) RTE_MARKER mark;
-
-typedef struct {
-       MEMIF_CACHELINE_ALIGN_MARK(cacheline0);
+typedef struct __rte_cache_aligned {
        uint32_t cookie;                        /**< MEMIF_COOKIE */
        uint16_t flags;                         /**< flags */
#define MEMIF_RING_FLAG_MASK_INT 1              /**< disable interrupt mode */
+       RTE_CACHE_GUARD; /* isolate head from flags */

Wouldn't it be better to cache-align 'head' (or cache-align 'head' *and* add an RTE_CACHE_GUARD)? In other words, isn't the purpose of RTE_CACHE_GUARD to provide zero or more cache lines of extra padding, rather than to serve as a mechanism for avoiding same-cache-line false sharing?

        RTE_ATOMIC(uint16_t) head;                      /**< pointer to ring buffer head */
-       MEMIF_CACHELINE_ALIGN_MARK(cacheline1);
+       RTE_CACHE_GUARD; /* isolate tail from head */
        RTE_ATOMIC(uint16_t) tail;                      /**< pointer to ring buffer tail */
-       MEMIF_CACHELINE_ALIGN_MARK(cacheline2);
+       RTE_CACHE_GUARD; /* isolate descriptors from tail */
-       memif_desc_t desc[0];                   /**< buffer descriptors */
+       memif_desc_t desc[];                    /**< buffer descriptors */
} memif_ring_t;
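
To make the layout question above concrete, here is a minimal sketch of the combined variant (cache-align "head" and "tail" *and* keep a guard line in front of each). It uses plain C11 rather than the DPDK macros, so every name in it (SKETCH_CACHE_LINE, SKETCH_GUARD_LINES, sketch_ring_t, sketch_lines_apart) is hypothetical, and the 64-byte line size and single guard line are assumptions, not values taken from the driver or from any DPDK build configuration.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 64-byte cache lines (stand-in for RTE_CACHE_LINE_SIZE). */
#define SKETCH_CACHE_LINE 64
/* Assumption: one guard line; must be >= 1 here, since ISO C has no
 * zero-length arrays. */
#define SKETCH_GUARD_LINES 1

/* Hypothetical stand-in for memif_ring_t; not the driver's definition. */
typedef struct {
	alignas(SKETCH_CACHE_LINE) uint32_t cookie;
	uint16_t flags;

	/* Explicit padding: keeps a next-line prefetch of "flags" off "head". */
	alignas(SKETCH_CACHE_LINE)
	char guard0[SKETCH_CACHE_LINE * SKETCH_GUARD_LINES];
	/* Plain uint16_t for brevity; the real field is RTE_ATOMIC(uint16_t). */
	alignas(SKETCH_CACHE_LINE) uint16_t head;

	alignas(SKETCH_CACHE_LINE)
	char guard1[SKETCH_CACHE_LINE * SKETCH_GUARD_LINES];
	alignas(SKETCH_CACHE_LINE) uint16_t tail;

	alignas(SKETCH_CACHE_LINE)
	char guard2[SKETCH_CACHE_LINE * SKETCH_GUARD_LINES];
	uint8_t desc[]; /* descriptors start on their own cache line */
} sketch_ring_t;

/* Number of whole cache lines separating two field offsets. */
static inline size_t sketch_lines_apart(size_t a, size_t b)
{
	return (b > a ? b - a : a - b) / SKETCH_CACHE_LINE;
}

/* Compile-time checks: both hot fields start a cache line of their own. */
static_assert(offsetof(sketch_ring_t, head) % SKETCH_CACHE_LINE == 0,
	      "head must start a cache line");
static_assert(offsetof(sketch_ring_t, tail) % SKETCH_CACHE_LINE == 0,
	      "tail must start a cache line");
```

With these assumed values, sketch_lines_apart(offsetof(sketch_ring_t, head), offsetof(sketch_ring_t, tail)) evaluates to 2: one line holding "head" itself plus one guard line, so a next-line prefetch triggered by "head" lands on padding rather than on "tail". Dropping the alignas on "head" and "tail" while keeping the guards gives the layout in the diff; whether that is equivalent depends on how the guard itself is aligned.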


Med venlig hilsen / Kind regards,
-Morten Brørup
