On 10/30/20 8:29 PM, Thomas Monjalon wrote:
The mempool pointer in the mbuf struct is moved
from the second to the first half.
It should increase performance on most systems with a 64-byte cache line,
i.e. where the mbuf is split across two cache lines.
On such systems, the first half (also called the first cache line) is hotter
than the second one, where the pool pointer was.

Moving this field gives more space to dynfield1.

This is what the mbuf layout looks like (pahole-style):

word  type                              name                byte  size
  0    void *                            buf_addr;         /*   0 +  8 */
  1    rte_iova_t                        buf_iova;         /*   8 +  8 */
       /* --- RTE_MARKER64               rearm_data;                   */
  2    uint16_t                          data_off;         /*  16 +  2 */
       uint16_t                          refcnt;           /*  18 +  2 */
       uint16_t                          nb_segs;          /*  20 +  2 */
       uint16_t                          port;             /*  22 +  2 */
  3    uint64_t                          ol_flags;         /*  24 +  8 */
       /* --- RTE_MARKER                 rx_descriptor_fields1;        */
  4    uint32_t             union        packet_type;      /*  32 +  4 */
       uint32_t                          pkt_len;          /*  36 +  4 */
  5    uint16_t                          data_len;         /*  40 +  2 */
       uint16_t                          vlan_tci;         /*  42 +  2 */
  5.5  uint64_t             union        hash;             /*  44 +  8 */
  6.5  uint16_t                          vlan_tci_outer;   /*  52 +  2 */
       uint16_t                          buf_len;          /*  54 +  2 */
  7    struct rte_mempool *              pool;             /*  56 +  8 */
       /* --- RTE_MARKER                 cacheline1;                   */
  8    struct rte_mbuf *                 next;             /*  64 +  8 */
  9    uint64_t             union        tx_offload;       /*  72 +  8 */
 10    uint16_t                          priv_size;        /*  80 +  2 */
       uint16_t                          timesync;         /*  82 +  2 */
       uint32_t                          seqn;             /*  84 +  4 */

As I understand it, a rebase is required since seqn has already been removed
(or at least the layout shown here should be fixed).

 11    struct rte_mbuf_ext_shared_info * shinfo;           /*  88 +  8 */
 12    uint64_t                          dynfield1[4];     /*  96 + 32 */
 16    /* --- END                                             128      */

Signed-off-by: Thomas Monjalon <tho...@monjalon.net>
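
For reference, the offsets in the layout above can be checked mechanically. The following is only a rough sketch: `struct mock_mbuf`, `MOCK_CACHE_LINE` and `mock_cache_line_of()` are hypothetical names invented for illustration, the unions are flattened to plain fields, and 8-byte pointers are assumed. It is not the real `struct rte_mbuf`, and it still carries `seqn` because the quoted layout does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MOCK_CACHE_LINE 64 /* assumption: 64-byte cache line */

/* Hypothetical stand-in mirroring the quoted layout; NOT struct rte_mbuf. */
struct mock_mbuf {
	void *buf_addr;            /*  0 */
	uint64_t buf_iova;         /*  8 */
	uint16_t data_off;         /* 16 */
	uint16_t refcnt;           /* 18 */
	uint16_t nb_segs;          /* 20 */
	uint16_t port;             /* 22 */
	uint64_t ol_flags;         /* 24 */
	uint32_t packet_type;      /* 32 */
	uint32_t pkt_len;          /* 36 */
	uint16_t data_len;         /* 40 */
	uint16_t vlan_tci;         /* 42 */
	uint8_t hash[8];           /* 44: 8 bytes, not 8-byte aligned */
	uint16_t vlan_tci_outer;   /* 52 */
	uint16_t buf_len;          /* 54 */
	void *pool;                /* 56: last word of the first cache line */
	struct mock_mbuf *next;    /* 64: first word of the second cache line */
	uint64_t tx_offload;       /* 72 */
	uint16_t priv_size;        /* 80 */
	uint16_t timesync;         /* 82 */
	uint32_t seqn;             /* 84 */
	void *shinfo;              /* 88 */
	uint64_t dynfield1[4];     /* 96 */
};

/* Index of the cache line that contains the given byte offset. */
static size_t mock_cache_line_of(size_t offset)
{
	return offset / MOCK_CACHE_LINE;
}

/* Sanity check of the mock layout (only valid with 8-byte pointers). */
static void mock_layout_check(void)
{
	assert(offsetof(struct mock_mbuf, pool) == 56);
	assert(offsetof(struct mock_mbuf, next) == 64);
	assert(offsetof(struct mock_mbuf, dynfield1) == 96);
}
```

With this mock, `mock_cache_line_of(offsetof(struct mock_mbuf, pool))` is 0 and the same call for `next` gives 1, which matches the cacheline1 marker in the quoted layout.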

Taking Konstantin's reply into account ('next' is used on free together with
'pool', so the second cache line is accessed in any case), I think 'next' is
a better candidate than 'pool'. 'tx_offload' is also a better candidate than
'pool'. Of the two, I prefer 'next', since it applies to both Rx and Tx,
while 'tx_offload' is Tx only.
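
To illustrate the point about the free path, here is a simplified sketch. It is not the DPDK free code: `toy_mbuf`, `toy_pool`, `toy_free_chain()` and `toy_lines_touched()` are hypothetical names, the "return to pool" step is reduced to a counter, and a 64-byte cache line with 8-byte pointers is assumed. It only shows that freeing a segment chain reads both 'next' and 'pool' for every segment, so with 'pool' at byte 56 and 'next' at byte 64 both cache lines are touched anyway.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_CACHE_LINE 64 /* assumption: 64-byte cache line */

/* Hypothetical pool: just counts how many segments were returned. */
struct toy_pool {
	unsigned int freed;
};

/* Hypothetical mbuf with only the two fields under discussion. */
struct toy_mbuf {
	uint8_t first_line[56];  /* bytes 0..55 of the first cache line */
	struct toy_pool *pool;   /* byte 56: first cache line */
	struct toy_mbuf *next;   /* byte 64: second cache line */
	uint8_t rest[56];        /* remainder of the second cache line */
};

/* Number of distinct cache lines hit when reading two field offsets. */
static unsigned int toy_lines_touched(size_t off_a, size_t off_b)
{
	return (off_a / TOY_CACHE_LINE == off_b / TOY_CACHE_LINE) ? 1 : 2;
}

/* Free a chain of segments and return how many segments were freed. */
static unsigned int toy_free_chain(struct toy_mbuf *m)
{
	unsigned int n = 0;

	while (m != NULL) {
		struct toy_mbuf *next = m->next; /* read from the second line */

		assert(m->pool != NULL);
		m->next = NULL;
		m->pool->freed++;                /* 'pool' read from the first line */
		m = next;
		n++;
	}
	return n;
}
```

In this toy layout `toy_lines_touched(56, 64)` returns 2, while two fields that both sit in the second cache line (for example offsets 64 and 72) would give 1. That is the argument for moving 'next' rather than 'pool'.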
