+CC Beilei as i40e maintainer > From: Dharmik Thakkar [mailto:dharmik.thak...@arm.com] > Sent: Thursday, 13 January 2022 06.37 > > Current mempool per core cache implementation stores pointers to mbufs > On 64b architectures, each pointer consumes 8B > This patch replaces it with index-based implementation, > where in each buffer is addressed by (pool base address + index) > It reduces the amount of memory/cache required for per core cache > > L3Fwd performance testing reveals minor improvements in the cache > performance (L1 and L2 misses reduced by 0.60%) > with no change in throughput > > Suggested-by: Honnappa Nagarahalli <honnappa.nagaraha...@arm.com> > Signed-off-by: Dharmik Thakkar <dharmik.thak...@arm.com> > Reviewed-by: Ruifeng Wang <ruifeng.w...@arm.com> > --- > lib/mempool/rte_mempool.h | 150 +++++++++++++++++++++++++- > lib/mempool/rte_mempool_ops_default.c | 7 ++ > 2 files changed, 156 insertions(+), 1 deletion(-) > > diff --git a/lib/mempool/rte_mempool.h b/lib/mempool/rte_mempool.h > index 1e7a3c15273c..f2403fbc97a7 100644 > --- a/lib/mempool/rte_mempool.h > +++ b/lib/mempool/rte_mempool.h > @@ -50,6 +50,10 @@ > #include <rte_memcpy.h> > #include <rte_common.h> > > +#ifdef RTE_MEMPOOL_INDEX_BASED_LCORE_CACHE > +#include <rte_vect.h> > +#endif > + > #include "rte_mempool_trace_fp.h" > > #ifdef __cplusplus > @@ -239,6 +243,9 @@ struct rte_mempool { > int32_t ops_index; > > struct rte_mempool_cache *local_cache; /**< Per-lcore local cache > */ > +#ifdef RTE_MEMPOOL_INDEX_BASED_LCORE_CACHE > + void *pool_base_value; /**< Base value to calculate indices */ > +#endif > > uint32_t populated_size; /**< Number of populated > objects. */ > struct rte_mempool_objhdr_list elt_list; /**< List of objects in > pool */ > @@ -1314,7 +1321,22 @@ rte_mempool_cache_flush(struct rte_mempool_cache > *cache, > if (cache == NULL || cache->len == 0) > return; > rte_mempool_trace_cache_flush(cache, mp); > + > +#ifdef RTE_MEMPOOL_INDEX_BASED_LCORE_CACHE > + unsigned int i; > + unsigned int cache_len = cache->len; > + void *obj_table[RTE_MEMPOOL_CACHE_MAX_SIZE * 3]; > + void *base_value = mp->pool_base_value; > + uint32_t *cache_objs = (uint32_t *) cache->objs;
Hi Dharmik and Honnappa, The essence of this patch is based on recasting the type of the objs field in the rte_mempool_cache structure from an array of pointers to an array of uint32_t. However, this effectively breaks the ABI, because the rte_mempool_cache structure is public and part of the API. Some drivers [1] even bypass the mempool API and access the rte_mempool_cache structure directly, assuming that the objs array in the cache is an array of pointers. So you cannot recast the fields in the rte_mempool_cache structure the way this patch requires. Although I do consider bypassing an API's accessor functions "spaghetti code", this driver's behavior is formally acceptable as long as the rte_mempool_cache structure is not marked as internal. I really liked your idea of using indexes instead of pointers, so I'm very sorry to shoot it down. :-( [1]: E.g. the Intel i40e PMD, http://code.dpdk.org/dpdk/latest/source/drivers/net/i40e/i40e_rxtx_vec_avx512.c#L25 -Morten