Morten Brørup <m...@smartsharesystems.com> writes:
> +Ray Kinsella, ABI Policy maintainer
>
>> From: Honnappa Nagarahalli [mailto:honnappa.nagaraha...@arm.com]
>> Sent: Friday, 21 January 2022 07.01
>>
>> >
>> > +CC Beilei as i40e maintainer
>> >
>> > > From: Dharmik Thakkar [mailto:dharmik.thak...@arm.com]
>> > > Sent: Thursday, 13 January 2022 06.37
>> > >
>> > > Current mempool per core cache implementation stores pointers to mbufs
>> > > On 64b architectures, each pointer consumes 8B
>> > > This patch replaces it with index-based implementation, where in each
>> > > buffer is addressed by (pool base address + index)
>> > > It reduces the amount of memory/cache required for per core cache
>> > >
>> > > L3Fwd performance testing reveals minor improvements in the cache
>> > > performance (L1 and L2 misses reduced by 0.60%) with no change in
>> > > throughput
>> > >
>> > > Suggested-by: Honnappa Nagarahalli <honnappa.nagaraha...@arm.com>
>> > > Signed-off-by: Dharmik Thakkar <dharmik.thak...@arm.com>
>> > > Reviewed-by: Ruifeng Wang <ruifeng.w...@arm.com>
>> > > ---
>> > >  lib/mempool/rte_mempool.h             | 150 +++++++++++++++++++++++++-
>> > >  lib/mempool/rte_mempool_ops_default.c |   7 ++
>> > >  2 files changed, 156 insertions(+), 1 deletion(-)
>> > >
>> > > diff --git a/lib/mempool/rte_mempool.h b/lib/mempool/rte_mempool.h
>> > > index 1e7a3c15273c..f2403fbc97a7 100644
>> > > --- a/lib/mempool/rte_mempool.h
>> > > +++ b/lib/mempool/rte_mempool.h
>> > > @@ -50,6 +50,10 @@
>> > >  #include <rte_memcpy.h>
>> > >  #include <rte_common.h>
>> > >
>> > > +#ifdef RTE_MEMPOOL_INDEX_BASED_LCORE_CACHE
>> > > +#include <rte_vect.h>
>> > > +#endif
>> > > +
>> > >  #include "rte_mempool_trace_fp.h"
>> > >
>> > >  #ifdef __cplusplus
>> > > @@ -239,6 +243,9 @@ struct rte_mempool {
>> > >  	int32_t ops_index;
>> > >
>> > >  	struct rte_mempool_cache *local_cache; /**< Per-lcore local cache */
>> > > +#ifdef RTE_MEMPOOL_INDEX_BASED_LCORE_CACHE
>> > > +	void *pool_base_value; /**< Base value to calculate indices */
>> > > +#endif
>> > >
>> > >  	uint32_t populated_size;         /**< Number of populated objects. */
>> > >  	struct rte_mempool_objhdr_list elt_list; /**< List of objects in pool */
>> > > @@ -1314,7 +1321,22 @@ rte_mempool_cache_flush(struct rte_mempool_cache *cache,
>> > >  	if (cache == NULL || cache->len == 0)
>> > >  		return;
>> > >  	rte_mempool_trace_cache_flush(cache, mp);
>> > > +
>> > > +#ifdef RTE_MEMPOOL_INDEX_BASED_LCORE_CACHE
>> > > +	unsigned int i;
>> > > +	unsigned int cache_len = cache->len;
>> > > +	void *obj_table[RTE_MEMPOOL_CACHE_MAX_SIZE * 3];
>> > > +	void *base_value = mp->pool_base_value;
>> > > +	uint32_t *cache_objs = (uint32_t *) cache->objs;
>> >
>> > Hi Dharmik and Honnappa,
>> >
>> > The essence of this patch is based on recasting the type of the objs field
>> > in the rte_mempool_cache structure from an array of pointers to an array
>> > of uint32_t.
>> >
>> > However, this effectively breaks the ABI, because the rte_mempool_cache
>> > structure is public and part of the API.
>> The patch does not change the public structure, the new member is under
>> compile time flag, not sure how it breaks the ABI.
>>
>> >
>> > Some drivers [1] even bypass the mempool API and access the
>> > rte_mempool_cache structure directly, assuming that the objs array in the
>> > cache is an array of pointers. So you cannot recast the fields in the
>> > rte_mempool_cache structure the way this patch requires.
>> IMO, those drivers are at fault. The mempool cache structure is public
>> only because the APIs are inline.
>> We should still maintain modularity and not use the members of structures
>> belonging to another library directly. A similar effort involving rte_ring
>> was not accepted sometime back [1]
>>
>> [1]
>> http://inbox.dpdk.org/dev/DBAPR08MB5814907968595EE56F5E20A798390@DBAPR08MB5814.eurprd08.prod.outlook.com/
>>
>> >
>> > Although I do consider bypassing an API's accessor functions "spaghetti
>> > code", this driver's behavior is formally acceptable as long as the
>> > rte_mempool_cache structure is not marked as internal.
>> >
>> > I really liked your idea of using indexes instead of pointers, so I'm
>> > very sorry to shoot it down. :-(
>> >
>> > [1]: E.g. the Intel i40e PMD,
>> > http://code.dpdk.org/dpdk/latest/source/drivers/net/i40e/i40e_rxtx_vec_avx512.c#L25
>> It is possible to throw an error when this feature is enabled in this
>> file. Alternatively, this PMD could implement the code for index based
>> mempool.
>>
>
> I agree with both your points, Honnappa.
>
> The ABI remains intact, and only changes when this feature is enabled at
> compile time.
>
> In addition to your suggestions, I propose that the patch modifies the objs
> type in the mempool cache structure itself, instead of type casting it
> through an access variable. This should throw an error when compiling an
> application that accesses it as a pointer array instead of a uint32_t array -
> like the affected Intel PMDs.
>
> The updated objs field in the mempool cache structure should have the same
> size when compiled as the original objs field, so this feature doesn't change
> anything else in the ABI, only the type of the mempool cache objects.
>
> Also, the description of the feature should stress that applications
> accessing the cache objects directly will fail miserably.

Thanks for CC'ing me, Morten.

My 2c is that I would be slow to support this patch, because it introduces
code paths that are harder (impossible?) to test regularly. Yes, the feature
is optional, but in that case are we not just adding code that is dead by
default? I would ask whether a runtime option would not make more sense here.

Also, we cannot automatically assume that what the PMDs are doing breaks an
unwritten rule (breaking abstractions); I would guess they do it for solid
performance reasons. If so, that further supports my point about making the
mempool runtime configurable and query-able (is this mempool a bucket of
indexes or of pointers, etc.), and enabling the PMDs to ask rather than
assume.

Like Morten, I like the idea: saving memory and reducing cache misses with
indexes is all good IMHO.

--
Regards, Ray K
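
As a concrete illustration of the trade-off being discussed, below is a
minimal, self-contained C sketch (not the actual rte_mempool code) of the two
cache layouts: the current array of object pointers versus an array of 32-bit
offsets resolved against a pool base address, plus a hypothetical query helper
of the kind suggested above, so a PMD could ask which layout a pool uses
instead of assuming it. All names here (toy_pool, toy_cache, cache_layout,
toy_pool_cache_layout, ...) are illustrative assumptions and are not part of
the rte_mempool API.

	#include <stdint.h>
	#include <stdio.h>

	/* Toy model only -- not the rte_mempool structures. */
	enum cache_layout { CACHE_OF_POINTERS, CACHE_OF_OFFSETS };

	#define TOY_CACHE_SIZE 8

	struct toy_pool {
		void *base;               /* plays the role of the proposed pool_base_value */
		enum cache_layout layout; /* queryable at runtime, as suggested above */
	};

	struct toy_cache {
		unsigned int len;
		union {
			void *ptrs[TOY_CACHE_SIZE];    /* current layout: 8 bytes per entry */
			uint32_t offs[TOY_CACHE_SIZE]; /* index layout: 4 bytes per entry */
		};
	};

	/* Hypothetical query a PMD could use instead of assuming pointers. */
	static enum cache_layout
	toy_pool_cache_layout(const struct toy_pool *p)
	{
		return p->layout;
	}

	/* Store an object: either the pointer itself or its offset from base. */
	static void
	toy_cache_put(const struct toy_pool *p, struct toy_cache *c, void *obj)
	{
		if (p->layout == CACHE_OF_POINTERS)
			c->ptrs[c->len++] = obj;
		else
			c->offs[c->len++] =
				(uint32_t)((uintptr_t)obj - (uintptr_t)p->base);
	}

	/* Fetch an object back, reconstructing the pointer as base + offset. */
	static void *
	toy_cache_get(const struct toy_pool *p, struct toy_cache *c)
	{
		c->len--;
		if (p->layout == CACHE_OF_POINTERS)
			return c->ptrs[c->len];
		return (char *)p->base + c->offs[c->len];
	}

	int
	main(void)
	{
		static char backing[1024]; /* stands in for the pool's object area */
		struct toy_pool pool = { backing, CACHE_OF_OFFSETS };
		struct toy_cache cache = { 0 };

		toy_cache_put(&pool, &cache, backing + 256);

		if (toy_pool_cache_layout(&pool) == CACHE_OF_OFFSETS)
			printf("reconstructed object: %p\n",
			       toy_cache_get(&pool, &cache));
		return 0;
	}

The halved entry size is where the reported L1/L2 miss reduction would come
from; the costs are the extra add on the get path, the addressing range
implied by a 32-bit offset from the pool base, and breakage of any code that
reaches into cache->objs directly, which is the crux of this thread.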