<snip>

> From: Jerin Jacob <jer...@marvell.com>
>
> The existing optimize_object_size() function addresses the memory
> object alignment constraint on x86 for better performance.
>
> Different (micro) architectures may have different memory alignment
> constraints for better performance, and these are not the same as what
> the existing optimize_object_size() function assumes. Some use an XOR
> (kind of CRC) scheme to enable DRAM channel distribution based on the
> address, and some may have a different formula.

If I understand correctly, address interleaving is a characteristic of
the memory controller and not the CPU. For example, different SoCs using
the same Arm architecture might have different memory controllers. So
the solution should not be architecture specific, but SoC specific.
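For illustration, a controller using such an XOR scheme might derive the
channel roughly like this (a made-up sketch; the granularity and bit
layout are invented, and every SoC defines its own hash):

#include <stdint.h>

/*
 * Hypothetical sketch only: one way a memory controller could pick a
 * DRAM channel by XOR-folding address bits, unlike the simple modulo
 * spreading that optimize_object_size() assumes.
 */
static unsigned
xor_channel_hash(uint64_t addr, unsigned nchan)
{
	/* assume nchan is a power of two >= 2 */
	unsigned shift = __builtin_ctz(nchan); /* log2(nchan) */
	unsigned hash = 0;

	addr >>= 6; /* assume 64-byte interleave granularity */
	while (addr != 0) {
		hash ^= (unsigned)addr & (nchan - 1);
		addr >>= shift;
	}
	return hash;
}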
> Introduce the arch_mem_object_align() function to abstract the
> differences between (micro) architectures and avoid wasting memory on
> mempool object alignment for architectures where the existing
> optimize_object_size() logic is not valid.
>
> Additional details:
> https://www.mail-archive.com/dev@dpdk.org/msg149157.html
>
> Fixes: af75078fece3 ("first public release")
> Cc: sta...@dpdk.org
>
> Signed-off-by: Jerin Jacob <jer...@marvell.com>
> ---
>  doc/guides/prog_guide/mempool_lib.rst |  6 +++---
>  lib/librte_mempool/rte_mempool.c      | 17 +++++++++++++----
>  2 files changed, 16 insertions(+), 7 deletions(-)
>
> diff --git a/doc/guides/prog_guide/mempool_lib.rst b/doc/guides/prog_guide/mempool_lib.rst
> index 3bb84b0a6..eea7a2906 100644
> --- a/doc/guides/prog_guide/mempool_lib.rst
> +++ b/doc/guides/prog_guide/mempool_lib.rst
> @@ -27,10 +27,10 @@ In debug mode (CONFIG_RTE_LIBRTE_MEMPOOL_DEBUG is enabled), statistics about get
>  from/put in the pool are stored in the mempool structure.
>  Statistics are per-lcore to avoid concurrent access to statistics counters.
>
> -Memory Alignment Constraints
> -----------------------------
> +Memory Alignment Constraints on X86 architecture
> +------------------------------------------------
>
> -Depending on hardware memory configuration, performance can be greatly improved by adding a specific padding between objects.
> +Depending on hardware memory configuration on X86 architecture, performance can be greatly improved by adding a specific padding between objects.
>  The objective is to ensure that the beginning of each object starts on a different channel and rank in memory so that all channels are equally loaded.
>
>  This is particularly true for packet buffers when doing L3 forwarding or flow classification.
>
> diff --git a/lib/librte_mempool/rte_mempool.c b/lib/librte_mempool/rte_mempool.c
> index 78d8eb941..871894525 100644
> --- a/lib/librte_mempool/rte_mempool.c
> +++ b/lib/librte_mempool/rte_mempool.c
> @@ -45,6 +45,7 @@ EAL_REGISTER_TAILQ(rte_mempool_tailq)
>  #define CALC_CACHE_FLUSHTHRESH(c) \
>  	((typeof(c))((c) * CACHE_FLUSHTHRESH_MULTIPLIER))
>
> +#if defined(RTE_ARCH_X86)
>  /*
>   * return the greatest common divisor between a and b (fast algorithm)
>   *
> @@ -74,12 +75,13 @@ static unsigned get_gcd(unsigned a, unsigned b)
>  }
>
>  /*
> - * Depending on memory configuration, objects addresses are spread
> + * Depending on memory configuration on x86 arch, objects addresses are spread
>   * between channels and ranks in RAM: the pool allocator will add
>   * padding between objects. This function return the new size of the
>   * object.
>   */
> -static unsigned optimize_object_size(unsigned obj_size)
> +static unsigned
> +arch_mem_object_align(unsigned obj_size)
>  {
>  	unsigned nrank, nchan;
>  	unsigned new_obj_size;
> @@ -99,6 +101,13 @@ static unsigned optimize_object_size(unsigned obj_size)
>  		new_obj_size++;
>  	return new_obj_size * RTE_MEMPOOL_ALIGN;
>  }
> +#else

This applies to Arm (and PPC as well) SoCs, which might have different
schemes depending on the memory controller. IMO, this should not be
architecture specific.
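One possible direction (a rough, untested sketch; all names below are
invented for illustration) would be a runtime hook that a platform or
SoC driver registers, instead of selecting the scheme per architecture
at build time:

typedef unsigned (*rte_mempool_mem_align_t)(unsigned obj_size);

static rte_mempool_mem_align_t mem_align_fn;

/* Invented API: a platform/SoC driver would call this at init time. */
void
rte_mempool_register_mem_align(rte_mempool_mem_align_t fn)
{
	mem_align_fn = fn;
}

static unsigned
mem_object_align(unsigned obj_size)
{
	/* No platform hook registered: do not spread objects. */
	if (mem_align_fn == NULL)
		return obj_size;
	return mem_align_fn(obj_size);
}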
> +static unsigned
> +arch_mem_object_align(unsigned obj_size)
> +{
> +	return obj_size;
> +}
> +#endif
>
>  struct pagesz_walk_arg {
>  	int socket_id;
> @@ -234,8 +243,8 @@ rte_mempool_calc_obj_size(uint32_t elt_size, uint32_t flags,
>  	 */
>  	if ((flags & MEMPOOL_F_NO_SPREAD) == 0) {
>  		unsigned new_size;
> -		new_size = optimize_object_size(sz->header_size + sz->elt_size +
> -			sz->trailer_size);
> +		new_size = arch_mem_object_align
> +			(sz->header_size + sz->elt_size + sz->trailer_size);
>  		sz->trailer_size = new_size - sz->header_size - sz->elt_size;
>  	}
>
> --
> 2.24.1
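For reference, a quick worked example of what the x86 path pads,
assuming the defaults in optimize_object_size() (nchan = 4, nrank = 1)
and RTE_MEMPOOL_ALIGN = 64:

#include <stdio.h>

static unsigned gcd(unsigned a, unsigned b)
{
	while (b != 0) {
		unsigned t = b;
		b = a % b;
		a = t;
	}
	return a;
}

int main(void)
{
	const unsigned align = 64, nchan = 4, nrank = 1;
	unsigned obj_size = 2048;
	unsigned n = (obj_size + align - 1) / align;	/* 32 units */

	while (gcd(n, nchan * nrank) != 1)
		n++;					/* 32 -> 33 */

	/* x86 pads the object to 2112 bytes; the proposed non-x86
	 * arch_mem_object_align() keeps it at 2048. */
	printf("padded size = %u\n", n * align);
	return 0;
}

That is 64 bytes of padding per 2048-byte object, which is the wastage
this patch avoids on architectures where the channel/rank assumption
does not hold.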