On Wed, Jan 15, 2020 at 2:35 AM <jer...@marvell.com> wrote:
>
> From: Jerin Jacob <jer...@marvell.com>
>
> The existing optimize_object_size() function addresses the memory object
> alignment constraint on x86 for better performance.
>
> A different (micro) architecture may have a different memory alignment
> constraint for better performance, and it need not be the same as the
> existing optimize_object_size().
>
> Some use an XOR (kind of CRC) scheme to enable DRAM channel distribution
> based on the address, and some may have a different formula.
>
> Introduce an arch_mem_object_align() function to abstract the
> differences between (micro) architectures, to avoid wasting memory on
> mempool object alignment for architectures that do not require it.
>
> Details on the amount of memory saved:
>
> Currently, arm64 based architectures use the default (nchan=4,
> nrank=1). The worst case is for an object whose size (including the
> mempool header) is 2 cache lines, where it is padded to 3 cache lines
> (+50%).
>
> Examples for cache line size = 64:
>
>   orig     optimized
>    64  ->    64     +0%
>   128  ->   192     +50%
>   192  ->   192     +0%
>   256  ->   320     +25%
>   320  ->   320     +0%
>   384  ->   448     +16%
>   ...
>  2304  ->  2368     +2.7% (~mbuf size)
>
> Additional details:
> https://www.mail-archive.com/dev@dpdk.org/msg149157.html
>
> Signed-off-by: Jerin Jacob <jer...@marvell.com>
> Reviewed-by: Gavin Hu <gavin...@arm.com>
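For anyone who wants to sanity-check the table above, here is a minimal
standalone sketch (not part of the patch) of the spreading arithmetic. It
assumes the defaults quoted in the commit message (nchan=4, nrank=1) and a
64-byte cache line standing in for RTE_MEMPOOL_ALIGN, and uses plain Euclid
instead of the binary GCD variant in rte_mempool.c:

#include <stdio.h>

#define CACHE_LINE 64 /* stands in for RTE_MEMPOOL_ALIGN */

/* plain Euclid; rte_mempool.c uses a binary-GCD variant instead */
static unsigned int get_gcd(unsigned int a, unsigned int b)
{
        while (b != 0) {
                unsigned int t = a % b;
                a = b;
                b = t;
        }
        return a;
}

/* mirror of the x86 spreading logic, with nchan=4, nrank=1 hard-coded */
static unsigned int spread(unsigned int obj_size)
{
        const unsigned int nchan = 4, nrank = 1;
        /* round up to a whole number of cache lines */
        unsigned int n = (obj_size + CACHE_LINE - 1) / CACHE_LINE;

        /* grow until the line count is coprime with nchan * nrank, so
         * consecutive objects start on different channels/ranks
         */
        while (get_gcd(n, nchan * nrank) != 1)
                n++;
        return n * CACHE_LINE;
}

int main(void)
{
        static const unsigned int sizes[] = {
                64, 128, 192, 256, 320, 384, 2304
        };
        unsigned int i;

        /* prints the "orig -> optimized" pairs from the table above */
        for (i = 0; i < sizeof(sizes) / sizeof(sizes[0]); i++)
                printf("%4u -> %4u\n", sizes[i], spread(sizes[i]));
        return 0;
}
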
Ping for merge.

> ---
> v4:
>
> - s/X86/x86 in the doc/guides/prog_guide/mempool_lib.rst file (David)
> - Remove Fixes and CC: sta...@dpdk.org as well, as this is just an
>   improvement. (David)
> - Update the patch heading to remove a checkpatch warning due to the
>   above change.
>
> v3:
> - Change the comment for the MEMPOOL_F_NO_SPREAD flag to
>   "Spreading among memory channels not required." (Stephen Hemminger)
>
> v2:
> - Changed the return type of arch_mem_object_align() to "unsigned int"
>   from "unsigned" to fix the checkpatch issues (Olivier Matz)
> - Updated the comments for MEMPOOL_F_NO_SPREAD (Olivier Matz)
> - Updated the git comments to share the memory saving details.
>
>  doc/guides/prog_guide/mempool_lib.rst |  6 +++---
>  lib/librte_mempool/rte_mempool.c      | 17 +++++++++++++----
>  lib/librte_mempool/rte_mempool.h      |  3 ++-
>  3 files changed, 18 insertions(+), 8 deletions(-)
>
> diff --git a/doc/guides/prog_guide/mempool_lib.rst b/doc/guides/prog_guide/mempool_lib.rst
> index 3bb84b0a6..f8b430d65 100644
> --- a/doc/guides/prog_guide/mempool_lib.rst
> +++ b/doc/guides/prog_guide/mempool_lib.rst
> @@ -27,10 +27,10 @@ In debug mode (CONFIG_RTE_LIBRTE_MEMPOOL_DEBUG is enabled),
>  statistics about get from/put in the pool are stored in the mempool structure.
>  Statistics are per-lcore to avoid concurrent access to statistics counters.
>
> -Memory Alignment Constraints
> -----------------------------
> +Memory Alignment Constraints on x86 architecture
> +------------------------------------------------
>
> -Depending on hardware memory configuration, performance can be greatly improved by adding a specific padding between objects.
> +Depending on hardware memory configuration on X86 architecture, performance can be greatly improved by adding a specific padding between objects.
>  The objective is to ensure that the beginning of each object starts on a different channel and rank in memory so that all channels are equally loaded.
>
>  This is particularly true for packet buffers when doing L3 forwarding or flow classification.
> diff --git a/lib/librte_mempool/rte_mempool.c b/lib/librte_mempool/rte_mempool.c
> index 78d8eb941..1909998e8 100644
> --- a/lib/librte_mempool/rte_mempool.c
> +++ b/lib/librte_mempool/rte_mempool.c
> @@ -45,6 +45,7 @@ EAL_REGISTER_TAILQ(rte_mempool_tailq)
>  #define CALC_CACHE_FLUSHTHRESH(c)	\
>  	((typeof(c))((c) * CACHE_FLUSHTHRESH_MULTIPLIER))
>
> +#if defined(RTE_ARCH_X86)
>  /*
>   * return the greatest common divisor between a and b (fast algorithm)
>   *
> @@ -74,12 +75,13 @@ static unsigned get_gcd(unsigned a, unsigned b)
>  }
>
>  /*
> - * Depending on memory configuration, objects addresses are spread
> + * Depending on memory configuration on x86 arch, objects addresses are spread
>   * between channels and ranks in RAM: the pool allocator will add
>   * padding between objects. This function return the new size of the
>   * object.
>   */
> -static unsigned optimize_object_size(unsigned obj_size)
> +static unsigned int
> +arch_mem_object_align(unsigned int obj_size)
>  {
>  	unsigned nrank, nchan;
>  	unsigned new_obj_size;
> @@ -99,6 +101,13 @@ static unsigned optimize_object_size(unsigned obj_size)
>  		new_obj_size++;
>  	return new_obj_size * RTE_MEMPOOL_ALIGN;
>  }
> +#else
> +static unsigned int
> +arch_mem_object_align(unsigned int obj_size)
> +{
> +	return obj_size;
> +}
> +#endif
>
>  struct pagesz_walk_arg {
>  	int socket_id;
> @@ -234,8 +243,8 @@ rte_mempool_calc_obj_size(uint32_t elt_size, uint32_t flags,
>  	 */
>  	if ((flags & MEMPOOL_F_NO_SPREAD) == 0) {
>  		unsigned new_size;
> -		new_size = optimize_object_size(sz->header_size + sz->elt_size +
> -			sz->trailer_size);
> +		new_size = arch_mem_object_align
> +			 (sz->header_size + sz->elt_size + sz->trailer_size);
>  		sz->trailer_size = new_size - sz->header_size - sz->elt_size;
>  	}
>
> diff --git a/lib/librte_mempool/rte_mempool.h b/lib/librte_mempool/rte_mempool.h
> index f81152af9..f48c94def 100644
> --- a/lib/librte_mempool/rte_mempool.h
> +++ b/lib/librte_mempool/rte_mempool.h
> @@ -260,7 +260,8 @@ struct rte_mempool {
>  #endif
>  } __rte_cache_aligned;
>
> -#define MEMPOOL_F_NO_SPREAD 0x0001 /**< Do not spread among memory channels. */
> +#define MEMPOOL_F_NO_SPREAD 0x0001
> +/**< Spreading among memory channels not required. */
>  #define MEMPOOL_F_NO_CACHE_ALIGN 0x0002 /**< Do not align objs on cache lines.*/
>  #define MEMPOOL_F_SP_PUT         0x0004 /**< Default put is "single-producer".*/
>  #define MEMPOOL_F_SC_GET         0x0008 /**< Default get is "single-consumer".*/
> --
> 2.24.1
>
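As a usage illustration (again not part of the patch): the padding is
observable through the public rte_mempool_calc_obj_size() API, and with this
change the two calls below should report identical sizes on non-x86
architectures. The concrete numbers in the comments are assumptions that
hold for x86 defaults (nchan=4, nrank=1), a 64-byte cache line and a
non-debug build:

#include <stdio.h>

#include <rte_eal.h>
#include <rte_mempool.h>

int main(int argc, char **argv)
{
        struct rte_mempool_objsz spread_sz, flat_sz;

        if (rte_eal_init(argc, argv) < 0)
                return -1;

        /* 48-byte element: one cache line of header plus the element fits
         * in 2 cache lines; 2 shares a factor with nchan * nrank = 4, so
         * spreading pads the object to 3 lines on x86
         */
        rte_mempool_calc_obj_size(48, 0, &spread_sz);
        rte_mempool_calc_obj_size(48, MEMPOOL_F_NO_SPREAD, &flat_sz);

        /* expected on x86: total 192 vs 128; with this patch applied, both
         * report 128 on architectures without the spreading constraint
         */
        printf("spread:    total=%u trailer=%u\n",
               spread_sz.total_size, spread_sz.trailer_size);
        printf("no-spread: total=%u trailer=%u\n",
               flat_sz.total_size, flat_sz.trailer_size);
        return 0;
}
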