> -----Original Message-----
> From: Aaron Conole <acon...@redhat.com>
> Sent: Saturday, June 22, 2019 12:57 AM
> To: Pavan Nikhilesh Bhagavatula <pbhagavat...@marvell.com>
> Cc: Jerin Jacob Kollanukkaran <jer...@marvell.com>; dev@dpdk.org; Nithin
> Kumar Dabilpuram <ndabilpu...@marvell.com>; Vamsi Krishna Attunuru
> <vattun...@marvell.com>; Olivier Matz <olivier.m...@6wind.com>
> Subject: Re: [EXT] Re: [dpdk-dev] [PATCH v3 25/27] mempool/octeontx2:
> add optimized dequeue operation for arm64
>
> Pavan Nikhilesh Bhagavatula <pbhagavat...@marvell.com> writes:
>
> > Hi Aaron,
> >
> >>-----Original Message-----
> >>From: Aaron Conole <acon...@redhat.com>
> >>Sent: Tuesday, June 18, 2019 2:55 AM
> >>To: Jerin Jacob Kollanukkaran <jer...@marvell.com>
> >>Cc: dev@dpdk.org; Nithin Kumar Dabilpuram <ndabilpu...@marvell.com>;
> >>Vamsi Krishna Attunuru <vattun...@marvell.com>; Pavan Nikhilesh
> >>Bhagavatula <pbhagavat...@marvell.com>; Olivier Matz
> >><olivier.m...@6wind.com>
> >>Subject: [EXT] Re: [dpdk-dev] [PATCH v3 25/27] mempool/octeontx2:
> >>add optimized dequeue operation for arm64
> >>
> >>> From: Pavan Nikhilesh <pbhagavat...@marvell.com>
> >>>
> >>> This patch adds an optimized arm64 instruction based routine to
> >>> leverage CPU pipeline characteristics of octeontx2. The theme is to
> >>> fill the pipeline with CASP operations as much as the HW can handle,
> >>> so that the HW can perform alloc() ops at full throttle.
> >>>
> >>> Cc: Olivier Matz <olivier.m...@6wind.com>
> >>> Cc: Aaron Conole <acon...@redhat.com>
> >>>
> >>> Signed-off-by: Pavan Nikhilesh <pbhagavat...@marvell.com>
> >>> Signed-off-by: Jerin Jacob <jer...@marvell.com>
> >>> Signed-off-by: Vamsi Attunuru <vattun...@marvell.com>
> >>> ---
> >>>  drivers/mempool/octeontx2/otx2_mempool_ops.c | 291 +++++++++++++++++++
> >>>  1 file changed, 291 insertions(+)
> >>>
> >>> diff --git a/drivers/mempool/octeontx2/otx2_mempool_ops.c b/drivers/mempool/octeontx2/otx2_mempool_ops.c
> >>> index c59bd73c0..e6737abda 100644
> >>> --- a/drivers/mempool/octeontx2/otx2_mempool_ops.c
> >>> +++ b/drivers/mempool/octeontx2/otx2_mempool_ops.c
> >>> @@ -37,6 +37,293 @@ npa_lf_aura_op_alloc_one(const int64_t wdata, int64_t * const addr,
> >>>  	return -ENOENT;
> >>>  }
> >>>
> >>> +#if defined(RTE_ARCH_ARM64)
> >>> +static __rte_noinline int
> >>> +npa_lf_aura_op_search_alloc(const int64_t wdata, int64_t * const addr,
> >>> +		void **obj_table, unsigned int n) {
> >>> +	uint8_t i;
> >>> +
> >>> +	for (i = 0; i < n; i++) {
> >>> +		if (obj_table[i] != NULL)
> >>> +			continue;
> >>> +		if (npa_lf_aura_op_alloc_one(wdata, addr, obj_table, i))
> >>> +			return -ENOENT;
> >>> +	}
> >>> +
> >>> +	return 0;
> >>> +}
> >>> +
> >>> +static __attribute__((optimize("-O3"))) __rte_noinline int __hot
> >>
> >>Sorry if I missed this before.
> >>
> >>Is there a good reason to hard-code this optimization, rather than let
> >>the build system provide it?
> >
> > Some compiler versions don't handle __int128_t operands in CASP inline asm
> > correctly: if the optimization level is reduced to -O0, the CASP
> > restrictions (each register pair must start at an even-numbered register)
> > aren't followed, and the compiler may emit CASP instructions that violate
> > them, which the assembler then rejects. For example:
> >
> > /tmp/ccSPMGzq.s:1648: Error: reg pair must start from even reg at operand 1 - `casp x21,x22,x0,x1,[x19]'
> > /tmp/ccSPMGzq.s:1706: Error: reg pair must start from even reg at operand 1 - `casp x13,x14,x0,x1,[x11]'
> > /tmp/ccSPMGzq.s:1745: Error: reg pair must start from even reg at operand 1 - `casp x9,x10,x0,x1,[x7]'
> > /tmp/ccSPMGzq.s:1775: Error: reg pair must start from even reg at operand 1 - `casp x7,x8,x0,x1,[x5]'
> >
> > Forcing -O3, with __rte_noinline in place, fixes it, as the register
> > pairs then come out even-aligned.
>
> It makes sense to document this; it isn't obvious that it is needed.
> It would be good to put a comment just before the function that explains
> it, preferably naming the compilers that misbehave. This would help in the
> future to determine when it would be safe to drop the flag.

Yes. Will add the comment.