octeontx2: add optimized dequeue operation for arm64

Jerin Jacob Kollanukkaran Sat, 22 Jun 2019 06:21:46 -0700

> -----Original Message-----
> From: Aaron Conole <acon...@redhat.com>
> Sent: Saturday, June 22, 2019 12:57 AM
> To: Pavan Nikhilesh Bhagavatula <pbhagavat...@marvell.com>
> Cc: Jerin Jacob Kollanukkaran <jer...@marvell.com>; dev@dpdk.org; Nithin
> Kumar Dabilpuram <ndabilpu...@marvell.com>; Vamsi Krishna Attunuru
> <vattun...@marvell.com>; Olivier Matz <olivier.m...@6wind.com>
> Subject: Re: [EXT] Re: [dpdk-dev] [PATCH v3 25/27] mempool/octeontx2:
> add optimized dequeue operation for arm64
> 
> Pavan Nikhilesh Bhagavatula <pbhagavat...@marvell.com> writes:
> 
> > Hi Aaron,
> >
> >>-----Original Message-----
> >>From: Aaron Conole <acon...@redhat.com>
> >>Sent: Tuesday, June 18, 2019 2:55 AM
> >>To: Jerin Jacob Kollanukkaran <jer...@marvell.com>
> >>Cc: dev@dpdk.org; Nithin Kumar Dabilpuram
> <ndabilpu...@marvell.com>;
> >>Vamsi Krishna Attunuru <vattun...@marvell.com>; Pavan Nikhilesh
> >>Bhagavatula <pbhagavat...@marvell.com>; Olivier Matz
> >><olivier.m...@6wind.com>
> >>Subject: [EXT] Re: [dpdk-dev] [PATCH v3 25/27] mempool/octeontx2:
> >>add optimized dequeue operation for arm64
> >>
> >>> From: Pavan Nikhilesh <pbhagavat...@marvell.com>
> >>>
> >>> This patch adds an optimized arm64 instruction based routine to
> >>> leverage CPU pipeline characteristics of octeontx2. The theme is to
> >>> fill the pipeline with as many CASP operations as the HW can handle
> >>> so that the HW alloc() ops run at full throttle.
> >>>
> >>> Cc: Olivier Matz <olivier.m...@6wind.com>
> >>> Cc: Aaron Conole <acon...@redhat.com>
> >>>
> >>> Signed-off-by: Pavan Nikhilesh <pbhagavat...@marvell.com>
> >>> Signed-off-by: Jerin Jacob <jer...@marvell.com>
> >>> Signed-off-by: Vamsi Attunuru <vattun...@marvell.com>
> >>> ---
> >>>  drivers/mempool/octeontx2/otx2_mempool_ops.c | 291
> >>+++++++++++++++++++
> >>>  1 file changed, 291 insertions(+)
> >>>
> >>> diff --git a/drivers/mempool/octeontx2/otx2_mempool_ops.c
> >>b/drivers/mempool/octeontx2/otx2_mempool_ops.c
> >>> index c59bd73c0..e6737abda 100644
> >>> --- a/drivers/mempool/octeontx2/otx2_mempool_ops.c
> >>> +++ b/drivers/mempool/octeontx2/otx2_mempool_ops.c
> >>> @@ -37,6 +37,293 @@ npa_lf_aura_op_alloc_one(const int64_t
> >>wdata, int64_t * const addr,
> >>>   return -ENOENT;
> >>>  }
> >>>
> >>> +#if defined(RTE_ARCH_ARM64)
> >>> +static __rte_noinline int
> >>> +npa_lf_aura_op_search_alloc(const int64_t wdata, int64_t * const addr,
> >>> +         void **obj_table, unsigned int n) {
> >>> + uint8_t i;
> >>> +
> >>> + for (i = 0; i < n; i++) {
> >>> +         if (obj_table[i] != NULL)
> >>> +                 continue;
> >>> +         if (npa_lf_aura_op_alloc_one(wdata, addr, obj_table, i))
> >>> +                 return -ENOENT;
> >>> + }
> >>> +
> >>> + return 0;
> >>> +}
> >>> +
> >>> +static  __attribute__((optimize("-O3"))) __rte_noinline int __hot
> >>
> >>Sorry if I missed this before.
> >>
> >>Is there a good reason to hard-code this optimization, rather than let
> >>the build system provide it?
> >
> > Some compiler versions don't support __int128_t for CASP inline-asm,
> > i.e. if the optimization level is reduced to -O0 the CASP restrictions
> > aren't followed and the compiler might end up violating the CASP rules.
> > Example:
> >
> > /tmp/ccSPMGzq.s:1648: Error: reg pair must start from even reg at
> > operand 1 - `casp x21,x22,x0,x1,[x19]'
> > /tmp/ccSPMGzq.s:1706: Error: reg pair must start from even reg at
> > operand 1 - `casp x13,x14,x0,x1,[x11]'
> > /tmp/ccSPMGzq.s:1745: Error: reg pair must start from even reg at
> > operand 1 - `casp x9,x10,x0,x1,[x7]'
> > /tmp/ccSPMGzq.s:1775: Error: reg pair must start from even reg at
> > operand 1 - `casp x7,x8,x0,x1,[x5]'
> >
> > Forcing -O3 with __rte_noinline in place fixes it, as the register-pair
> > alignment then fits.
> 
> It makes sense to document this - it isn't apparent that it is needed.
> It would be good to put a comment just before that explains it, preferably
> with the compilers that aren't behaving.  This would help in the future to
> determine when it would be safe to drop the flag.

Yes. Will add the comment.
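
For illustration only, a minimal sketch of what such a comment could look
like, assuming GCC or Clang. EXAMPLE_FORCE_O3 and example_bulk_alloc are
hypothetical names, not taken from the patch, and the list of affected
compilers is left open because the thread does not name them.

```c
#include <stddef.h>

/*
 * Hypothetical macro (not part of DPDK): force -O3 on a single function.
 *
 * Why: per this thread, when the build drops to -O0 some compiler versions
 * do not follow the CASP operand rules for __int128_t inline-asm operands,
 * and the assembler rejects the output with
 *   "reg pair must start from even reg at operand 1".
 * Affected compilers: not listed in the thread; to be filled in.
 * Remove the attribute once those compilers are no longer supported.
 */
#if defined(__GNUC__) && !defined(__clang__)
#define EXAMPLE_FORCE_O3 __attribute__((optimize("-O3")))
#else
#define EXAMPLE_FORCE_O3 /* attribute not available: rely on build flags */
#endif

/*
 * Stand-in for the bulk dequeue routine. The body is illustrative only
 * and simply clears the table; the real routine issues CASP operations.
 */
static EXAMPLE_FORCE_O3 __attribute__((noinline)) int
example_bulk_alloc(void **obj_table, unsigned int n)
{
	unsigned int i;

	for (i = 0; i < n; i++)
		obj_table[i] = NULL;

	return 0;
}
```

Keeping the attribute behind a named macro with the comment next to it
would give one place to look when deciding whether the flag can be dropped.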