Hi Jerin,

> -----Original Message-----
> From: Jerin Jacob <jerinjac...@gmail.com>
> Sent: Friday, March 6, 2020 3:45 PM
> To: Gavin Hu <gavin...@arm.com>
> Cc: dpdk-dev <dev@dpdk.org>; nd <n...@arm.com>; David Marchand
> <david.march...@redhat.com>; tho...@monjalon.net;
> jer...@marvell.com; Ye, Xiaolong <xiaolong...@intel.com>; Honnappa
> Nagarahalli <honnappa.nagaraha...@arm.com>; Ruifeng Wang
> <ruifeng.w...@arm.com>; Phil Yang <phil.y...@arm.com>; Joyce Kong
> <joyce.k...@arm.com>; Steve Capper <steve.cap...@arm.com>
> Subject: Re: [dpdk-dev] [PATCH v1 3/3] net/i40e: auto-vectorization to
> speed up Tx free
>
> On Fri, Mar 6, 2020 at 10:35 AM Gavin Hu <gavin...@arm.com> wrote:
> >
> > Tx mbuf free is a hotspot for i40e on aarch64, as there are no
> > inter-loop dependencies, it is safe to enable auto-vectorization
> > to speed up.
> >
> > This patch showed 2~3% performance lift on ThunderX2 and no degradation
> > on Arm N1SDP. The test case is single core RFC2544 zero-loss test.
> >
> > Signed-off-by: Gavin Hu <gavin...@arm.com>
> > Reviewed-by: Steve Capper <steve.cap...@arm.com>
> > ---
> >  drivers/net/i40e/i40e_rxtx_vec_common.h | 5 +++++
> >  1 file changed, 5 insertions(+)
> >
> > diff --git a/drivers/net/i40e/i40e_rxtx_vec_common.h b/drivers/net/i40e/i40e_rxtx_vec_common.h
> > index 0e6ffa007..fc0fa45d4 100644
> > --- a/drivers/net/i40e/i40e_rxtx_vec_common.h
> > +++ b/drivers/net/i40e/i40e_rxtx_vec_common.h
> > @@ -98,6 +98,11 @@ i40e_tx_free_bufs(struct i40e_tx_queue *txq)
> >                 if (likely(m != NULL)) {
> >                         free[0] = m;
> >                         nb_free = 1;
> > +#if defined(__clang__)
> > +#pragma clang loop vectorize(assume_safety)
> > +#elif defined(__GNUC__)
> > +#pragma GCC ivdep
> > +#endif
>
> IMO, It is better to abstract the compiler features (above compiler
> feature and __restrict__) as macros in
> rte_common.h or so. It will help to support other compilers(ICC or
> Windows) and enable them to have "changes" in one place.

How about defining RTE_LOOP_AUTO_VECTORIZATION in rte_common.h?

#if defined(__clang__)
#define RTE_LOOP_AUTO_VECTORIZATION \
	_Pragma("clang loop vectorize(assume_safety)")
#elif defined(__GNUC__)
#define RTE_LOOP_AUTO_VECTORIZATION \
	_Pragma("GCC ivdep")
#else
#define RTE_LOOP_AUTO_VECTORIZATION
#endif

(A #pragma directive cannot appear inside a macro body, so the macro
uses the C99 _Pragma operator instead.)

If you agree, I will submit a v2.
Thanks for your comments!
/Gavin

>
> >
> >                         for (i = 1; i < n; i++) {
> >                                 m = rte_pktmbuf_prefree_seg(txep[i].mbuf);
> >                                 if (likely(m != NULL)) {
> > --
> > 2.17.1
> >
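
For illustration, here is a minimal sketch of how such a macro might be defined and used in front of a loop. The macro name follows the proposal in this thread and is not an existing DPDK definition. The helper scale_and_offset() and its arguments are hypothetical and are not taken from the patch. The sketch assumes a compiler that supports the C99 _Pragma operator, which is needed because a #pragma directive cannot be produced by macro expansion.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical macro, named after the proposal in this thread; it is not
 * an existing DPDK definition. Clang is tested first because it also
 * defines __GNUC__.
 */
#if defined(__clang__)
#define RTE_LOOP_AUTO_VECTORIZATION \
	_Pragma("clang loop vectorize(assume_safety)")
#elif defined(__GNUC__)
#define RTE_LOOP_AUTO_VECTORIZATION \
	_Pragma("GCC ivdep")
#else
#define RTE_LOOP_AUTO_VECTORIZATION /* no hint on other compilers */
#endif

/*
 * Illustrative helper (not from the patch): dst[i] = src[i] * 2 + 1.
 * The caller must pass non-overlapping arrays, because the hint tells
 * the compiler to assume there are no loop-carried dependencies.
 */
static void
scale_and_offset(uint32_t *dst, const uint32_t *src, size_t n)
{
	size_t i;

	/* Debug-build check of the no-overlap promise made by the hint. */
	assert((uintptr_t)(dst + n) <= (uintptr_t)src ||
	       (uintptr_t)(src + n) <= (uintptr_t)dst);

	/* The hint must immediately precede the loop it applies to. */
	RTE_LOOP_AUTO_VECTORIZATION
	for (i = 0; i < n; i++)
		dst[i] = src[i] * 2 + 1;
}
```

On compilers that match neither branch the macro expands to nothing, so the loop still compiles, only without the vectorization hint. The hint is a promise from the programmer and not something the compiler verifies, so it should only be placed on loops whose iterations are known to be independent.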