On Fri, Mar 6, 2020 at 10:35 AM Gavin Hu <gavin...@arm.com> wrote:
>
> Tx mbuf free is a hotspot for i40e on aarch64, as there are no
> inter-loop dependencies, it is safe to enable auto-vectorization
> to speed up.
>
> This patch showed 2~3% performance lift on ThunderX2 and no degradation
> on Arm N1SDP. The test case is single core RFC2544 zero-loss test.
>
> Signed-off-by: Gavin Hu <gavin...@arm.com>
> Reviewed-by: Steve Capper <steve.cap...@arm.com>
> ---
>  drivers/net/i40e/i40e_rxtx_vec_common.h | 5 +++++
>  1 file changed, 5 insertions(+)
>
> diff --git a/drivers/net/i40e/i40e_rxtx_vec_common.h
> b/drivers/net/i40e/i40e_rxtx_vec_common.h
> index 0e6ffa007..fc0fa45d4 100644
> --- a/drivers/net/i40e/i40e_rxtx_vec_common.h
> +++ b/drivers/net/i40e/i40e_rxtx_vec_common.h
> @@ -98,6 +98,11 @@ i40e_tx_free_bufs(struct i40e_tx_queue *txq)
>         if (likely(m != NULL)) {
>                 free[0] = m;
>                 nb_free = 1;
> +#if defined(__clang__)
> +#pragma clang loop vectorize(assume_safety)
> +#elif defined(__GNUC__)
> +#pragma GCC ivdep
> +#endif

IMO, it is better to abstract these compiler features (the loop pragmas
above and __restrict__) as macros in rte_common.h or similar. That would
make it easier to support other compilers (ICC, or those used on Windows)
and keep such changes in one place.

>                 for (i = 1; i < n; i++) {
>                         m = rte_pktmbuf_prefree_seg(txep[i].mbuf);
>                         if (likely(m != NULL)) {
> --
> 2.17.1
>
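
For illustration, here is a rough sketch of what such an abstraction might look like. The names EX_LOOP_IVDEP, EX_RESTRICT and ex_add_offset() are made up for this example and are not existing DPDK definitions. The ICC and MSVC branches follow those compilers' documented ivdep pragmas but are assumptions that I have not tested.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical macro names, for illustration only; they are not
 * existing DPDK definitions.
 */

/* Hint that the following loop carries no memory dependencies. */
#if defined(__clang__)
#define EX_LOOP_IVDEP _Pragma("clang loop vectorize(assume_safety)")
#elif defined(__INTEL_COMPILER)
/* Checked before __GNUC__, which ICC also defines on Linux. */
#define EX_LOOP_IVDEP _Pragma("ivdep")
#elif defined(__GNUC__) && \
	(__GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 9))
#define EX_LOOP_IVDEP _Pragma("GCC ivdep")
#elif defined(_MSC_VER)
#define EX_LOOP_IVDEP __pragma(loop(ivdep))
#else
#define EX_LOOP_IVDEP /* unknown compiler: no hint */
#endif

/* Portable spelling of the "no aliasing" pointer qualifier. */
#if defined(__GNUC__) || defined(__clang__) || defined(_MSC_VER)
#define EX_RESTRICT __restrict
#else
#define EX_RESTRICT
#endif

/* Each iteration touches only its own element, so the hint is safe. */
static inline void
ex_add_offset(uint32_t *EX_RESTRICT dst, const uint32_t *EX_RESTRICT src,
	      size_t n, uint32_t off)
{
	size_t i;

	EX_LOOP_IVDEP
	for (i = 0; i < n; i++)
		dst[i] = src[i] + off;
}
```

With something along these lines in a common header, the driver would only need the one macro line in front of the loop, and adding a new compiler would mean touching a single file.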