The SSE (x86) Rx/Tx burst functions added in v17.08 are being ported to ARM NEON for v17.11. This is still an ongoing effort, with more functions to port and further optimization to do. Even so, this interim patch applies on top of v17.08 and already forwards packets.
One topic for discussion is my use of inline assembly in performance-critical code blocks. I don't think the NEON intrinsics are well optimized yet. This is especially true of vqtbl2q_u8()/vqtbl3q_u8()/vqtbl4q_u8() and of gcc's register allocation around them, and older gcc doesn't even provide vld1q_u8_x4(). I used inline assembly to eliminate the hotspots that showed up in profiling. I'm not sure whether inline assembly is accepted in the DPDK community, but I don't see a reason to prohibit it.

Some functions in this patch are commented out because I haven't finished porting them yet, but Rx and Tx are both functional. For Tx, "--txqflags=0xf01" is required because txq_scatter_v() hasn't been ported yet.

Yongseok Koh (1):
  net/mlx5: add vectorized Rx/Tx burst for ARM

 drivers/net/mlx5/Makefile             |    2 +
 drivers/net/mlx5/mlx5_ethdev.c        |    4 +-
 drivers/net/mlx5/mlx5_prm.h           |   15 +
 drivers/net/mlx5/mlx5_rxq.c           |   61 ++
 drivers/net/mlx5/mlx5_rxtx.h          |    3 +-
 drivers/net/mlx5/mlx5_rxtx_vec_neon.c | 1464 +++++++++++++++++++++++++++++++++
 6 files changed, 1546 insertions(+), 3 deletions(-)
 create mode 100644 drivers/net/mlx5/mlx5_rxtx_vec_neon.c

-- 
2.11.0