This series introduces more efficient Rx/Tx burst functions using SIMD instructions. Currently they are supported only on 64-bit x86 with SSE4.1.

From a functional perspective, the Rx burst function is equivalent to the existing mlx5_rx_burst() except for scatter support, which will be added soon. The Tx burst function supports multi-segment packets and offload flags unless they are disabled by txq_flags. Disabling those features, however, yields slightly higher performance.

v3:
* Remove requirement of SSE4.1, as DPDK now mandates at least SSE4.2 support.
* Bug fix in "net/mlx5: select Rx/Tx callbacks when starting device"
  - Need to re-select the Rx burst function when changing the MTU size.
* Resolved an optimization issue with gcc-6 in rxq_burst_v()
  - Bit shift (<<) on a 128b vector type is compiled differently.
    'psllq' is needed instead of 'sal'.
* Minor changes to address review comments.
  - Remove 'pragma' for PEDANTIC.
  - Make mlx5_ptype_table global.
  - Rename some inline functions that also exist in mlx4 under the same name.
  - Fix comments and indentation/spacing.

v2:
* Streamline redundant conditional clauses in txq_complete().
* Remove the mempool pointer in txq->mp2mr structure.
* Fix indentation and spacing.

Yongseok Koh (5):
  net/mlx5: change indexing for Tx SW ring
  net/mlx5: free buffers in bulk on Tx completion
  net/mlx5: use buffer address for LKEY search
  net/mlx5: select Rx/Tx callbacks when starting device
  net/mlx5: add vectorized Rx/Tx burst for SSE4.1

 drivers/net/mlx5/Makefile            |    3 +
 drivers/net/mlx5/mlx5_defs.h         |   18 +
 drivers/net/mlx5/mlx5_ethdev.c       |   47 +-
 drivers/net/mlx5/mlx5_mr.c           |   17 +-
 drivers/net/mlx5/mlx5_rxq.c          |   57 +-
 drivers/net/mlx5/mlx5_rxtx.c         |  459 ++++-------
 drivers/net/mlx5/mlx5_rxtx.h         |  290 ++++++-
 drivers/net/mlx5/mlx5_rxtx_vec_sse.c | 1378 ++++++++++++++++++++++++++++++++++
 drivers/net/mlx5/mlx5_trigger.c      |    3 +
 drivers/net/mlx5/mlx5_txq.c          |   23 +-
 10 files changed, 1927 insertions(+), 368 deletions(-)
 create mode 100644 drivers/net/mlx5/mlx5_rxtx_vec_sse.c

-- 
2.11.0