On Wed, Feb 7, 2024 11:13 AM, ferruh.yi...@amd.com wrote:
> On 2/1/2024 3:00 AM, Jiawen Wu wrote:
> > To optimize Rx/Tx burst process, add SSE/NEON vector instructions on
> > x86/arm architecture.
> >
>
> Do you have any performance improvement number with vector
> implementation, if so can you put it into commit log for record?

On our local x86 platforms, the performance was already at full speed without
the vector path, so we don't have performance improvement numbers for SSE yet.
But I will add the test results for arm.

> > @@ -2198,8 +2220,15 @@ txgbe_set_tx_function(struct rte_eth_dev *dev,
> > struct txgbe_tx_queue *txq)
> >  #endif
> >  			txq->tx_free_thresh >= RTE_PMD_TXGBE_TX_MAX_BURST) {
> >  		PMD_INIT_LOG(DEBUG, "Using simple tx code path");
> > -		dev->tx_pkt_burst = txgbe_xmit_pkts_simple;
> >  		dev->tx_pkt_prepare = NULL;
> > +		if (txq->tx_free_thresh <= RTE_TXGBE_TX_MAX_FREE_BUF_SZ &&
> > +				(rte_eal_process_type() != RTE_PROC_PRIMARY ||
> >
>
> Why vector Tx enable only for secondary process?

It is not only for secondary processes. The constraint is

	(rte_eal_process_type() != RTE_PROC_PRIMARY ||
			txgbe_txq_vec_setup(txq) == 0)

so a primary process also takes the vector path when txgbe_txq_vec_setup()
succeeds. This code is modeled on ixgbe, which explains:

"When using multiple processes, the TX function used in all processes
should be the same, otherwise the secondary processes cannot transmit
more than tx-ring-size - 1 packets.
To achieve this, we extract out the code to select the ixgbe TX function
to be used into a separate function inside the ixgbe driver, and call
that from a secondary process when it is attaching to an
already-configured NIC."

> > +++ b/drivers/net/txgbe/txgbe_rxtx_vec_neon.c
> > @@ -0,0 +1,604 @@
> > +/* SPDX-License-Identifier: BSD-3-Clause
> > + * Copyright(c) 2015-2024 Beijing WangXun Technology Co., Ltd.
> > + * Copyright(c) 2010-2015 Intel Corporation
> > + */
> > +
> > +#include <ethdev_driver.h>
> > +#include <rte_malloc.h>
> > +#include <rte_vect.h>
> > +
> > +#include "txgbe_ethdev.h"
> > +#include "txgbe_rxtx.h"
> > +#include "txgbe_rxtx_vec_common.h"
> > +
> > +#pragma GCC diagnostic ignored "-Wcast-qual"
> > +
>
> Is this pragma really required?

Yes.
Otherwise, there are warnings in the compilation:

[1909/2921] Compiling C object drivers/libtmp_rte_net_txgbe.a.p/net_txgbe_txgbe_rxtx_vec_neon.c.o
../drivers/net/txgbe/txgbe_rxtx_vec_neon.c: In function ‘txgbe_rxq_rearm’:
../drivers/net/txgbe/txgbe_rxtx_vec_neon.c:37:15: warning: cast discards ‘volatile’ qualifier from pointer target type [-Wcast-qual]
    vst1q_u64((uint64_t *)&rxdp[i], zero);
              ^
../drivers/net/txgbe/txgbe_rxtx_vec_neon.c:60:13: warning: cast discards ‘volatile’ qualifier from pointer target type [-Wcast-qual]
   vst1q_u64((uint64_t *)rxdp++, dma_addr0);
             ^
../drivers/net/txgbe/txgbe_rxtx_vec_neon.c:65:13: warning: cast discards ‘volatile’ qualifier from pointer target type [-Wcast-qual]
   vst1q_u64((uint64_t *)rxdp++, dma_addr1);
             ^
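
For reference, the Tx path selection condition discussed earlier in this
reply can be read as the standalone sketch below. It is only an illustration
of how the short-circuit evaluation behaves, not the driver code: every
sketch_* name, both threshold values, and the stubbed setup function are
hypothetical stand-ins, and the other conditions in the outer "simple path"
check, which are cut off in the quoted hunk, are left out.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for RTE_PMD_TXGBE_TX_MAX_BURST and
 * RTE_TXGBE_TX_MAX_FREE_BUF_SZ; the real values are in the driver headers.
 */
#define SKETCH_TX_MAX_BURST        32
#define SKETCH_TX_MAX_FREE_BUF_SZ  64

enum sketch_proc_type { SKETCH_PROC_PRIMARY, SKETCH_PROC_SECONDARY };
enum sketch_tx_path { SKETCH_TX_FULL, SKETCH_TX_SIMPLE, SKETCH_TX_VECTOR };

struct sketch_txq {
	uint16_t tx_free_thresh;
	int vec_setup_result; /* value the stubbed setup returns */
	int vec_setup_calls;  /* how many times the stub was invoked */
};

/* Stub standing in for txgbe_txq_vec_setup(): returns 0 on success. */
static int
sketch_txq_vec_setup(struct sketch_txq *txq)
{
	txq->vec_setup_calls++;
	return txq->vec_setup_result;
}

/* Mirrors the shape of the quoted condition:
 * - a primary process runs the vector setup and needs it to succeed;
 * - a secondary process skips the setup call (short-circuit on ||)
 *   and relies on the queue already being configured.
 */
static enum sketch_tx_path
sketch_select_tx_path(enum sketch_proc_type proc, struct sketch_txq *txq)
{
	if (txq->tx_free_thresh < SKETCH_TX_MAX_BURST)
		return SKETCH_TX_FULL;

	if (txq->tx_free_thresh <= SKETCH_TX_MAX_FREE_BUF_SZ &&
			(proc != SKETCH_PROC_PRIMARY ||
			 sketch_txq_vec_setup(txq) == 0))
		return SKETCH_TX_VECTOR;

	return SKETCH_TX_SIMPLE;
}
```

With these assumptions, a primary process whose setup succeeds and a
secondary process both end up on the vector path, while a primary process
whose setup fails falls back to the simple path.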