Hi Ferruh,

Would you please review these patches? If there are no further comments, could mbufs recycle mode be merged into the dpdk-next branch?

Thanks very much.

Best Regards
Feifei

> -----Original Message-----
> From: Feifei Wang <feifei.wa...@arm.com>
> Sent: Tuesday, August 22, 2023 3:27 PM
> Cc: dev@dpdk.org; nd <n...@arm.com>; Feifei Wang <feifei.wa...@arm.com>
> Subject: [PATCH v11 0/4] Recycle mbufs from Tx queue into Rx queue
>
> Currently, the transmit side frees the buffers into the lcore cache and the
> receive side allocates buffers from the lcore cache. The transmit side
> typically frees 32 buffers, resulting in 32*8=256B of stores to the lcore
> cache. The receive side allocates 32 buffers and stores them in the receive
> side software ring, resulting in 32*8=256B of stores and 256B of loads from
> the lcore cache.
>
> This patch proposes a mechanism to avoid freeing to/allocating from the lcore
> cache, i.e. the receive side will free the buffers from the transmit side
> directly into its software ring. This avoids the 256B of loads and stores
> introduced by the lcore cache. It also frees up the cache lines used by the
> lcore cache. We call this mode mbufs recycle mode.
>
> In the latest version, mbufs recycle mode is packaged as a separate API.
> This allows users to change the rxq/txq pairing in real time in the data
> plane, according to the application's analysis of the packet flow, for example:
> -----------------------------------------------------------------------
> Step 1: upper application analyse the flow direction
> Step 2: recycle_rxq_info = rte_eth_recycle_rx_queue_info_get(rx_portid, rx_queueid)
> Step 3: rte_eth_recycle_mbufs(rx_portid, rx_queueid, tx_portid, tx_queueid, recycle_rxq_info);
> Step 4: rte_eth_rx_burst(rx_portid,rx_queueid);
> Step 5: rte_eth_tx_burst(tx_portid,tx_queueid);
> -----------------------------------------------------------------------
> The above lets the user change the rxq/txq pairing at run time without
> knowing the direction of flow in advance. This effectively expands the use
> scenarios of mbufs recycle mode.
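
To make the accounting in the cover letter concrete, here is a minimal toy model in plain C. It is not the i40e/ixgbe driver code and does not use DPDK: `struct toy_ring`, `toy_recycle()`, `RING_SIZE` and `PTR_BYTES` are hypothetical names invented for this sketch, and it assumes 8-byte mbuf pointers, as the 32*8=256B figure implies. It only illustrates used Tx buffer pointers being moved straight into Rx ring slots, with no lcore cache in between.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BURST     32   /* burst size quoted in the cover letter */
#define RING_SIZE 64   /* hypothetical ring size, must be a power of two */
#define PTR_BYTES 8    /* assumption: 64-bit mbuf pointers (32*8=256B) */

/* Toy stand-in for a driver software ring: an array of buffer pointers. */
struct toy_ring {
    void *slot[RING_SIZE];
    uint16_t head;     /* next slot to free (Tx) or to refill (Rx) */
};

/* Bytes of pointer traffic caused by moving n buffer pointers once. */
static size_t ptr_traffic_bytes(unsigned int n)
{
    return (size_t)n * PTR_BYTES;
}

/*
 * Move n used Tx buffer pointers directly into the Rx ring refill slots,
 * handling wrap-around with a power-of-two mask. Returns the count moved.
 */
static uint16_t toy_recycle(struct toy_ring *rx, struct toy_ring *tx, uint16_t n)
{
    assert(n <= RING_SIZE);
    for (uint16_t i = 0; i < n; i++) {
        uint16_t t = (uint16_t)((tx->head + i) & (RING_SIZE - 1));
        uint16_t r = (uint16_t)((rx->head + i) & (RING_SIZE - 1));
        rx->slot[r] = tx->slot[t];   /* one pointer copy per buffer */
        tx->slot[t] = NULL;          /* Tx no longer owns the buffer */
    }
    tx->head = (uint16_t)((tx->head + n) & (RING_SIZE - 1));
    rx->head = (uint16_t)((rx->head + n) & (RING_SIZE - 1));
    return n;
}

/* Fill a Tx ring with BURST fake buffers, recycle them, check they moved. */
static int toy_recycle_demo(void)
{
    static int bufs[BURST];
    struct toy_ring rx = { .head = 0 }, tx = { .head = 0 };

    for (int i = 0; i < BURST; i++)
        tx.slot[i] = &bufs[i];

    uint16_t moved = toy_recycle(&rx, &tx, BURST);

    for (int i = 0; i < BURST; i++)
        if (rx.slot[i] != &bufs[i] || tx.slot[i] != NULL)
            return -1;
    return moved;
}
```

In this model `ptr_traffic_bytes(BURST)` is the 256B that each extra hop through the lcore cache would cost, and `toy_recycle_demo()` reports how many buffers reached the Rx ring without that hop.
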
>
> Furthermore, mbufs recycle mode is no longer limited to the same pmd, it can
> support moving mbufs between different vendor pmds, even can put the
> mbufs anywhere into your Rx mbuf ring as long as the address of the mbuf
> ring can be provided.
> In the latest version, we enable mbufs recycle mode in i40e pmd and ixgbe
> pmd, and also try to use i40e driver in Rx, ixgbe driver in Tx, and then
> achieve 7-9% performance improvement by mbufs recycle mode.
>
> Difference between mbuf recycle, ZC API used in mempool and general path
> For general path:
>                 Rx: 32 pkts memcpy from mempool cache to rx_sw_ring
>                 Tx: 32 pkts memcpy from tx_sw_ring to temporary variable +
>                     32 pkts memcpy from temporary variable to mempool cache
> For ZC API used in mempool:
>                 Rx: 32 pkts memcpy from mempool cache to rx_sw_ring
>                 Tx: 32 pkts memcpy from tx_sw_ring to zero-copy mempool cache
>                 Refer link:
>                 http://patches.dpdk.org/project/dpdk/patch/20230221055205.22984-2-kamalakshitha.alig...@arm.com/
> For mbufs recycle:
>                 Rx/Tx: 32 pkts memcpy from tx_sw_ring to rx_sw_ring
> Thus we can see in the one loop, compared to general path, mbufs recycle
> mode reduces 32+32=64 pkts memcpy; Compared to ZC API used in mempool, we
> can see mbufs recycle mode reduce 32 pkts memcpy in each loop.
> So, mbufs recycle has its own benefits.
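
The per-loop copy counts in the comparison above can be tallied with a small sketch in plain C. The enum and function names are hypothetical, chosen only for this illustration. It assumes one loop is one 32-packet Rx refill plus one 32-packet Tx free, as described in the cover letter, and it counts pointer copies only, not cycles.

```c
#include <assert.h>

#define BURST 32   /* packets per loop, as in the cover letter */

enum path { PATH_GENERAL, PATH_ZC_MEMPOOL, PATH_RECYCLE };

/* Pointer copies per loop for each path described in the cover letter. */
static unsigned int copies_per_loop(enum path p)
{
    switch (p) {
    case PATH_GENERAL:
        /* Rx: cache -> rx_sw_ring; Tx: tx_sw_ring -> temp, temp -> cache */
        return BURST + BURST + BURST;
    case PATH_ZC_MEMPOOL:
        /* Rx: cache -> rx_sw_ring; Tx: tx_sw_ring -> zero-copy cache */
        return BURST + BURST;
    case PATH_RECYCLE:
        /* Rx/Tx: tx_sw_ring -> rx_sw_ring */
        return BURST;
    }
    assert(0 && "unknown path");
    return 0;
}

/* Copies that recycle mode avoids per loop relative to a baseline path. */
static unsigned int copies_saved_by_recycle(enum path baseline)
{
    return copies_per_loop(baseline) - copies_per_loop(PATH_RECYCLE);
}
```

With these definitions, `copies_saved_by_recycle(PATH_GENERAL)` gives the 32+32=64 figure and `copies_saved_by_recycle(PATH_ZC_MEMPOOL)` gives the 32 figure quoted above.
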
>
> Testing status:
> (1) dpdk l3fwd test with multiple drivers:
>     port 0: 82599 NIC   port 1: XL710 NIC
> -------------------------------------------------------------
>                 Without fast free       With fast free
> Thunderx2:      +7.53%                  +13.54%
> -------------------------------------------------------------
>
> (2) dpdk l3fwd test with same driver:
>     port 0 && 1: XL710 NIC
> -------------------------------------------------------------
>                 Without fast free       With fast free
> Ampere altra:   +12.61%                 +11.42%
> n1sdp:          +8.30%                  +3.85%
> x86-sse:        +8.43%                  +3.72%
> -------------------------------------------------------------
>
> (3) Performance comparison with ZC_mempool used
>     port 0 && 1: XL710 NIC
>     with fast free
> -------------------------------------------------------------
>                 With recycle buffer     With zc_mempool
> Ampere altra:   11.42%                  3.54%
> -------------------------------------------------------------
>
> Furthermore, we add a recycle_mbuf engine in testpmd. Because the XL710 NIC
> has an I/O bottleneck in testpmd on Ampere Altra, we cannot see a throughput
> change compared with the I/O fwd engine. However, using the record cmd in
> testpmd:
> '$set record-burst-stats on'
> we can see the ratio of 'Rx/Tx burst size of 32' is reduced. This indicates
> that mbufs recycle can save CPU cycles.
>
> v2:
> 1. Use data-plane API to enable direct-rearm (Konstantin, Honnappa)
> 2. Add 'txq_data_get' API to get txq info for Rx (Konstantin)
> 3. Use input parameter to enable direct rearm in l3fwd (Konstantin)
> 4. Add condition detection for direct rearm API (Morten, Andrew Rybchenko)
>
> v3:
> 1. Separate Rx and Tx operation with two APIs in direct-rearm (Konstantin)
> 2. Delete L3fwd change for direct rearm (Jerin)
> 3. Enable direct rearm in ixgbe driver in Arm
>
> v4:
> 1. Rename direct-rearm as buffer recycle. Based on this, function names and
>    variable names are changed to make this mode more general for all drivers.
>    (Konstantin, Morten)
> 2. Add ring wrapping check (Konstantin)
>
> v5:
> 1. some change for ethdev API (Morten)
> 2. add support for avx2, sse, altivec path
>
> v6:
> 1. fix ixgbe build issue in ppc
> 2. remove 'recycle_tx_mbufs_reuse' and 'recycle_rx_descriptors_refill'
>    API wrapper (Tech Board meeting)
> 3. add recycle_mbufs engine in testpmd (Tech Board meeting)
> 4. add namespace in the functions related to mbufs recycle (Ferruh)
>
> v7:
> 1. move 'rxq/txq data' pointers to the beginning of eth_dev structure, in
>    order to keep them in the same cache line as rx/tx_burst function
>    pointers (Morten)
> 2. add the extra description for 'rte_eth_recycle_mbufs' to show it can
>    support feeding 1 Rx queue from 2 Tx queues in the same thread
>    (Konstantin)
> 3. For i40e/ixgbe driver, mark the previously copied buffers as invalid if
>    any Tx buffer has refcnt > 1 or comes from an unexpected mempool
>    (Konstantin)
> 4. add check for the return value of 'rte_eth_recycle_rx_queue_info_get'
>    in testpmd fwd engine (Morten)
>
> v8:
> 1. add arm/x86 build option to fix ixgbe build issue in ppc
>
> v9:
> 1. delete duplicate file name for ixgbe
>
> v10:
> 1. fix compile issue on windows
>
> v11:
> 1. fix doc warning
>
> Feifei Wang (4):
>   ethdev: add API for mbufs recycle mode
>   net/i40e: implement mbufs recycle mode
>   net/ixgbe: implement mbufs recycle mode
>   app/testpmd: add recycle mbufs engine
>
>  app/test-pmd/meson.build                      |   1 +
>  app/test-pmd/recycle_mbufs.c                  |  58 ++++++
>  app/test-pmd/testpmd.c                        |   1 +
>  app/test-pmd/testpmd.h                        |   3 +
>  doc/guides/rel_notes/release_23_11.rst        |  15 ++
>  doc/guides/testpmd_app_ug/run_app.rst         |   1 +
>  doc/guides/testpmd_app_ug/testpmd_funcs.rst   |   5 +-
>  drivers/net/i40e/i40e_ethdev.c                |   1 +
>  drivers/net/i40e/i40e_ethdev.h                |   2 +
>  .../net/i40e/i40e_recycle_mbufs_vec_common.c  | 147 ++++++++++++++
>  drivers/net/i40e/i40e_rxtx.c                  |  32 ++++
>  drivers/net/i40e/i40e_rxtx.h                  |   4 +
>  drivers/net/i40e/meson.build                  |   1 +
>  drivers/net/ixgbe/ixgbe_ethdev.c              |   1 +
>  drivers/net/ixgbe/ixgbe_ethdev.h              |   3 +
>  .../ixgbe/ixgbe_recycle_mbufs_vec_common.c    | 143 ++++++++++++++
>  drivers/net/ixgbe/ixgbe_rxtx.c                |  37 +++-
>  drivers/net/ixgbe/ixgbe_rxtx.h                |   4 +
>  drivers/net/ixgbe/meson.build                 |   2 +
>  lib/ethdev/ethdev_driver.h                    |  10 +
>  lib/ethdev/ethdev_private.c                   |   2 +
>  lib/ethdev/rte_ethdev.c                       |  31 +++
>  lib/ethdev/rte_ethdev.h                       | 181 ++++++++++++++++++
>  lib/ethdev/rte_ethdev_core.h                  |  23 ++-
>  lib/ethdev/version.map                        |   3 +
>  25 files changed, 702 insertions(+), 9 deletions(-)
>  create mode 100644 app/test-pmd/recycle_mbufs.c
>  create mode 100644 drivers/net/i40e/i40e_recycle_mbufs_vec_common.c
>  create mode 100644 drivers/net/ixgbe/ixgbe_recycle_mbufs_vec_common.c
>
> --
> 2.25.1