> -----Original Message-----
> From: Ferruh Yigit <ferruh.yi...@amd.com>
> Sent: Wednesday, April 19, 2023 10:56 PM
> To: Feifei Wang <feifei.wa...@arm.com>; Qi Z Zhang <qi.z.zh...@intel.com>;
> Mcnamara, John <john.mcnam...@intel.com>
> Cc: dev@dpdk.org; konstantin.v.anan...@yandex.ru;
> m...@smartsharesystems.com; nd <n...@arm.com>
> Subject: Re: [PATCH v5 0/3] Recycle buffers from Tx to Rx
>
> On 3/30/2023 7:29 AM, Feifei Wang wrote:
> > Currently, the transmit side frees buffers into the lcore cache and the
> > receive side allocates buffers from the lcore cache. The transmit side
> > typically frees 32 buffers, resulting in 32*8=256B of stores to the
> > lcore cache. The receive side allocates 32 buffers and stores them in
> > the receive side software ring, resulting in 32*8=256B of stores and
> > 256B of loads from the lcore cache.
> >
> > This patch proposes a mechanism to avoid freeing to/allocating from the
> > lcore cache: the receive side frees the buffers from the transmit side
> > directly into its software ring. This avoids the 256B of loads and
> > stores introduced by the lcore cache, and also frees up the cache lines
> > used by the lcore cache. We call this mode buffer recycle mode.
> >
> > In the latest version, buffer recycle mode is packaged as a separate API.
> > This allows the user to change the rxq/txq pairing at runtime in the
> > data plane, according to the application's analysis of the packet flow,
> > for example:
> > -----------------------------------------------------------------------
> > Step 1: upper application analyses the flow direction
> > Step 2: rxq_buf_recycle_info = rte_eth_rx_buf_recycle_info_get(rx_portid,
> >         rx_queueid)
> > Step 3: rte_eth_dev_buf_recycle(rx_portid, rx_queueid, tx_portid,
> >         tx_queueid, rxq_buf_recycle_info);
> > Step 4: rte_eth_rx_burst(rx_portid, rx_queueid);
> > Step 5: rte_eth_tx_burst(tx_portid, tx_queueid);
> > -----------------------------------------------------------------------
> > The above lets the user change the rxq/txq pairing at runtime without
> > needing to know the direction of the flow in advance, which effectively
> > expands buffer recycle mode's use scenarios.
> >
> > Furthermore, buffer recycle mode is no longer limited to a single PMD:
> > it can move buffers between different vendors' PMDs, and can even place
> > a buffer anywhere into your Rx buffer ring, as long as the address of
> > the buffer ring can be provided. In the latest version, we enable
> > direct-rearm in the i40e and ixgbe PMDs, and also tried the i40e driver
> > in Rx with the ixgbe driver in Tx, achieving a 7-9% performance
> > improvement with buffer recycle mode.
> >
> > Difference between buffer recycle, the ZC API used in mempool, and the
> > general path:
> > For the general path:
> >     Rx: 32 pkts memcpy from mempool cache to rx_sw_ring
> >     Tx: 32 pkts memcpy from tx_sw_ring to temporary variable +
> >         32 pkts memcpy from temporary variable to mempool cache
> > For the ZC API used in mempool:
> >     Rx: 32 pkts memcpy from mempool cache to rx_sw_ring
> >     Tx: 32 pkts memcpy from tx_sw_ring to zero-copy mempool cache
> >     Refer link:
> >     http://patches.dpdk.org/project/dpdk/patch/20230221055205.22984-2-kamalakshitha.alig...@arm.com/
> > For buffer recycle:
> >     Rx/Tx: 32 pkts memcpy from tx_sw_ring to rx_sw_ring
> > Thus, in one loop, compared to the general path, buffer recycle removes
> > 32+32=64 pkts of memcpy; compared to the ZC API used in mempool, it
> > removes 32 pkts of memcpy in each loop.
> > So, buffer recycle has its own benefits.
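
To make the intended data-plane usage concrete, below is a minimal sketch
of the loop described in Steps 1-5 above. Only the call shapes of
rte_eth_rx_buf_recycle_info_get() and rte_eth_dev_buf_recycle() come from
the cover letter; the struct name rte_eth_rxq_buf_recycle_info, the burst
size, and the drop handling are assumptions for illustration, and the
proposed API may differ in the final patches.

/*
 * Usage sketch of the proposed buffer recycle API, for illustration
 * only. rte_eth_rx_burst()/rte_eth_tx_burst() are the standard ethdev
 * burst APIs; the recycle calls follow Steps 1-5 of the cover letter.
 */
#include <rte_ethdev.h>
#include <rte_mbuf.h>

#define BURST_SIZE 32

static void
recycle_fwd_loop(uint16_t rx_portid, uint16_t rx_queueid,
		 uint16_t tx_portid, uint16_t tx_queueid)
{
	struct rte_mbuf *pkts[BURST_SIZE];
	uint16_t nb_rx, nb_tx;

	/* Step 2: fetch recycle info for the Rx queue, after the
	 * application has analysed the flow direction (Step 1).
	 * The struct name here is an assumed spelling.
	 */
	struct rte_eth_rxq_buf_recycle_info *info =
		rte_eth_rx_buf_recycle_info_get(rx_portid, rx_queueid);

	for (;;) {
		/* Step 3: move buffers freed by the Tx queue straight
		 * into the Rx ring, bypassing the mempool lcore cache.
		 */
		rte_eth_dev_buf_recycle(rx_portid, rx_queueid,
					tx_portid, tx_queueid, info);

		/* Steps 4-5: the normal burst path is unchanged. */
		nb_rx = rte_eth_rx_burst(rx_portid, rx_queueid,
					 pkts, BURST_SIZE);
		if (nb_rx == 0)
			continue;
		nb_tx = rte_eth_tx_burst(tx_portid, tx_queueid,
					 pkts, nb_rx);
		/* Free any packets the Tx queue could not accept. */
		while (nb_tx < nb_rx)
			rte_pktmbuf_free(pkts[nb_tx++]);
	}
}

Note that the rxq/txq pairing passed to rte_eth_dev_buf_recycle() can be
changed between iterations if the application decides the flow direction
has changed, which is the runtime re-pairing the cover letter describes.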
> >
> > Testing status:
> > (1) dpdk l3fwd test with multiple drivers:
> >     port 0: 82599 NIC, port 1: XL710 NIC
> > -------------------------------------------------------------
> >                 Without fast free    With fast free
> > Thunderx2:          +7.53%              +13.54%
> > -------------------------------------------------------------
> >
> > (2) dpdk l3fwd test with the same driver:
> >     port 0 && 1: XL710 NIC
> > -------------------------------------------------------------
> >                 Without fast free    With fast free
> > Ampere altra:      +12.61%              +11.42%
> > n1sdp:              +8.30%               +3.85%
> > x86-sse:            +8.43%               +3.72%
> > -------------------------------------------------------------
> >
> > (3) Performance comparison with ZC_mempool used:
> >     port 0 && 1: XL710 NIC, with fast free
> > -------------------------------------------------------------
> >                 With recycle buffer    With zc_mempool
> > Ampere altra:       11.42%                  3.54%
> > -------------------------------------------------------------
>
> Thanks for the perf test reports.
>
> Since the test is done on Intel NICs, it would be great to get some
> testing and performance numbers from the Intel side too, if possible.
Thanks for the review. Actually, we have done the test on x86: the
performance numbers above show that in the x86-sse path, buffer recycle
can improve performance by 3.72% ~ 8.43%.

> > V2:
> > 1. Use data-plane API to enable direct-rearm (Konstantin, Honnappa)
> > 2. Add 'txq_data_get' API to get txq info for Rx (Konstantin)
> > 3. Use input parameter to enable direct rearm in l3fwd (Konstantin)
> > 4. Add condition detection for direct rearm API (Morten, Andrew Rybchenko)
> >
> > V3:
> > 1. Separate Rx and Tx operation with two APIs in direct-rearm (Konstantin)
> > 2. Delete l3fwd change for direct rearm (Jerin)
> > 3. Enable direct rearm in the ixgbe driver on Arm
> >
> > v4:
> > 1. Rename direct-rearm as buffer recycle. Based on this, function and
> >    variable names are changed to make this mode more general for all
> >    drivers. (Konstantin, Morten)
> > 2. Add ring wrapping check (Konstantin)
> >
> > v5:
> > 1. Some changes to the ethdev API (Morten)
> > 2. Add support for the avx2, sse, and altivec paths
> >
> > Feifei Wang (3):
> >   ethdev: add API for buffer recycle mode
> >   net/i40e: implement recycle buffer mode
> >   net/ixgbe: implement recycle buffer mode
> >
> >  drivers/net/i40e/i40e_ethdev.c   |   1 +
> >  drivers/net/i40e/i40e_ethdev.h   |   2 +
> >  drivers/net/i40e/i40e_rxtx.c     | 159 +++++++++++++++++++++
> >  drivers/net/i40e/i40e_rxtx.h     |   4 +
> >  drivers/net/ixgbe/ixgbe_ethdev.c |   1 +
> >  drivers/net/ixgbe/ixgbe_ethdev.h |   3 +
> >  drivers/net/ixgbe/ixgbe_rxtx.c   | 153 ++++++++++++++++++++
> >  drivers/net/ixgbe/ixgbe_rxtx.h   |   4 +
> >  lib/ethdev/ethdev_driver.h       |  10 ++
> >  lib/ethdev/ethdev_private.c      |   2 +
> >  lib/ethdev/rte_ethdev.c          |  33 +++++
> >  lib/ethdev/rte_ethdev.h          | 230 +++++++++++++++++++++++++++++++
> >  lib/ethdev/rte_ethdev_core.h     |  15 +-
> >  lib/ethdev/version.map           |   6 +
> >  14 files changed, 621 insertions(+), 2 deletions(-)
> >
>
> Is a usage sample of these new APIs planned? Can it be a new forwarding
> mode in testpmd?

Agree. Following the discussion in the Tech Board meeting, we will add
buffer recycle as a testpmd fwd engine.
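
For reference, a rough sketch of what such a fwd engine's per-burst
function could look like, modelled on testpmd's existing iofwd loop. The
recycle call follows the cover letter; the recycle_info field on struct
fwd_stream is hypothetical, and the final testpmd integration may differ.

/* Assumes testpmd internals: struct fwd_stream, MAX_PKT_BURST and
 * nb_pkt_per_burst come from testpmd.h.
 */
#include "testpmd.h"

/*
 * Hypothetical "recycle" forwarding mode, sketched after the iofwd
 * engine. fs->recycle_info is an assumed per-stream field holding the
 * Rx queue's recycle info fetched at setup time.
 */
static void
pkt_burst_recycle_forward(struct fwd_stream *fs)
{
	struct rte_mbuf *pkts_burst[MAX_PKT_BURST];
	uint16_t nb_rx, nb_tx;

	/* Refill the Rx ring directly from the paired Tx queue's
	 * freed buffers before receiving.
	 */
	rte_eth_dev_buf_recycle(fs->rx_port, fs->rx_queue,
				fs->tx_port, fs->tx_queue,
				fs->recycle_info);

	nb_rx = rte_eth_rx_burst(fs->rx_port, fs->rx_queue,
				 pkts_burst, nb_pkt_per_burst);
	if (nb_rx == 0)
		return;

	nb_tx = rte_eth_tx_burst(fs->tx_port, fs->tx_queue,
				 pkts_burst, nb_rx);
	/* Drop packets the Tx queue could not accept. */
	while (nb_tx < nb_rx)
		rte_pktmbuf_free(pkts_burst[nb_tx++]);
}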