CC to the right e-mail address.
> > I also have some concerns on how useful this API will be in real life,
> > and does the use case worth the complexity it brings.
> > And it looks too much low level detail for the application.
>
> Concerns of direct rearm:
>
> 1. Earlier versions of the design required the rxq/txq pairing to be done
> before starting the data plane threads. This required the user to know the
> direction of the packet flow in advance, which limited the use cases.
>
> In the latest version, direct-rearm mode is packaged as a separate API.
> This allows the user to change the rxq/txq pairing at run time in the data
> plane, according to the application's analysis of the packet flow, for example:
> ----------------------------------------------------------------------------
> Step 1: upper application analyses the flow direction
> Step 2: rxq_rearm_data = rte_eth_rx_get_rearm_data(rx_portid, rx_queueid)
> Step 3: rte_eth_dev_direct_rearm(rx_portid, rx_queueid, tx_portid, tx_queueid, rxq_rearm_data);
> Step 4: rte_eth_rx_burst(rx_portid, rx_queueid);
> Step 5: rte_eth_tx_burst(tx_portid, tx_queueid);
> ----------------------------------------------------------------------------
> The above lets the user change the rxq/txq pairing at runtime without
> knowing the direction of the flow in advance, which effectively expands the
> direct-rearm use scenarios.
>
> 2. Earlier versions of direct rearm broke the independence between the Rx
> and Tx paths.
>
> In the latest version, we use a structure to let Rx and Tx interact, for example:
> ----------------------------------------------------------------------------
> struct rte_eth_rxq_rearm_data {
>         struct rte_mbuf **buf_ring;   /**< Buffer ring of the Rx queue. */
>         uint16_t *refill_head;        /**< Head of the buffer ring refilling descriptors. */
>         uint16_t *receive_tail;       /**< Tail of the buffer ring receiving pkts. */
>         uint16_t nb_buf;              /**< Configured number of buffers in the ring. */
> } rxq_rearm_data;
>
> data path:
>         /* Get direct-rearm info for a receive queue of an Ethernet device. */
>         rxq_rearm_data = rte_eth_rx_get_rearm_data(rx_portid, rx_queueid);
>
>         rte_eth_dev_direct_rearm(rx_portid, rx_queueid, tx_portid, tx_queueid, rxq_rearm_data) {
>                 /* Use Tx used buffers to refill the Rx buffer ring in direct rearm mode. */
>                 nb_rearm = rte_eth_tx_fill_sw_ring(tx_portid, tx_queueid, rxq_rearm_data);
>
>                 /* Flush Rx descriptors in direct rearm mode. */
>                 rte_eth_rx_flush_descs(rx_portid, rx_queueid, nb_rearm);
>         }
>
>         rte_eth_rx_burst(rx_portid, rx_queueid);
>         rte_eth_tx_burst(tx_portid, tx_queueid);
> ----------------------------------------------------------------------------
> Furthermore, direct-rearm usage is no longer limited to a single PMD: it can
> move buffers between PMDs from different vendors, and it can even put the
> buffers anywhere into your Rx buffer ring as long as the address of the
> buffer ring can be provided.
>
> In the latest version, we enable direct-rearm in the i40e and ixgbe PMDs.
> We also tried using the i40e driver for Rx and the ixgbe driver for Tx, and
> achieved a 7-9% performance improvement with direct-rearm.
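>
> To make the two pieces above concrete, below is a rough sketch of a data
> plane thread using the proposed API, following the call signatures exactly
> as written in the pseudo-code (forward_loop and BURST_SIZE are just
> illustrative names; only rte_eth_rx_burst/rte_eth_tx_burst/rte_pktmbuf_free
> are existing DPDK APIs, the direct-rearm calls come from this proposal;
> error handling and the actual flow analysis are omitted):
> ----------------------------------------------------------------------------
> #include <rte_ethdev.h>
> #include <rte_mbuf.h>
>
> #define BURST_SIZE 32
>
> /* Rough sketch of a forwarding loop with runtime rxq/txq pairing.
>  * rte_eth_rx_get_rearm_data() and rte_eth_dev_direct_rearm() are the
>  * proposed APIs; ports and queues are assumed configured and started. */
> static void
> forward_loop(uint16_t rx_portid, uint16_t rx_queueid,
>              uint16_t tx_portid, uint16_t tx_queueid)
> {
>         struct rte_mbuf *pkts[BURST_SIZE];
>         struct rte_eth_rxq_rearm_data rxq_rearm_data;
>         uint16_t nb_rx, nb_tx;
>
>         /* Step 1: the application decides the flow direction here
>          * (analysis not shown). */
>
>         /* Step 2: get the rearm data of the chosen Rx queue. */
>         rxq_rearm_data = rte_eth_rx_get_rearm_data(rx_portid, rx_queueid);
>
>         for (;;) {
>                 /* Step 3: refill the Rx buffer ring from the paired Tx
>                  * queue's used buffers; the pairing may be changed in a
>                  * later iteration if the flow direction changes. */
>                 rte_eth_dev_direct_rearm(rx_portid, rx_queueid,
>                                          tx_portid, tx_queueid,
>                                          rxq_rearm_data);
>
>                 /* Steps 4-5: the usual Rx/Tx bursts. */
>                 nb_rx = rte_eth_rx_burst(rx_portid, rx_queueid,
>                                          pkts, BURST_SIZE);
>                 if (nb_rx == 0)
>                         continue;
>
>                 nb_tx = rte_eth_tx_burst(tx_portid, tx_queueid, pkts, nb_rx);
>
>                 /* Free any mbufs the Tx queue could not take. */
>                 while (nb_tx < nb_rx)
>                         rte_pktmbuf_free(pkts[nb_tx++]);
>         }
> }
> ----------------------------------------------------------------------------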
> 3. Difference between direct rearm, the ZC API used in the mempool, and the
> general path.
>
> For the general path:
>         Rx: 32 pkts memcpy from mempool cache to rx_sw_ring
>         Tx: 32 pkts memcpy from tx_sw_ring to a temporary variable +
>             32 pkts memcpy from the temporary variable to mempool cache
> For the ZC API used in the mempool:
>         Rx: 32 pkts memcpy from mempool cache to rx_sw_ring
>         Tx: 32 pkts memcpy from tx_sw_ring to zero-copy mempool cache
>         Refer link:
>         http://patches.dpdk.org/project/dpdk/patch/20230221055205.22984-2-kamalakshitha.alig...@arm.com/
> For direct_rearm:
>         Rx/Tx: 32 pkts memcpy from tx_sw_ring to rx_sw_ring
>
> Thus, in one loop, compared to the general path, direct rearm saves
> 32 + 32 = 64 pkts memcpy; compared to the ZC API used in the mempool,
> direct rearm saves 32 pkts memcpy in each loop.
> So, direct_rearm has its own benefits (a simplified sketch of the refill
> step follows at the end of this mail).
>
> 4. Performance test and real cases
>
> For the performance test, in l3fwd we achieved a performance improvement of
> up to 15% on an Arm server.
> For real cases, we have enabled direct-rearm in VPP and achieved a
> performance improvement.
>
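> As promised in point 3, here is a simplified sketch of the refill step to
> show where the memcpy saving comes from. This is not actual driver code:
> the structure is repeated from above so the snippet is self-contained,
> fill_rx_ring_from_tx and tx_done are illustrative names, and the index
> handling simply assumes a power-of-two ring with a wrapping refill_head.
> ----------------------------------------------------------------------------
> #include <stdint.h>
> #include <rte_mbuf.h>
>
> /* Proposed structure, repeated here so this sketch is self-contained. */
> struct rte_eth_rxq_rearm_data {
>         struct rte_mbuf **buf_ring;
>         uint16_t *refill_head;
>         uint16_t *receive_tail;
>         uint16_t nb_buf;
> };
>
> /* Copy the Tx queue's already-transmitted mbuf pointers straight into the
>  * Rx buffer ring: one pointer copy per mbuf.  The general path instead goes
>  * tx_sw_ring -> mempool cache and later mempool cache -> rx_sw_ring (two
>  * copies per mbuf); the mempool ZC API only removes the extra hop on Tx. */
> static inline uint16_t
> fill_rx_ring_from_tx(struct rte_eth_rxq_rearm_data *rd,
>                      struct rte_mbuf **tx_done, uint16_t nb_done)
> {
>         uint16_t head = *rd->refill_head;
>         uint16_t mask = rd->nb_buf - 1;    /* assumes power-of-two ring */
>         uint16_t i;
>
>         for (i = 0; i < nb_done; i++)
>                 rd->buf_ring[(head + i) & mask] = tx_done[i];
>
>         *rd->refill_head = (head + nb_done) & mask;
>         return nb_done;
> }
> ----------------------------------------------------------------------------
>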