CC to the right e-mail address.

> > I also have some concerns about how useful this API will be in real life,
> > and whether the use case is worth the complexity it brings.
> > It also looks like too much low-level detail for the application.
> 
> Concerns about direct rearm:
> 1. The earlier version of the design required the rxq/txq pairing to be done
> before starting the data plane threads. This required the user to know the
> direction of the packet flow in advance, which limited the use cases.
> 
> In the latest version, direct-rearm mode is packaged as a separate API.
> This allows users to change the rxq/txq pairing at runtime in the data plane,
> based on the application's analysis of the packet flow, for example:
> ------------------------------------------------------------------------------
> Step 1: upper application analyses the flow direction
> Step 2: rxq_rearm_data = rte_eth_rx_get_rearm_data(rx_portid, rx_queueid)
> Step 3: rte_eth_dev_direct_rearm(rx_portid, rx_queueid, tx_portid, tx_queueid,
>         rxq_rearm_data)
> Step 4: rte_eth_rx_burst(rx_portid, rx_queueid)
> Step 5: rte_eth_tx_burst(tx_portid, tx_queueid)
> ------------------------------------------------------------------------------
> The above allows the user to change the rxq/txq pairing at runtime, without
> needing to know the direction of the flow in advance. This effectively
> expands the direct-rearm use scenarios; see the sketch below.
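> 
> For illustration, a minimal sketch of such a data-plane loop (the helper
> analyse_flow(), the pkts[] array and BURST_SIZE are hypothetical application
> code; only the rte_eth_* calls from the steps above are part of the proposal):
> ------------------------------------------------------------------------------
> while (!quit) {
>         /* Step 1: the application decides which Rx/Tx queues to pair,
>          * based on its own analysis of the traffic (hypothetical helper). */
>         analyse_flow(&rx_portid, &rx_queueid, &tx_portid, &tx_queueid);
> 
>         /* Steps 2-3: refill the Rx queue directly from the paired Tx
>          * queue's used buffers. */
>         rxq_rearm_data = rte_eth_rx_get_rearm_data(rx_portid, rx_queueid);
>         rte_eth_dev_direct_rearm(rx_portid, rx_queueid, tx_portid,
>                                  tx_queueid, rxq_rearm_data);
> 
>         /* Steps 4-5: normal Rx/Tx burst path. */
>         nb_rx = rte_eth_rx_burst(rx_portid, rx_queueid, pkts, BURST_SIZE);
>         /* ... application processing of pkts[0..nb_rx-1] ... */
>         rte_eth_tx_burst(tx_portid, tx_queueid, pkts, nb_rx);
> }
> ------------------------------------------------------------------------------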
> 
> 2. The earlier version of direct rearm broke the independence between the
> Rx and Tx paths.
> In the latest version, we use a structure to let Rx and Tx interact, for
> example:
> ------------------------------------------------------------------------------
> struct rte_eth_rxq_rearm_data {
>         struct rte_mbuf **buf_ring; /**< Buffer ring of Rx queue. */
>         uint16_t *refill_head;      /**< Head of buffer ring refilling descriptors. */
>         uint16_t *receive_tail;     /**< Tail of buffer ring receiving pkts. */
>         uint16_t nb_buf;            /**< Configured number of buffers in the ring. */
> } rxq_rearm_data;
> 
> data path:
>         /* Get direct-rearm info for a receive queue of an Ethernet device. */
>         rxq_rearm_data = rte_eth_rx_get_rearm_data(rx_portid, rx_queueid);
> 
>         rte_eth_dev_direct_rearm(rx_portid, rx_queueid, tx_portid, tx_queueid,
>                                  rxq_rearm_data) {
>                 /* Use the Tx used buffers to refill the Rx buffer ring in
>                  * direct rearm mode. */
>                 nb_rearm = rte_eth_tx_fill_sw_ring(tx_portid, tx_queueid,
>                                                    rxq_rearm_data);
> 
>                 /* Flush the Rx descriptors in direct rearm mode. */
>                 rte_eth_rx_flush_descs(rx_portid, rx_queueid, nb_rearm);
>         }
> 
>         rte_eth_rx_burst(rx_portid, rx_queueid);
>         rte_eth_tx_burst(tx_portid, tx_queueid);
> ------------------------------------------------------------------------------
> Furthermore, direct-rearm usage is no longer limited to the same PMD: it can
> move buffers between PMDs from different vendors, and it can even place the
> buffers anywhere in your Rx buffer ring, as long as the address of the buffer
> ring can be provided.
> In the latest version, we enable direct-rearm in the i40e and ixgbe PMDs; we
> also tried using the i40e driver on the Rx side and the ixgbe driver on the
> Tx side, and achieved a 7-9% performance improvement with direct-rearm. A
> conceptual sketch of the Tx-side refill is shown below.
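> 
> This is a conceptual sketch (not the actual driver code) of what
> rte_eth_tx_fill_sw_ring() does with the generic rearm info: it only needs the
> Rx buffer ring address, refill head and ring size, which is why the paired Rx
> and Tx queues may belong to PMDs from different vendors. The parameters
> tx_free_mbufs/nb_free are hypothetical stand-ins for the Tx queue's freed
> buffers:
> ------------------------------------------------------------------------------
> static uint16_t
> tx_fill_sw_ring_sketch(struct rte_mbuf **tx_free_mbufs, uint16_t nb_free,
>                        struct rte_eth_rxq_rearm_data *rd)
> {
>         uint16_t head = *rd->refill_head;
>         uint16_t i;
> 
>         /* Copy the freed Tx mbuf pointers straight into the Rx buffer ring,
>          * wrapping at the configured ring size. */
>         for (i = 0; i < nb_free; i++) {
>                 rd->buf_ring[head] = tx_free_mbufs[i];
>                 head = (head + 1) % rd->nb_buf;
>         }
>         *rd->refill_head = head;
> 
>         /* The returned count is the nb_rearm later passed to
>          * rte_eth_rx_flush_descs(). */
>         return i;
> }
> ------------------------------------------------------------------------------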
> 
> 3. Difference between direct rearm, the ZC API used in mempool, and the
> general path.
> For the general path:
>         Rx: 32 pkts memcpy from mempool cache to rx_sw_ring
>         Tx: 32 pkts memcpy from tx_sw_ring to a temporary variable +
>             32 pkts memcpy from the temporary variable to mempool cache
> For the ZC API used in mempool:
>         Rx: 32 pkts memcpy from mempool cache to rx_sw_ring
>         Tx: 32 pkts memcpy from tx_sw_ring to zero-copy mempool cache
>         Refer to:
>         http://patches.dpdk.org/project/dpdk/patch/20230221055205.22984-2-kamalakshitha.alig...@arm.com/
> For direct rearm:
>         Rx/Tx: 32 pkts memcpy from tx_sw_ring to rx_sw_ring
> Thus we can see that in one loop, compared to the general path, direct rearm
> removes 32+32=64 pkts memcpy; compared to the ZC API used in mempool, direct
> rearm removes 32 pkts memcpy in each loop (see the tally below).
> So, direct rearm has its own benefits.
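> 
> To make the per-burst totals explicit (my own tally of the counts listed
> above, assuming a burst of 32 packets per loop):
> ------------------------------------------------------------------------------
> general path:   32 (Rx refill) + 32 + 32 (Tx free) = 96 pkts memcpy per loop
> mempool ZC API: 32 (Rx refill) + 32 (Tx free)      = 64 pkts memcpy per loop
> direct rearm:   32 (tx_sw_ring -> rx_sw_ring)      = 32 pkts memcpy per loop
> ------------------------------------------------------------------------------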
> 
> 4. Performance tests and real cases
> For performance tests, in l3fwd we achieve a performance improvement of up
> to 15% on an Arm server.
> For real cases, we have enabled direct-rearm in VPP and achieved a
> performance improvement.
> 
