> -----Original Message-----
> From: Feifei Wang <feifei.wa...@arm.com>
> Sent: Tuesday, September 27, 2022 10:48 AM
> Cc: dev@dpdk.org; nd <n...@arm.com>; Feifei Wang
> <feifei.wa...@arm.com>
> Subject: [PATCH v2 0/3] Direct re-arming of buffers on receive side
>
> Currently, the transmit side frees buffers into the lcore cache and the
> receive side allocates buffers from the lcore cache. The transmit side
> typically frees 32 buffers, resulting in 32*8=256B of stores to the lcore
> cache. The receive side allocates 32 buffers and stores them in the
> receive side software ring, resulting in 32*8=256B of stores and 256B of
> loads from the lcore cache.
>
> This patch proposes a mechanism to avoid freeing to/allocating from the
> lcore cache: the receive side takes the buffers being freed by the
> transmit side and places them directly into its own software ring. This
> avoids the 256B of loads and stores introduced by the lcore cache. It
> also frees up the cache lines used by the lcore cache.
>
> However, this solution poses several constraints:
>
> 1) The receive queue needs to know which transmit queue it should take
> the buffers from. The application logic decides which transmit port to
> use to send out the packets. In many use cases the NIC might have a
> single port ([1], [2], [3]), in which case a given transmit queue is
> always mapped to a single receive queue (1:1 Rx queue : Tx queue). This
> is easy to configure.
>
> If the NIC has 2 ports (there are several references), then we will have
> a 1:2 (Rx queue : Tx queue) mapping, which is still easy to configure.
> However, if this is generalized to 'N' ports, the configuration can be
> long. Moreover, the PMD would have to scan a list of transmit queues to
> pull the buffers from.
>
> 2) The other factor that needs to be considered is the
> 'run-to-completion' vs. 'pipeline' model. In the run-to-completion model,
> the receive side and the transmit side run serially on the same lcore. In
> the pipeline model, the receive side and the transmit side might run on
> different lcores in parallel, which requires locking. This is not
> supported at this point.
>
> 3) Tx and Rx buffers must come from the same mempool, and the number of
> buffers freed on the Tx side must equal the number of buffers rearmed on
> the Rx side (txq->tx_rs_thresh == RTE_I40E_RXQ_REARM_THRESH). This
> ensures 'tx_next_dd' is updated correctly in direct-rearm mode:
> 'tx_next_dd' is the variable used to compute the next free location in
> the Tx sw-ring, and its value runs one free round ahead of the position
> where the next free starts.
>
> Current status in this patch:
> 1) Two APIs are added for users to enable direct-rearm mode:
> In the control plane, users can call 'rte_eth_txq_data_get' to get the Tx
> sw_ring pointer and its txq_info (this avoids the Rx side loading Tx data
> directly);
>
> In the data plane, users can call 'rte_eth_rx_direct_rearm' to rearm Rx
> buffers and free Tx buffers at the same time (currently it supports a 1:1
> (rxq:txq) mapping):
> -----------------------------------------------------------------------
> control plane:
>         rte_eth_txq_data_get(*txq_data);
> data plane:
>         loop {
>                 rte_eth_rx_direct_rearm(*txq_data) {
>                         for (i = 0; i < 32; i++) {
>                                 rx.mbuf[i] = tx.mbuf[i];
>                                 initialize descs[i];
>                         }
>                 }
>                 rte_eth_rx_burst();
>                 rte_eth_tx_burst();
>         }
> -----------------------------------------------------------------------
> 2) The i40e driver is changed to do the direct re-arm on the receive
> side.
> 3) The L3fwd application is modified to enable direct-rearm mode. Users
> can enable direct-rearm and map queues via input parameters (a usage
> sketch follows below).
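>
> To make the intended call flow concrete, here is a minimal sketch of an
> application forwarding loop built on the two proposed APIs. The exact
> prototypes are defined by this patch set; the struct name
> 'rte_eth_txq_data', the parameter lists, and the helper function below
> are illustrative assumptions only:
> -----------------------------------------------------------------------
> #include <stdbool.h>
> #include <rte_ethdev.h>
> #include <rte_mbuf.h>
>
> #define BURST_SZ 32
>
> static volatile bool force_quit;
>
> static void
> fwd_loop(uint16_t rx_port, uint16_t rx_q, uint16_t tx_port, uint16_t tx_q)
> {
>         struct rte_eth_txq_data txq_data; /* assumed struct name */
>         struct rte_mbuf *pkts[BURST_SZ];
>         uint16_t nb_rx, nb_tx;
>
>         /* control plane: fetch the Tx sw-ring pointer and queue info
>          * once, before entering the loop (assumed prototype) */
>         rte_eth_txq_data_get(tx_port, tx_q, &txq_data);
>
>         while (!force_quit) {
>                 /* data plane: move freed Tx mbufs straight into the Rx
>                  * sw-ring and initialize the matching Rx descriptors
>                  * (assumed prototype) */
>                 rte_eth_rx_direct_rearm(rx_port, rx_q, &txq_data);
>
>                 nb_rx = rte_eth_rx_burst(rx_port, rx_q, pkts, BURST_SZ);
>                 /* ... application processing of pkts[0..nb_rx) ... */
>                 nb_tx = rte_eth_tx_burst(tx_port, tx_q, pkts, nb_rx);
>
>                 /* drop any packets the Tx queue could not take */
>                 while (nb_tx < nb_rx)
>                         rte_pktmbuf_free(pkts[nb_tx++]);
>         }
> }
> -----------------------------------------------------------------------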
>
> Testing status:
> 1. The testing results for L3fwd are as follows:
> -------------------------------------------------------------------
> enabled direct rearm
> -------------------------------------------------------------------
> Arm:
> N1SDP (neon path):
>         without fast-free mode          with fast-free mode
>                 +15.09%                         +4.2%
>
> Ampere Altra (neon path):
>         without fast-free mode          with fast-free mode
>                 +10.9%                          +14.6%
> -------------------------------------------------------------------
>
> 2. The testing results for VPP-L3fwd are as follows:
> -------------------------------------------------------------------
> Arm:
> N1SDP (neon path):
>         with direct re-arm mode enabled
>                 +4.5%
>
> Ampere Altra (neon path):
>         with direct re-arm mode enabled
>                 +6.5%
> -------------------------------------------------------------------
>
> Reference:
> [1] https://store.nvidia.com/en-us/networking/store/product/MCX623105AN-CDAT/NVIDIAMCX623105ANCDATConnectX6DxENAdapterCard100GbECryptoDisabled/
> [2] https://www.intel.com/content/www/us/en/products/sku/192561/intel-ethernet-network-adapter-e810cqda1/specifications.html
> [3] https://www.broadcom.com/products/ethernet-connectivity/network-adapters/100gb-nic-ocp/n1100g
>
> V2:
> 1. Use data-plane API to enable direct-rearm (Konstantin, Honnappa)
> 2. Add 'txq_data_get' API to get txq info for Rx (Konstantin)
> 3. Use input parameter to enable direct rearm in l3fwd (Konstantin)
> 4. Add condition detection for direct rearm API (Morten, Andrew
> Rybchenko); a sketch of the check follows below.
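>
> As an illustration of the condition detection in item 4, the direct-rearm
> path has to verify that the thresholds and mempools match before
> engaging, per constraint 3 above. 'tx_rs_thresh' and
> 'RTE_I40E_RXQ_REARM_THRESH' are named in this cover letter; the 'mp'
> fields below are illustrative, not the exact driver fields:
> -----------------------------------------------------------------------
> /* direct-rearm is only safe when the Tx free threshold matches the Rx
>  * rearm threshold and both queues draw buffers from the same mempool
>  * ('txq->mp'/'rxq->mp' are assumed field names) */
> if (txq->tx_rs_thresh != RTE_I40E_RXQ_REARM_THRESH ||
>     txq->mp != rxq->mp)
>         return 0; /* fall back to the normal mempool rearm path */
> -----------------------------------------------------------------------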
PING

Hi,

Would you please give some comments on this version?
Thanks very much.

Best Regards
Feifei