> > On 9/16/22 22:35, fengchengwen wrote: > > Hi Chas, > > > > On 2022/9/15 0:59, Chas Williams wrote: > >> On 9/13/22 20:46, fengchengwen wrote: > >>> > >>> The main problem is hard to design a tx_prepare for bonding device: > >>> 1. as Chas Williams said, there maybe twice hash calc to get target slave > >>> devices. > >>> 2. also more important, if the slave devices have changes(e.g. slave > >>> device > >>> link down or remove), and if the changes happens between > >>> bond-tx-prepare and > >>> bond-tx-burst, the output slave will changes, and this may lead to > >>> checksum > >>> failed. (Note: a bond device with slave devices may from different > >>> vendors, > >>> and slave devices may have different requirements, e.g. slave-A > >>> support calc > >>> IPv4 pseudo-head automatic (no need driver pre-calc), but slave-B > >>> need driver > >>> pre-calc). > >>> > >>> Current design cover the above two scenarios by using in-place > >>> tx-prepare. and > >>> in addition, bond devices are not transparent to applications, I think > >>> it's a > >>> practical method to provide tx-prepare support in this way. > >>> > >> > >> > >> I don't think you need to export an enable/disable routine for the use of > >> rte_eth_tx_prepare. It's safe to just call that routine, even if it isn't > >> implemented. You are just trading one branch in DPDK librte_eth_dev for a > >> branch in drivers/net/bonding. > > > > Our first patch was just like yours (just add tx-prepare default), but > > community > > is concerned about impacting performance. > > > > As a trade-off, I think we can add the enable/disable API. > > IMHO, that's a bad idea. If the rte_eth_dev_tx_prepare API affects > performance adversly, that is not a bonding problem. All applications > should be calling rte_eth_dev_tx_prepare. There's no defined API > to determine if rte_eth_dev_tx_prepare should be called. Therefore, > applications should always call rte_eth_dev_tx_prepare. Regardless, > as I previously mentioned, you are just trading the location of > the branch, especially in the bonding case. > > If rte_eth_dev_tx_prepare is causing a performance drop, then that API > should be improved or rewritten. There are PMD that require you to use > that API. Locally, we had maintained a patch to eliminate the use of > rte_eth_dev_tx_prepare. However, that has been getting harder and harder > to maintain. The performance lost by just calling rte_eth_dev_tx_prepare > was marginal. > > > > >> > >> I think you missed fixing tx_machine in 802.3ad support. We have been using > >> the following patch locally which I never got around to submitting. > > > > You are right, I will send V3 fix it. > > > >> > >> > >> From a458654d68ff5144266807ef136ac3dd2adfcd98 Mon Sep 17 00:00:00 2001 > >> From: "Charles (Chas) Williams" <chwil...@ciena.com> > >> Date: Tue, 3 May 2022 16:52:37 -0400 > >> Subject: [PATCH] net/bonding: call rte_eth_tx_prepare before > >> rte_eth_tx_burst > >> > >> Some PMDs might require a call to rte_eth_tx_prepare before sending the > >> packets for transmission. Typically, the prepare step handles the VLAN > >> headers, but it may need to do other things. > >> > >> Signed-off-by: Chas Williams <chwil...@ciena.com> > > > > ... > > > >> * ring if transmission fails so the packet isn't lost. > >> @@ -1322,8 +1350,12 @@ bond_ethdev_tx_burst_broadcast(void *queue, struct > >> rte_mbuf **bufs, > >> > >> /* Transmit burst on each active slave */ > >> for (i = 0; i < num_of_slaves; i++) { > >> - slave_tx_total[i] = rte_eth_tx_burst(slaves[i], bd_tx_q->queue_id, > >> + uint16_t nb_prep; > >> + > >> + nb_prep = rte_eth_tx_prepare(slaves[i], bd_tx_q->queue_id, > >> bufs, nb_pkts); > >> + slave_tx_total[i] = rte_eth_tx_burst(slaves[i], bd_tx_q->queue_id, > >> + bufs, nb_prep); > > > > The tx-prepare may edit packet data, and the broadcast mode will send a > > packet to all slaves, > > the packet data is sent and edited at the same time. Is this likely to > > cause problems ? > > This routine is already broken. You can't just increment the refcount > and send the packet into a PMD's transmit routine. Nothing guarantees > that a transmit routine will not modify the packet. Many PMDs perform an > rte_vlan_insert.
Hmm interesting.... My uderstanding was quite opposite - tx_burst() can't modify packet data and metadata (except when refcnt==1 and tx_burst() going to free the mbuf and put it back to the mempool). While tx_prepare() can - actually as I remember that was one of the reasons why a separate routine was introduced. > You should at least perform a clone of the packet so > that the mbuf headers aren't mangled by each PMD. Just to be safe you > should perform a partial deep copy of the packet headers in case some > PMD does an rte_vlan_insert and the other PMDs in the bonding group do > not need an rte_vlan_insert. > > So doing a blind rte_eth_dev_tx_preprare isn't making anything much > worse. > > > > >> > >> if (unlikely(slave_tx_total[i] < nb_pkts)) > >> tx_failed_flag = 1;