> 
> On 9/16/22 22:35, fengchengwen wrote:
> > Hi Chas,
> >
> > On 2022/9/15 0:59, Chas Williams wrote:
> >> On 9/13/22 20:46, fengchengwen wrote:
> >>>
> >>> The main problem is hard to design a tx_prepare for bonding device:
> >>> 1. as Chas Williams said, there maybe twice hash calc to get target slave
> >>>      devices.
> >>> 2. also more important, if the slave devices have changes(e.g. slave 
> >>> device
> >>>      link down or remove), and if the changes happens between 
> >>> bond-tx-prepare and
> >>>      bond-tx-burst, the output slave will changes, and this may lead to 
> >>> checksum
> >>>      failed. (Note: a bond device with slave devices may from different 
> >>> vendors,
> >>>      and slave devices may have different requirements, e.g. slave-A 
> >>> support calc
> >>>      IPv4 pseudo-head automatic (no need driver pre-calc), but slave-B 
> >>> need driver
> >>>      pre-calc).
> >>>
> >>> Current design cover the above two scenarios by using in-place 
> >>> tx-prepare. and
> >>> in addition, bond devices are not transparent to applications, I think 
> >>> it's a
> >>> practical method to provide tx-prepare support in this way.
> >>>
> >>
> >>
> >> I don't think you need to export an enable/disable routine for the use of
> >> rte_eth_tx_prepare. It's safe to just call that routine, even if it isn't
> >> implemented. You are just trading one branch in DPDK librte_eth_dev for a
> >> branch in drivers/net/bonding.
> >
> > Our first patch was just like yours (just add tx-prepare default), but 
> > community
> > is concerned about impacting performance.
> >
> > As a trade-off, I think we can add the enable/disable API.
> 
> IMHO, that's a bad idea. If the rte_eth_dev_tx_prepare API affects
> performance adversly, that is not a bonding problem. All applications
> should be calling rte_eth_dev_tx_prepare. There's no defined API
> to determine if rte_eth_dev_tx_prepare should be called. Therefore,
> applications should always call rte_eth_dev_tx_prepare. Regardless,
> as I previously mentioned, you are just trading the location of
> the branch, especially in the bonding case.
> 
> If rte_eth_dev_tx_prepare is causing a performance drop, then that API
> should be improved or rewritten. There are PMD that require you to use
> that API. Locally, we had maintained a patch to eliminate the use of
> rte_eth_dev_tx_prepare. However, that has been getting harder and harder
> to maintain. The performance lost by just calling rte_eth_dev_tx_prepare
> was marginal.
> 
> >
> >>
> >> I think you missed fixing tx_machine in 802.3ad support. We have been using
> >> the following patch locally which I never got around to submitting.
> >
> > You are right, I will send V3 fix it.
> >
> >>
> >>
> >>  From a458654d68ff5144266807ef136ac3dd2adfcd98 Mon Sep 17 00:00:00 2001
> >> From: "Charles (Chas) Williams" <chwil...@ciena.com>
> >> Date: Tue, 3 May 2022 16:52:37 -0400
> >> Subject: [PATCH] net/bonding: call rte_eth_tx_prepare before 
> >> rte_eth_tx_burst
> >>
> >> Some PMDs might require a call to rte_eth_tx_prepare before sending the
> >> packets for transmission. Typically, the prepare step handles the VLAN
> >> headers, but it may need to do other things.
> >>
> >> Signed-off-by: Chas Williams <chwil...@ciena.com>
> >
> > ...
> >
> >>                * ring if transmission fails so the packet isn't lost.
> >> @@ -1322,8 +1350,12 @@ bond_ethdev_tx_burst_broadcast(void *queue, struct 
> >> rte_mbuf **bufs,
> >>
> >>       /* Transmit burst on each active slave */
> >>       for (i = 0; i < num_of_slaves; i++) {
> >> -        slave_tx_total[i] = rte_eth_tx_burst(slaves[i], bd_tx_q->queue_id,
> >> +        uint16_t nb_prep;
> >> +
> >> +        nb_prep = rte_eth_tx_prepare(slaves[i], bd_tx_q->queue_id,
> >>                       bufs, nb_pkts);
> >> +        slave_tx_total[i] = rte_eth_tx_burst(slaves[i], bd_tx_q->queue_id,
> >> +                    bufs, nb_prep);
> >
> > The tx-prepare may edit packet data, and the broadcast mode will send a 
> > packet to all slaves,
> > the packet data is sent and edited at the same time. Is this likely to 
> > cause problems ?
> 
> This routine is already broken. You can't just increment the refcount
> and send the packet into a PMD's transmit routine. Nothing guarantees
> that a transmit routine will not modify the packet. Many PMDs perform an
> rte_vlan_insert. 

Hmm interesting.... 
My uderstanding was quite opposite - tx_burst() can't modify packet data and 
metadata
(except when refcnt==1 and tx_burst() going to free the mbuf and put it back to 
the mempool).
While tx_prepare() can - actually as I remember that was one of the reasons why 
a separate routine
was introduced.

> You should at least perform a clone of the packet so
> that the mbuf headers aren't mangled by each PMD. Just to be safe you
> should perform a partial deep copy of the packet headers in case some
> PMD does an rte_vlan_insert and the other PMDs in the bonding group do
> not need an rte_vlan_insert.
> 
> So doing a blind rte_eth_dev_tx_preprare isn't making anything much
> worse.
> 
> >
> >>
> >>           if (unlikely(slave_tx_total[i] < nb_pkts))
> >>               tx_failed_flag = 1;

Reply via email to