On 9/21/22 22:12, fengchengwen wrote:
On 2022/9/20 7:02, Chas Williams wrote:
On 9/19/22 10:07, Konstantin Ananyev wrote:
On 9/16/22 22:35, fengchengwen wrote:
Hi Chas,
On 2022/9/15 0:59, Chas Williams wrote:
On 9/13/22 20:46, fengchengwen wrote:
The main problem is that it is hard to design a tx_prepare for the bonding device:
1. As Chas Williams said, the hash may have to be calculated twice to get the
target slave devices.
2. More importantly, if the slave devices change (e.g. a slave device goes link
down or is removed) between bond-tx-prepare and bond-tx-burst, the output slave
will change, and this may lead to checksum failures. (Note: the slaves of a
bond device may come from different vendors, and the slave devices may have
different requirements, e.g. slave-A supports calculating the IPv4
pseudo-header automatically (no driver pre-calculation needed), but slave-B
needs the driver to pre-calculate it.)
The current design covers the above two scenarios by using in-place tx-prepare.
In addition, bond devices are not transparent to applications, so I think
providing tx-prepare support in this way is a practical method.
I don't think you need to export an enable/disable routine for the use of
rte_eth_tx_prepare. It's safe to just call that routine, even if it isn't
implemented. You are just trading one branch in DPDK librte_eth_dev for a
branch in drivers/net/bonding.
Our first patch was just like yours (it simply added tx-prepare by default),
but the community was concerned about the performance impact.
As a trade-off, I think we can add the enable/disable API.
IMHO, that's a bad idea. If the rte_eth_dev_tx_prepare API affects
performance adversely, that is not a bonding problem. All applications
should be calling rte_eth_dev_tx_prepare. There's no defined API
to determine if rte_eth_dev_tx_prepare should be called. Therefore,
applications should always call rte_eth_dev_tx_prepare. Regardless,
as I previously mentioned, you are just trading the location of
the branch, especially in the bonding case.
If rte_eth_dev_tx_prepare is causing a performance drop, then that API
should be improved or rewritten. There are PMDs that require you to use
that API. Locally, we had maintained a patch to eliminate the use of
rte_eth_dev_tx_prepare. However, that has been getting harder and harder
to maintain. The performance lost by just calling rte_eth_dev_tx_prepare
was marginal.
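For reference, the pattern being advocated is just the following (a minimal
sketch; how the application recovers the rejected tail at pkts[nb_prep] is
up to it):

#include <rte_ethdev.h>

static uint16_t
send_burst(uint16_t port_id, uint16_t queue_id,
	   struct rte_mbuf **pkts, uint16_t nb_pkts)
{
	/* Let the PMD fix up offload fields (e.g. pseudo-header checksums)
	 * before handing the packets to the hardware. */
	uint16_t nb_prep = rte_eth_tx_prepare(port_id, queue_id, pkts, nb_pkts);

	/* pkts[nb_prep] failed preparation (rte_errno says why); transmit
	 * only the packets that passed. */
	return rte_eth_tx_burst(port_id, queue_id, pkts, nb_prep);
}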
I think you missed fixing tx_machine in 802.3ad support. We have been using
the following patch locally which I never got around to submitting.
You are right, I will send a V3 to fix it.
From a458654d68ff5144266807ef136ac3dd2adfcd98 Mon Sep 17 00:00:00 2001
From: "Charles (Chas) Williams" <chwil...@ciena.com>
Date: Tue, 3 May 2022 16:52:37 -0400
Subject: [PATCH] net/bonding: call rte_eth_tx_prepare before rte_eth_tx_burst
Some PMDs might require a call to rte_eth_tx_prepare before sending the
packets for transmission. Typically, the prepare step handles the VLAN
headers, but it may need to do other things.
Signed-off-by: Chas Williams <chwil...@ciena.com>
...
* ring if transmission fails so the packet isn't lost.
@@ -1322,8 +1350,12 @@ bond_ethdev_tx_burst_broadcast(void *queue, struct rte_mbuf **bufs,
 	/* Transmit burst on each active slave */
 	for (i = 0; i < num_of_slaves; i++) {
-		slave_tx_total[i] = rte_eth_tx_burst(slaves[i], bd_tx_q->queue_id,
+		uint16_t nb_prep;
+
+		nb_prep = rte_eth_tx_prepare(slaves[i], bd_tx_q->queue_id,
 				bufs, nb_pkts);
+		slave_tx_total[i] = rte_eth_tx_burst(slaves[i], bd_tx_q->queue_id,
+				bufs, nb_prep);
tx-prepare may edit the packet data, and broadcast mode sends a packet to all
slaves, so the packet data would be sent and edited at the same time. Is this
likely to cause problems?
This routine is already broken. You can't just increment the refcount
and send the packet into a PMD's transmit routine. Nothing guarantees
that a transmit routine will not modify the packet. Many PMDs perform an
rte_vlan_insert.
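For context, the broadcast path today is essentially the following
(simplified from bond_ethdev_tx_burst_broadcast), which is why the refcount
trick alone is not safe:

/* One set of mbufs, many transmits: bump the refcount so each slave's
 * tx path can "consume" the same packet. */
for (i = 0; i < nb_pkts; i++)
	rte_pktmbuf_refcnt_update(bufs[i], num_of_slaves - 1);

for (i = 0; i < num_of_slaves; i++)
	/* If any slave's transmit routine modifies the mbuf (e.g. via
	 * rte_vlan_insert), every other slave sends the modified packet. */
	slave_tx_total[i] = rte_eth_tx_burst(slaves[i], bd_tx_q->queue_id,
			bufs, nb_pkts);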
Hmm, interesting....
My understanding was quite the opposite - tx_burst() can't modify packet data
and metadata (except when refcnt==1 and tx_burst() is going to free the mbuf
and put it back into the mempool), while tx_prepare() can - actually, as I
remember, that was one of the reasons why a separate routine was introduced.
Is that documented anywhere? It's been my experience that the device PMD
can do practically anything and you need to protect yourself. Currently,
the af_packet, dpaa2, and vhost drivers call rte_vlan_insert. Before 2019,
the virtio driver also used to call rte_vlan_insert during its transmit
path. Of course, rte_vlan_insert modifies the packet data and the mbuf
header. Regardless, it looks like rte_eth_dev_tx_prepare should always be
called. Handling that correctly in broadcast mode probably means always
making a deep copy of the packet, or checking to see if all the members are
the same PMD type. If so, you can just call prepare once. You could track
the mismatched nature during addition/removal of the members. Or just
assume people aren't going to mismatch bonding members.
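For the deep-copy option, a per-slave copy along these lines would avoid
shared data entirely (a sketch only; mbuf_pool is a hypothetical per-bond
mempool, and copy failures and freeing of the originals are ignored for
brevity):

/* Give each slave a private copy so that a prepare/transmit path that
 * modifies one copy cannot corrupt what the other slaves send. */
for (i = 0; i < num_of_slaves; i++) {
	for (j = 0; j < nb_pkts; j++)
		copies[j] = rte_pktmbuf_copy(bufs[j], mbuf_pool, 0, UINT32_MAX);
	nb_prep = rte_eth_tx_prepare(slaves[i], bd_tx_q->queue_id,
			copies, nb_pkts);
	slave_tx_total[i] = rte_eth_tx_burst(slaves[i], bd_tx_q->queue_id,
			copies, nb_prep);
}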
rte_eth_tx_prepare carries this note:
 * Since this function can modify packet data, provided mbufs must be safely
 * writable (e.g. modified data cannot be in shared segment).
but rte_eth_tx_burst has no such requirement.
Besides the above examples of rte_vlan_insert, some PMDs also modify the
mbuf's header and data, e.g. hns3/ark/bnxt invoke rte_pktmbuf_append when the
pkt-len is too small.
I would prefer that rte_eth_tx_burst add such a restriction - the PMD should
not modify the mbuf except when refcnt==1 - so that applications could rely
on this explicit definition.
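As a sketch of what such a restriction would let callers check before
treating an mbuf as prepare-safe (assuming single-segment packets for
brevity):

#include <rte_mbuf.h>

/* Per the tx_prepare note, an mbuf is safely writable only if it is a
 * direct mbuf (its data is not attached to another mbuf) and we hold
 * the only reference to it. */
static inline int
mbuf_is_writable(const struct rte_mbuf *m)
{
	return RTE_MBUF_DIRECT(m) && rte_mbuf_refcnt_read(m) == 1;
}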
As for this bonding scenario, we have three alternatives:
1) As in the patch Chas provided, always do tx-prepare before tx-burst. It is
simple, but it may modify the mbuf in a way the application cannot detect
(unless specially documented).
2) My patch: the application invokes prepare_enable/disable to control whether
the prepare is done.
3) Implement the bonding PMD's tx-prepare, which does tx-prepare for each
slave. But there is a problem: if the slave devices change (e.g. a new device
is added), some packet errors may occur because we have not done the prepare
for the newly added device.
note1: options 1 and 2 above both violate rte_eth_tx_burst's requirement, so
we would need special documentation.
note2: we can do some optimization for 3, e.g. if the same driver name is
detected on multiple slave devices, tx-prepare only needs to be performed once
(see the sketch after this list). But the problem described above still exists
because slave devices are dynamic at runtime.
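To make option 3 concrete, a hypothetical bond-level tx-prepare with the
note2 optimization could look roughly like this (bond_tx_prepare is an
illustrative name, not existing code; it assumes slaves sharing a driver
name share prepare requirements):

#include <string.h>
#include <rte_ethdev.h>

/* Hypothetical sketch of option 3: run tx-prepare once per distinct
 * slave driver over the whole burst. */
static uint16_t
bond_tx_prepare(const uint16_t *slaves, uint16_t num_of_slaves,
		uint16_t queue_id, struct rte_mbuf **bufs, uint16_t nb_pkts)
{
	const char *seen[RTE_MAX_ETHPORTS];
	struct rte_eth_dev_info dev_info;
	uint16_t nb_prep = nb_pkts;
	unsigned int i, j, n = 0;

	for (i = 0; i < num_of_slaves; i++) {
		if (rte_eth_dev_info_get(slaves[i], &dev_info) != 0)
			continue;
		/* Skip slaves whose driver we already prepared for. */
		for (j = 0; j < n; j++)
			if (strcmp(seen[j], dev_info.driver_name) == 0)
				break;
		if (j < n)
			continue;
		seen[n++] = dev_info.driver_name;
		/* A packet rejected by one driver is dropped for all. */
		nb_prep = rte_eth_tx_prepare(slaves[i], queue_id,
				bufs, nb_prep);
	}
	return nb_prep;
}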
I hope for more discussion. @Ferruh @Chas @Humin @Konstantin
I don't think adding an additional API due to concerns about performance is
the solution to the performance problem. If the tx_prepare API is slow,
that's what needs to be fixed. I imagine that more drivers will be using
the tx_prepare API over time, not fewer. It would be a good idea to get
used to calling it.
As for broadcast mode, let's just call tx_prepare once for any given
packet. For now, assume that no one would attempt to bond different
PMDs together. In my experience, that would be unusual. I have never
seen anyone do that in a production context. If a bug report comes in
about this failing for someone, we can fix it then.
You should at least perform a clone of the packet so
that the mbuf headers aren't mangled by each PMD. Just to be safe you
should perform a partial deep copy of the packet headers in case some
PMD does an rte_vlan_insert and the other PMDs in the bonding group do
not need an rte_vlan_insert.
So doing a blind rte_eth_dev_tx_prepare isn't making anything much
worse.
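Under that assumption, the broadcast path could prepare the burst once
through the first member before the refcount bump (a sketch, not the
submitted patch):

/* Prepare once via the first slave, assuming all members share the
 * same prepare requirements, then broadcast the surviving packets. */
nb_prep = rte_eth_tx_prepare(slaves[0], bd_tx_q->queue_id, bufs, nb_pkts);

for (i = 0; i < nb_prep; i++)
	rte_pktmbuf_refcnt_update(bufs[i], num_of_slaves - 1);

for (i = 0; i < num_of_slaves; i++)
	slave_tx_total[i] = rte_eth_tx_burst(slaves[i], bd_tx_q->queue_id,
			bufs, nb_prep);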
 		if (unlikely(slave_tx_total[i] < nb_pkts))
 			tx_failed_flag = 1;