bonding: support Tx prepare for bonding

Ananyev, Konstantin Mon, 14 Jun 2021 04:37:01 -0700

> On 2021/6/9 18:25, Ananyev, Konstantin wrote:
> >>> On 4/23/21 12:46 PM, Chengchang Tang wrote:
> >>>> To use the HW offloads capability (e.g. checksum and TSO) in the Tx
> >>>> direction, the upper-layer users need to call rte_eth_dev_prepare to do
> >>>> some adjustment to the packets before sending them (e.g. processing
> >>>> pseudo headers when Tx checksum offoad enabled). But, the tx_prepare
> >>>> callback of the bond driver is not implemented. Therefore, related
> >>>> offloads can not be used unless the upper layer users process the packet
> >>>> properly in their own application. But it is bad for the
> >>>> transplantability.
> >>>>
> >>>> However, it is difficult to design the tx_prepare callback for bonding
> >>>> driver. Because when a bonded device sends packets, the bonded device
> >>>> allocates the packets to different slave devices based on the real-time
> >>>> link status and bonding mode. That is, it is very difficult for the
> >>>> bonding device to determine which slave device's prepare function should
> >>>> be invoked. In addition, if the link status changes after the packets are
> >>>> prepared, the packets may fail to be sent because packets allocation may
> >>>> change.
> >>>>
> >>>> So, in this patch, the tx_prepare callback of bonding driver is not
> >>>> implemented. Instead, the rte_eth_dev_tx_prepare() will be called for
> >>>> all the fast path packet in mode 0, 1, 2, 4, 5, 6. In this way, all
> >>>> tx_offloads can be processed correctly for all NIC devices in these 
> >>>> modes.
> >>>> If tx_prepare is not required in some cases, then slave PMDs tx_prepare
> >>>> pointer should be NULL and rte_eth_tx_prepare() will be just a NOOP.
> >>>> In these cases, the impact on performance will be very limited. It is
> >>>> the responsibility of the slave PMDs to decide when the real tx_prepare
> >>>> needs to be used. The information from dev_config/queue_setup is
> >>>> sufficient for them to make these decisions.
> >>>>
> >>>> Note:
> >>>> The rte_eth_tx_prepare is not added to bond mode 3(Broadcast). This is
> >>>> because in broadcast mode, a packet needs to be sent by all slave ports.
> >>>> Different PMDs process the packets differently in tx_prepare. As a 
> >>>> result,
> >>>> the sent packet may be incorrect.
> >>>>
> >>>> Signed-off-by: Chengchang Tang <tangchengch...@huawei.com>
> >>>> ---
> >>>>  drivers/net/bonding/rte_eth_bond.h     |  1 -
> >>>>  drivers/net/bonding/rte_eth_bond_pmd.c | 28 ++++++++++++++++++++++++----
> >>>>  2 files changed, 24 insertions(+), 5 deletions(-)
> >>>>
> >>>> diff --git a/drivers/net/bonding/rte_eth_bond.h 
> >>>> b/drivers/net/bonding/rte_eth_bond.h
> >>>> index 874aa91..1e6cc6d 100644
> >>>> --- a/drivers/net/bonding/rte_eth_bond.h
> >>>> +++ b/drivers/net/bonding/rte_eth_bond.h
> >>>> @@ -343,7 +343,6 @@ rte_eth_bond_link_up_prop_delay_set(uint16_t 
> >>>> bonded_port_id,
> >>>>  int
> >>>>  rte_eth_bond_link_up_prop_delay_get(uint16_t bonded_port_id);
> >>>>
> >>>> -
> >>>>  #ifdef __cplusplus
> >>>>  }
> >>>>  #endif
> >>>> diff --git a/drivers/net/bonding/rte_eth_bond_pmd.c 
> >>>> b/drivers/net/bonding/rte_eth_bond_pmd.c
> >>>> index 2e9cea5..84af348 100644
> >>>> --- a/drivers/net/bonding/rte_eth_bond_pmd.c
> >>>> +++ b/drivers/net/bonding/rte_eth_bond_pmd.c
> >>>> @@ -606,8 +606,14 @@ bond_ethdev_tx_burst_round_robin(void *queue, 
> >>>> struct rte_mbuf **bufs,
> >>>>          /* Send packet burst on each slave device */
> >>>>          for (i = 0; i < num_of_slaves; i++) {
> >>>>                  if (slave_nb_pkts[i] > 0) {
> >>>> +                        int nb_prep_pkts;
> >>>> +
> >>>> +                        nb_prep_pkts = rte_eth_tx_prepare(slaves[i],
> >>>> +                                        bd_tx_q->queue_id, 
> >>>> slave_bufs[i],
> >>>> +                                        slave_nb_pkts[i]);
> >>>> +
> >>>
> >>> Shouldn't it be called iff queue Tx offloads are not zero?
> >>> It will allow to decrease performance degradation if no
> >>> Tx offloads are enabled. Same in all cases below.
> >>
> >> Regarding this point, it has been discussed in the previous RFC:
> >> https://inbox.dpdk.org/dev/47f907cf-3933-1de9-9c45-6734b912e...@huawei.com/
> >>
> >> According to the TX_OFFLOAD status of the current device, PMDs can 
> >> determine
> >> whether tx_prepare is currently needed. If it is not needed, set 
> >> pkt_tx_prepare
> >> to NULL, so that the actual tx_prepare processing will be skipped directly 
> >> in
> >> rte_eth_tx_prepare().
> >>
> >>>
> >>>>                          num_tx_slave = rte_eth_tx_burst(slaves[i], 
> >>>> bd_tx_q->queue_id,
> >>>> -                                        slave_bufs[i], 
> >>>> slave_nb_pkts[i]);
> >>>> +                                        slave_bufs[i], nb_prep_pkts);
> >>>
> >>> In fact it is a problem here and really big problems.
> >>> Tx prepare may fail and return less packets. Tx prepare
> >>> of some packet may always fail. If application tries to
> >>> send packets in a loop until success, it will be a
> >>> forever loop here. Since application calls Tx burst,
> >>> it is 100% legal behaviour of the function to return 0
> >>> if Tx ring is full. It is not an error indication.
> >>> However, in the case of Tx prepare it is an error
> >>> indication.
> >
> > Yes, that sounds like a problem and existing apps might be affected.
> >
> >>>
> >>> Should we change Tx burst description and enforce callers
> >>> to check for rte_errno? It sounds like a major change...
> >>>
> >
> > Agree, rte_errno for tx_burst() is probably a simplest and sanest way,
> > but yes, it is a change in behaviour and apps will need to be updated.
> > Another option for bond PMD - just silently free mbufs for which prepare()
> > fails (and probably update some stats counter).
> > Again it is a change in behaviour, but now just for one PMD, with tx 
> > offloads enabled.
> > Also as, I can see some tx_burst() function for that PMD already free 
> > packets silently:
> > bond_ethdev_tx_burst_alb(), bond_ethdev_tx_burst_broadcast().
> >
> > Actually another question - why the patch adds tx_prepare() only to some
> > TX modes but not all?
> > Is that itended?
> >
> 
> Yes. Currently, I have no ideal to perform tx_prepare() in broadcast mode 
> with limited
> impact on performance. In broadcast mode, same packets will be send in 
> several devices.
> In this process, we only update the ref_cnt of mbufs, but no copy of packets.
> As we know,
> tx_prepare() may change the data, so it may cause some problem if we perform 
> tx_prepare()
> several times on the same packet.


You mean tx_prepare() for second dev can void changes made by tx_prepare() for 
first dev?
I suppose in theory it is possible, even if it is probably not the case right 
now in practise
(at least I am not aware about such cases).
Actually that's an interesting topic - same can happen even with user 
implementing multicast
on his own (see examples/ipv4_multicast/). 
I think these new limitations have to be documented clearly (at least).
Also probably  we need extra changes fo bond device 
dev_confgiure()/dev_get_info():
to check currently selected mode and based on that allow/reject tx offloads.
The question arises (again) how to figure out for which tx offloads 
dev->tx_prepare()
modifies the packet, for which not? 
Any thoughts here? 

> 
> >>
> >> I agree that if the failure is caused by Tx ring full, it is a legal 
> >> behaviour.
> >> But what about the failure caused by other reasons? At present, it is 
> >> possible
> >> for some PMDs to fail during tx_burst due to other reasons. In this case,
> >> repeated tries to send will also fail.
> >>
> >> I'm not sure if all PMDs need to support the behavior of sending packets 
> >> in a
> >> loop until it succeeds. If not, I think the current problem can be 
> >> reminded to
> >> the user by adding a description to the bonding. If it is necessary, I 
> >> think the
> >> description of tx_burst should also add related instructions, so that the 
> >> developers
> >> of PMDs can better understand how tx_burst should be designed, such as 
> >> putting all
> >> hardware-related constraint checks into tx_prepare. And another 
> >> prerequisite for
> >> the above behavior is that the packets must be prepared (i.e. checked by
> >> rte_eth_tx_prepare()). Otherwise, it may also fail to send. This means 
> >> that we have
> >> to use rte_eth_tx_prepare() in more scenarios.
> >>
> >> What's Ferruh's opinion on this?
> >>
> >>> [snip]
> >>>
> >>> .
> >>>
> >

Re: [dpdk-dev] [PATCH 1/2] net/bonding: support Tx prepare for bonding

Reply via email to