>>>> We are using an OVS bond with the configuration shown below. We found the 
>>>> following when the switch is configured in MLAG mode:
>>>>
>>>>  1. We sent UDP traffic at 1000 packets per second, 500,000 packets in 
>>>> total. The statistics show that a few packets are occasionally lost 
>>>> (background traffic is around 100,000 pps).
>>>>  2. We ran the same test with the same steps against a non-MLAG setup, 
>>>> and there was essentially no packet loss (background traffic is around 
>>>> 100,000 pps).
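>>>>
>>>> (For reference, a load like this can be generated with iperf3; the server 
>>>> address below is a placeholder. With a 1250-byte payload, 10 Mbit/s works 
>>>> out to 1000 packets per second, and -k caps the run at 500,000 packets:
>>>>
>>>>     iperf3 -u -c 192.0.2.10 -l 1250 -b 10M -k 500000
>>>> )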
>>>>
>>>>  
>>>> ovs-vsctl --may-exist add-bond br-tun dpdk_tun_port tun_port_p0 tun_port_p1 \
>>>>     bond_mode=balance-tcp lacp=active other-config:bond-rebalance-interval=1000 \
>>>>     other_config:lb-output-action=true \
>>>>     -- set Interface tun_port_p0 type=dpdk \
>>>>            options:dpdk-devargs=0000:ca:00.0 mtu_request=1600 \
>>>>            options:n_rxq_desc=2048 options:n_txq_desc=2048 \
>>>>            options:n_rxq=4 \
>>>>     -- set Interface tun_port_p1 type=dpdk \
>>>>            options:dpdk-devargs=0000:ca:00.1 mtu_request=1600 \
>>>>            options:n_rxq_desc=2048 options:n_txq_desc=2048 \
>>>>            options:n_rxq=4
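>>>>
>>>> (After creating the bond, LACP negotiation with the MLAG pair can be 
>>>> checked with, for example:
>>>>
>>>>     ovs-appctl lacp/show dpdk_tun_port
>>>>
>>>> which shows the negotiated state of each member link.)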
>>>>  
>>>> Regarding the behavior described above: in the MLAG scenario, after setting 
>>>> bond-rebalance-interval=0, there was no packet loss during the same test of 
>>>> sending 1000 UDP packets per second, 500,000 packets in total. It is unclear 
>>>> whether this is an issue with OVS bond's interaction with MLAG that could 
>>>> cause packet loss under certain conditions. I found a relevant description 
>>>> online that may explain this behavior.
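>>>>
>>>> (For reference, rebalancing was disabled on the bond port with:
>>>>
>>>>     ovs-vsctl set port dpdk_tun_port other_config:bond-rebalance-interval=0
>>>> )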
>>>>  
>>>> https://forum.proxmox.com/threads/solution-multiple-ovs-sdn-and-lacp-heartaches.120634/
>>>>  
>>>> The description is as follows:
>>>>   “Since my original post, I've spent some more time on this, and 
>>>> discovered that OVS is totally unsuitable for more enterprise-y setups,
>>>> and MLAG in particular. None of the MLAG implementations I've tried this 
>>>> with expect distinct flows to flap between legs, and this does cause minor 
>>>> issues.“
>>>>  
>>>> In addition, the bond-rebalance-interval=0 configuration can lead to 
>>>> traffic imbalance between the interfaces, which has a negative impact. Is 
>>>> there a better solution to this issue?
>>>
>>  
>>>If the switches do not support traffic moving between legs of the MLAG,
>>>then disabling of the rebalancing sounds like the only option.
>>  
>> I have confirmed that the switch supports MLAG, but I still occasionally 
>> encounter packet loss, which is very frustrating and unacceptable.
>>  
>>>Note that the traffic will still be balanced based on the packet hash.
>>>So, as long as you have many light traffic streams the traffic should
>>>be well balanced.  But if your traffic pattern involves having a couple
>>>heavy streams along many light ones, then it may not be fully balanced
>>>indeed.
>>  
>> I captured packets and confirmed that OVS itself is dropping the packets. 
>> Observing the coverage statistics, I noticed an increase in 
>> datapath_drop_invalid_port. However, after setting 
>> bond-rebalance-interval=0, the counter no longer increases.
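>>
>> (For reference, the counter can be watched with:
>>
>>     ovs-appctl coverage/show | grep datapath_drop_invalid_port
>> )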
>
> 
> Hmm, this is strange.  Do you see ports go down intermittently?
> Is there something interesting in the ovs-vswitchd.log ?

The logs very frequently print "shift"-related messages similar to the 
following; no other notable log entries were observed.
2024-10-13T02:17:07.179Z|00424|bond|INFO|bond dpdk_tun_port: shift 1kB of load 
(with hash 51) from tun_port_p0 to tun_port_p1 (now carrying 149kB and 138kB 
load, respectively)
2024-10-13T02:17:07.179Z|00425|bond|INFO|bond dpdk_tun_port: shift 1kB of load 
(with hash 193) from tun_port_p0 to tun_port_p1 (now carrying 147kB and 139kB 
load, respectively)

> In general, all the operations with the bond table in the datapath
> are RCU-protected, so rebalancing should not cause any, even short,
> time frames where the port id is not valid.  So, the increase of
> the counter is very strange.
> 
> Also, what is your OVS version?
OVS version 2.17.5 (LTS).
>  
>>Is there a Linux bond mode that works well with this MLAG?
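>>
>>(For context: a kernel bond only applies if the NICs are handed back to the 
>>kernel rather than driven by DPDK. That said, a kernel bond in 802.3ad mode 
>>with a layer3+4 transmit hash pins each flow to one leg and never moves it, 
>>which MLAG pairs generally tolerate. A sketch with iproute2, using 
>>placeholder interface names eth0/eth1:
>>
>>    ip link add bond0 type bond mode 802.3ad xmit_hash_policy layer3+4
>>    ip link set eth0 down
>>    ip link set eth1 down
>>    ip link set eth0 master bond0
>>    ip link set eth1 master bond0
>>    ip link set bond0 up
>>)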
>>
> Jun Wang
>  



Jun Wang
 
_______________________________________________
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss