>>>> We are using an OVS bond with the configuration shown below. We found
>>>> the following when the switch is configured in MLAG mode:
>>>>
>>>> 1. We tested sending 1000 UDP packets per second, 500,000 packets in
>>>>    total. The statistics show that a few packets are occasionally
>>>>    lost (background traffic is around 100,000 pps).
>>>> 2. We also ran the same test in a non-MLAG setup with the same steps,
>>>>    and there was essentially no packet loss (background traffic is
>>>>    around 100,000 pps).
>>>>
>>>> ovs-vsctl --may-exist add-bond br-tun dpdk_tun_port tun_port_p0 tun_port_p1 \
>>>>     bond_mode=balance-tcp lacp=active \
>>>>     other-config:bond-rebalance-interval=1000 \
>>>>     other_config:lb-output-action=true \
>>>>     -- set Interface tun_port_p0 type=dpdk \
>>>>        options:dpdk-devargs=0000:ca:00.0 mtu_request=1600 \
>>>>        options:n_rxq_desc=2048 options:n_txq_desc=2048 \
>>>>        options:n_rxq=4 \
>>>>     -- set Interface tun_port_p1 type=dpdk \
>>>>        options:dpdk-devargs=0000:ca:00.1 mtu_request=1600 \
>>>>        options:n_rxq_desc=2048 options:n_txq_desc=2048 \
>>>>        options:n_rxq=4
>>>>
>>>> For the phenomenon described above, in the MLAG scenario, after
>>>> setting bond-rebalance-interval=0 we found that the same test
>>>> (1000 UDP packets per second, 500,000 packets in total) showed no
>>>> packet loss. It is unclear whether this is an issue with OVS bond's
>>>> support for MLAG that could cause packet loss under certain
>>>> conditions. I found some relevant descriptions online that may
>>>> explain this behavior.
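For anyone reproducing this, the rebalance interval can be changed on a live bond without recreating it (bond-rebalance-interval is a documented Port other_config key); a minimal sketch, assuming the bond port is named dpdk_tun_port as in the configuration above:

```shell
# Disable periodic rebalancing so established flows stay pinned to one
# bond member (avoids flows flapping between MLAG legs).
ovs-vsctl set port dpdk_tun_port other_config:bond-rebalance-interval=0

# Restore the original 1-second rebalance interval.
ovs-vsctl set port dpdk_tun_port other_config:bond-rebalance-interval=1000
```

These take effect immediately; no restart of ovs-vswitchd is required.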
>>>> https://forum.proxmox.com/threads/solution-multiple-ovs-sdn-and-lacp-heartaches.120634/
>>>>
>>>> The description is as follows:
>>>> "Since my original post, I've spent some more time on this, and
>>>> discovered that OVS is totally unsuitable for more enterprise-y
>>>> setups, and MLAG in particular. None of the MLAG implementations
>>>> I've tried this with expect distinct flows to flap between legs,
>>>> and this does cause minor issues."
>>>>
>>>> In addition, the bond-rebalance-interval=0 configuration can lead
>>>> to traffic imbalance between interfaces, which has a negative
>>>> impact. Is there a better solution to this issue?
>>>
>>> If the switches do not support traffic moving between legs of the
>>> MLAG, then disabling the rebalancing sounds like the only option.
>>
>> I have confirmed that the switch supports MLAG, but I occasionally
>> encounter packet loss, which is very frustrating and unacceptable.
>>
>>> Note that the traffic will still be balanced based on the packet
>>> hash. So, as long as you have many light traffic streams, the
>>> traffic should be well balanced. But if your traffic pattern
>>> involves a couple of heavy streams among many light ones, then it
>>> may indeed not be fully balanced.
>>
>> I captured packets and confirmed that OVS is dropping packets.
>> Observing the coverage statistics, I noticed an increase in
>> datapath_drop_invalid_port. However, after setting
>> bond-rebalance-interval=0, the counter no longer increases.
>
> Hmm, this is strange. Do you see ports go down intermittently?
> Is there something interesting in ovs-vswitchd.log?
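For reference, the per-hash balance and the drop counter mentioned above can both be inspected at runtime; a sketch, assuming the bond is named dpdk_tun_port as configured earlier:

```shell
# Show LACP/bond state, including which member each hash bucket is
# assigned to and the load carried by each hash.
ovs-appctl bond/show dpdk_tun_port

# Watch the coverage counter discussed above for increases.
ovs-appctl coverage/show | grep datapath_drop_invalid_port
```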
The logs print "shift"-related information very frequently, similar to
the lines below; no other notable logs were observed.

2024-10-13T02:17:07.179Z|00424|bond|INFO|bond dpdk_tun_port: shift 1kB of load (with hash 51) from tun_port_p0 to tun_port_p1 (now carrying 149kB and 138kB load, respectively)
2024-10-13T02:17:07.179Z|00425|bond|INFO|bond dpdk_tun_port: shift 1kB of load (with hash 193) from tun_port_p0 to tun_port_p1 (now carrying 147kB and 139kB load, respectively)

> In general, all the operations with the bond table in the datapath
> are RCU-protected, so rebalancing should not cause any, even short,
> time frames where the port id is not valid. So, the increase of
> the counter is very strange.
>
> Also, what is your OVS version?

OVS version 2.17.5 LTS.

>> Is there a Linux bond mode that works fine with this MLAG?

Jun Wang
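As an aside for others debugging this: how often hashes flap between bond members can be summarized directly from the log. A small sketch (the log path is an assumption for a typical installation; the sed pattern matches the log format shown above):

```shell
#!/bin/sh
# Summarize bond rebalance "shift" events: for each (hash, direction)
# pair, count how many times that hash was moved between members.
# Default log path is an assumption; pass your own as the first argument.
LOG=${1:-/var/log/openvswitch/ovs-vswitchd.log}

sed -n 's/.*(with hash \([0-9][0-9]*\)) from \([^ ]*\) to \([^ ]*\) .*/\1 \2->\3/p' "$LOG" \
    | sort | uniq -c | sort -rn
```

A hash that appears with a high count, bouncing in both directions, is a flow that keeps flapping between the MLAG legs on every rebalance pass.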
_______________________________________________ discuss mailing list disc...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-discuss