Removing the 'poll-sleep-usec 1000' from startup.conf might help. Adding a 1 ms delay before processing packets seems pretty likely to increase RX misses when there are bursts of traffic.
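As a rough back-of-the-envelope sketch of why the sleep could matter: the snippet below estimates how many frames arrive at line rate during one sleep interval and compares that with an RX ring size. The 40 Gb/s link speed, the frame sizes, and the 1024-descriptor ring are assumptions picked for illustration, not values taken from the attached config, and the function names are hypothetical.

```python
# Rough estimate of how many frames arrive on a link while a poller sleeps.
# All inputs are illustrative assumptions, not measurements.

ETH_OVERHEAD_BYTES = 20  # preamble (7) + start-of-frame delimiter (1) + inter-frame gap (12)


def frames_during_sleep(link_gbps: float, frame_bytes: int, sleep_usec: float) -> float:
    """Frames arriving at line rate during one sleep interval."""
    bits_on_wire = (frame_bytes + ETH_OVERHEAD_BYTES) * 8
    frames_per_sec = link_gbps * 1e9 / bits_on_wire
    return frames_per_sec * sleep_usec / 1e6


def ring_overflows(link_gbps: float, frame_bytes: int, sleep_usec: float,
                   rx_ring_size: int) -> bool:
    """True if a line-rate burst lasting the whole sleep exceeds the RX ring."""
    return frames_during_sleep(link_gbps, frame_bytes, sleep_usec) > rx_ring_size


if __name__ == "__main__":
    LINK_GBPS = 40       # assumed link speed
    SLEEP_USEC = 1000    # the 'poll-sleep-usec 1000' setting
    RX_RING_SIZE = 1024  # assumed RX descriptor count
    for frame_bytes in (64, 512, 1500):
        n = frames_during_sleep(LINK_GBPS, frame_bytes, SLEEP_USEC)
        verdict = "overflows" if n > RX_RING_SIZE else "fits in"
        print(f"{frame_bytes:>5} B frames: ~{n:,.0f} per sleep, {verdict} a {RX_RING_SIZE}-entry ring")
```

Under these assumptions, even a burst of full-size frames lasting the whole 1 ms would exceed the ring, and any LACPDU arriving in that window could be dropped along with the data traffic. The real numbers depend on the actual link speed, ring size, and burst length.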
-Matt

On Wed, Apr 29, 2026 at 11:28 AM steven luong via lists.fd.io <sluong= [email protected]> wrote:

> My guess is that the LACP periodic hello packets are being dropped when
> there is a big burst of traffic which caused the neighbor to time out. On
> VPP side, you can see how much traffic were dropped with these 2 CLIs.
> vppctl show hardware verbose
> vppctl show interface
>
> However, the switch side can still drop traffic. You have to see how much
> traffic the switch side drops also.
> If keeping the bond interface stable is more important than fast detection
> on NIC failure, maybe you could try configuring both VPP and the switch to
> use long timeout.
>
> Get Outlook for Mac <https://aka.ms/GetOutlookForMac>
> *From: *[email protected] <[email protected]> on behalf of Peter
> Potvin via lists.fd.io <[email protected]>
> *Date: *Wednesday, April 29, 2026 at 8:54 AM
> *To: *[email protected] <[email protected]>
> *Subject: *[vpp-dev] LACP Instability with mlx5 and VPP
> v26.06-rc0~519-gf206434c7
>
> We recently started running VPP on Debian 12 with dual-port Mellanox
> CX414A-GCAT NICs, and encountered an issue where the LACP bond flaps
> randomly while the physical links remain up. I noticed a previous thread
> where this was reported (
> https://lists.fd.io/g/vpp-dev/topic/mellanox_connectx5_lacp_not/113443571)
> and attempted the same fix: changing the LACP timer on the switch-side
> (Arista DCS-7050QX-32S-R) to fast using "lacp timer fast" on the physical
> ports. This temporarily stabilized the connection until LACP resumed
> flapping randomly again, leaving me at a standstill trying to determine
> what is occurring.
>
> I've attached my startup config and a redacted version of my vpp bootstrap
> config to provide better context on the configuration used, as well as the
> output from "show err" and switch logs for the port channel. For reference
> here, devices "0000:87:00.0" and "0000:87:00.1" are the NIC ports with the
> LACP members.
>
> VPP version is v26.06-rc0~519-gf206434c7, on Debian 12 Bookworm with Linux
> kernel version 6.1.0-44-amd64. Hardware is a dual Intel Xeon E5-2637v4 CPU
> with 64GB DDR4 ECC memory. Hyperthreading is disabled, and the VPP workers
> are pinned to the CPU cores connected to the network cards to avoid
> motherboard PCIe bridge bottlenecks.
>
> VPP does not seem to crash; it stays running without any errors, and I
> don't see any errors in the logs.
>
> Any insight is appreciated.
>
> Kind regards,
> Peter Potvin
