My guess is that the periodic LACP hello packets (LACPDUs) are being dropped during large traffic bursts, which causes the neighbor to time out. On the VPP side, you can see how much traffic was dropped with these two CLIs:

    vppctl show hardware verbose
    vppctl show interface
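To put rough numbers on the timeout, here is a minimal sketch using the standard LACP timer values from IEEE 802.1AX. It is not taken from the VPP source, and the function name `lacp_tolerance` is made up for this illustration. It shows how long received LACPDUs must be missing before the neighbor expires under the short and the long timeout.

```python
# Standard LACP timer constants from IEEE 802.1AX, in seconds.
FAST_PERIODIC_TIME = 1    # partner sends an LACPDU every second
SLOW_PERIODIC_TIME = 30   # partner sends an LACPDU every 30 seconds
SHORT_TIMEOUT_TIME = 3    # receiver expires the partner after 3 s of silence
LONG_TIMEOUT_TIME = 90    # receiver expires the partner after 90 s of silence


def lacp_tolerance(use_long_timeout: bool) -> dict:
    """Hypothetical helper: summarize the timers for one timeout mode.

    A receiver that advertises the long timeout asks its partner to
    transmit at the slow periodic rate; with the short timeout the
    partner transmits at the fast rate.
    """
    if use_long_timeout:
        period, timeout = SLOW_PERIODIC_TIME, LONG_TIMEOUT_TIME
    else:
        period, timeout = FAST_PERIODIC_TIME, SHORT_TIMEOUT_TIME
    return {
        "period_s": period,
        "timeout_s": timeout,
        # Number of consecutive LACPDUs that fit in one timeout window.
        "pdus_per_window": timeout // period,
    }


if __name__ == "__main__":
    for label, use_long in (("short", False), ("long", True)):
        t = lacp_tolerance(use_long)
        print(f"{label} timeout: LACPDU every {t['period_s']} s, "
              f"partner expires after {t['timeout_s']} s without one")
```

With the short timeout, about three seconds without a received LACPDU is enough to expire the partner. With the long timeout the window grows to 90 seconds, at the cost of slower detection of a real link or NIC failure.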
The switch side can also drop traffic, so you have to check how much it drops as well. If keeping the bond interface stable is more important than fast detection of a NIC failure, you could try configuring both VPP and the switch to use the long timeout.

From: [email protected] <[email protected]> on behalf of Peter Potvin via lists.fd.io <[email protected]>
Date: Wednesday, April 29, 2026 at 8:54 AM
To: [email protected] <[email protected]>
Subject: [vpp-dev] LACP Instability with mlx5 and VPP v26.06-rc0~519-gf206434c7

We recently started running VPP on Debian 12 with dual-port Mellanox CX414A-GCAT NICs, and encountered an issue where the LACP bond flaps randomly while the physical links remain up. I noticed a previous thread where this was reported (https://lists.fd.io/g/vpp-dev/topic/mellanox_connectx5_lacp_not/113443571) and attempted the same fix: changing the LACP timer on the switch side (Arista DCS-7050QX-32S-R) to fast using "lacp timer fast" on the physical ports. This stabilized the connection only temporarily; LACP then started flapping randomly again, leaving me at a standstill trying to determine what is occurring.

I've attached my startup config and a redacted version of my vpp bootstrap config to provide better context on the configuration used, as well as the output from "show err" and the switch logs for the port channel. For reference, devices "0000:87:00.0" and "0000:87:00.1" are the NIC ports with the LACP members.

VPP version is v26.06-rc0~519-gf206434c7, on Debian 12 Bookworm with Linux kernel version 6.1.0-44-amd64. Hardware is a dual Intel Xeon E5-2637v4 CPU with 64GB DDR4 ECC memory. Hyperthreading is disabled, and the VPP workers are pinned to the CPU cores connected to the network cards to avoid motherboard PCIe bridge bottlenecks.

VPP does not seem to crash; it stays running without any errors, and I don't see any errors in the logs. Any insight is appreciated.

Kind regards,
Peter Potvin
