My guess is that the periodic LACP hello packets (LACPDUs) are being dropped 
during large bursts of traffic, which causes the neighbor to time out. On the 
VPP side, you can see how much traffic was dropped with these two commands:
vppctl show hardware verbose
vppctl show interface
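
One way to use these commands is to capture their output before and after a 
flap and compare the two captures. The following is only a rough sketch in 
Python: the helper names snapshot() and counter_diff() are made up for 
illustration, and it assumes vppctl is on the PATH and can reach the running 
VPP instance. It makes no assumption about the output layout; it just diffs 
the text.

```python
import difflib
import subprocess


def snapshot(command):
    """Run one vppctl command (e.g. "show interface") and return its output lines.

    Assumes vppctl is installed and the caller may talk to the VPP CLI socket.
    """
    result = subprocess.run(
        ["vppctl", *command.split()],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.splitlines()


def counter_diff(before, after):
    """Return a unified diff of two snapshots; an empty list means no change."""
    return list(
        difflib.unified_diff(before, after, "before", "after", lineterm="")
    )


# Hypothetical usage around a flap:
#   before = snapshot("show interface")
#   ... wait until the bond flaps ...
#   after = snapshot("show interface")
#   print("\n".join(counter_diff(before, after)))
```

Lines prefixed with "+" or "-" in the result are the ones whose counters 
moved between the two captures; the context lines around them show which 
interface they belong to.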

However, the switch can also drop traffic, so you need to check how much 
traffic is dropped on the switch side as well.

If keeping the bond interface stable is more important than fast detection of 
a NIC failure, you could try configuring both VPP and the switch to use the 
long timeout.
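
To put numbers on that trade-off, here is a small Python sketch using the 
standard LACP timer values from IEEE 802.1AX (1 s periodic / 3 s timeout for 
short, 30 s periodic / 90 s timeout for long). The function name is made up 
for illustration, and this is plain arithmetic, not something taken from the 
VPP or Arista code.

```python
# LACP timer constants from IEEE 802.1AX, in seconds.
FAST_PERIODIC_TIME = 1    # partner sends one LACPDU per second
SLOW_PERIODIC_TIME = 30   # partner sends one LACPDU every 30 seconds
SHORT_TIMEOUT_TIME = 3    # receiver expires the partner after 3 seconds
LONG_TIMEOUT_TIME = 90    # receiver expires the partner after 90 seconds


def lacpdus_before_expiry(periodic, timeout):
    """Number of consecutive LACPDUs that must be lost before the timeout fires."""
    return timeout // periodic


for name, periodic, timeout in [
    ("short timeout", FAST_PERIODIC_TIME, SHORT_TIMEOUT_TIME),
    ("long timeout", SLOW_PERIODIC_TIME, LONG_TIMEOUT_TIME),
]:
    lost = lacpdus_before_expiry(periodic, timeout)
    print(f"{name}: partner expires after {timeout} s ({lost} lost LACPDUs)")
```

Both modes tolerate three missed LACPDUs, but with the short timeout those 
three can all be lost within a 3-second burst, whereas with the long timeout 
the drops would have to persist for 90 seconds. The cost is that a genuinely 
dead member can take up to 90 seconds to be detected through LACP alone.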

From: [email protected] <[email protected]> on behalf of Peter Potvin via 
lists.fd.io <[email protected]>
Date: Wednesday, April 29, 2026 at 8:54 AM
To: [email protected] <[email protected]>
Subject: [vpp-dev] LACP Instability with mlx5 and VPP v26.06-rc0~519-gf206434c7

We recently started running VPP on Debian 12 with dual-port Mellanox 
CX414A-GCAT NICs, and encountered an issue where the LACP bond flaps randomly 
while the physical links remain up. I noticed a previous thread where this was 
reported 
(https://lists.fd.io/g/vpp-dev/topic/mellanox_connectx5_lacp_not/113443571) and 
attempted the same fix: changing the LACP timer on the switch side (Arista 
DCS-7050QX-32S-R) to fast using "lacp timer fast" on the physical ports. This 
temporarily stabilized the connection, but LACP then resumed flapping 
randomly, leaving me at a standstill trying to determine what is occurring.

I've attached my startup config and a redacted version of my vpp bootstrap 
config to provide better context on the configuration used, as well as the 
output from "show err" and switch logs for the port channel. For reference 
here, devices "0000:87:00.0" and "0000:87:00.1" are the NIC ports with the LACP 
members.

VPP version is v26.06-rc0~519-gf206434c7, on Debian 12 Bookworm with Linux 
kernel version 6.1.0-44-amd64. The hardware has dual Intel Xeon E5-2637v4 CPUs 
with 64GB of DDR4 ECC memory. Hyperthreading is disabled, and the VPP workers 
are pinned to the CPU cores connected to the network cards to avoid 
motherboard PCIe bridge bottlenecks.

VPP does not seem to crash; it stays running without any errors, and I don't 
see any errors in the logs.

Any insight is appreciated.

Kind regards,
Peter Potvin