Removing the 'poll-sleep-usec 1000' from startup.conf might help. Adding a
1 ms delay before processing packets seems pretty likely to increase RX
misses when there are bursts of traffic.
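
As a rough back-of-the-envelope sketch of why, in Python: it compares how many
packets arrive during one sleep interval against the RX ring capacity. The
ring size of 1024 descriptors and the packet rates are assumptions for
illustration, not values taken from the attached config, and the model
assumes the worker does not drain the ring at all while it sleeps.

```python
# Rough model: packets that arrive while the worker sleeps must fit in the
# RX ring, otherwise the NIC has nowhere to put them and counts RX misses.

SLEEP_USEC = 1000        # from 'poll-sleep-usec 1000' in startup.conf
RX_DESCRIPTORS = 1024    # ASSUMPTION: hypothetical RX ring size


def packets_during_sleep(rate_pps: int, sleep_usec: int) -> int:
    """Packets arriving on one queue during a single sleep interval."""
    return rate_pps * sleep_usec // 1_000_000


def overflows_ring(rate_pps: int, sleep_usec: int, rx_descriptors: int) -> bool:
    """True if one sleep interval delivers more packets than the ring holds."""
    return packets_during_sleep(rate_pps, sleep_usec) > rx_descriptors


if __name__ == "__main__":
    # ASSUMPTION: illustrative burst rates in packets per second, per queue
    for rate in (500_000, 2_000_000, 10_000_000):
        arrived = packets_during_sleep(rate, SLEEP_USEC)
        state = "overflow" if overflows_ring(rate, SLEEP_USEC, RX_DESCRIPTORS) else "fits"
        print(f"{rate:>10} pps: {arrived:>6} packets per sleep -> {state}")
```

Under these assumptions, any burst above roughly 1 Mpps per queue overflows
the ring within a single sleep, and LACPDUs arriving in that window can be
dropped along with the rest of the traffic.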

-Matt


On Wed, Apr 29, 2026 at 11:28 AM steven luong via lists.fd.io <sluong=
[email protected]> wrote:

> My guess is that the periodic LACP hello packets are being dropped when
> there is a big burst of traffic, which causes the neighbor to time out. On
> the VPP side, you can see how much traffic was dropped with these two CLI
> commands:
> vppctl show hardware verbose
> vppctl show interface
>
> However, the switch side can still drop traffic, so you also have to check
> how much traffic the switch drops.
> If keeping the bond interface stable is more important than fast detection
> of NIC failure, you could try configuring both VPP and the switch to use
> the long timeout.
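
For context on that trade-off: the standard LACP timer values (IEEE 802.1AX)
are a 1 s PDU interval with a 3 s timeout for fast/short, and a 30 s interval
with a 90 s timeout for slow/long. Below is a minimal sketch in Python under a
simplified model in which every LACPDU sent during a loss burst is dropped and
none are dropped otherwise. The class and function names are illustrative
only and are not part of VPP or the switch software.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LacpTimers:
    periodic_s: int  # interval between transmitted LACPDUs
    timeout_s: int   # time without a received LACPDU before the partner expires

    def pdus_per_timeout(self) -> int:
        """Consecutive LACPDUs that must be lost before the timeout fires."""
        return self.timeout_s // self.periodic_s


# Standard timer values from IEEE 802.1AX
FAST = LacpTimers(periodic_s=1, timeout_s=3)    # short timeout
SLOW = LacpTimers(periodic_s=30, timeout_s=90)  # long timeout


def burst_expires_partner(timers: LacpTimers, burst_s: float) -> bool:
    """ASSUMPTION: all LACPDUs are lost for burst_s seconds, none otherwise."""
    return burst_s >= timers.timeout_s


if __name__ == "__main__":
    for name, timers in (("fast", FAST), ("slow", SLOW)):
        for burst in (2, 5, 60, 120):  # illustrative loss durations in seconds
            flap = burst_expires_partner(timers, burst)
            print(f"{name}: {burst:>3} s of loss -> {'flap' if flap else 'stable'}")
```

Both modes tolerate three consecutive lost LACPDUs, but with the long timeout
that corresponds to 90 s of sustained loss instead of 3 s, at the cost of
slower detection of a genuinely failed link.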
>
> *From: *[email protected] <[email protected]> on behalf of Peter
> Potvin via lists.fd.io <[email protected]>
> *Date: *Wednesday, April 29, 2026 at 8:54 AM
> *To: *[email protected] <[email protected]>
> *Subject: *[vpp-dev] LACP Instability with mlx5 and VPP
> v26.06-rc0~519-gf206434c7
>
> We recently started running VPP on Debian 12 with dual-port Mellanox
> CX414A-GCAT NICs, and encountered an issue where the LACP bond flaps
> randomly while the physical links remain up. I noticed a previous thread
> where this was reported (
> https://lists.fd.io/g/vpp-dev/topic/mellanox_connectx5_lacp_not/113443571)
> and attempted the same fix: changing the LACP timer on the switch-side
> (Arista DCS-7050QX-32S-R) to fast using "lacp timer fast" on the physical
> ports. This stabilized the connection only temporarily; LACP then resumed
> flapping at random, and I am at a standstill trying to determine what is
> occurring.
>
> I've attached my startup config and a redacted version of my vpp bootstrap
> config to provide better context on the configuration used, as well as the
> output from "show err" and switch logs for the port channel. For reference
> here, devices "0000:87:00.0" and "0000:87:00.1" are the NIC ports with the
> LACP members.
>
> VPP version is v26.06-rc0~519-gf206434c7, on Debian 12 Bookworm with Linux
> kernel version 6.1.0-44-amd64. Hardware is a dual Intel Xeon E5-2637v4 CPU
> with 64GB DDR4 ECC memory. Hyperthreading is disabled, and the VPP workers
> are pinned to the CPU cores connected to the network cards to avoid
> motherboard PCIe bridge bottlenecks.
>
> VPP does not seem to crash; it stays running without any errors, and I
> don't see any errors in the logs.
>
> Any insight is appreciated.
>
> Kind regards,
> Peter Potvin
>