Hi,

I'm trying to track down an issue that causes BFD session timeouts in our
deployment.

What I'm seeing is that, seemingly at random, one chassis stops sending BFD
packets to some of its neighbors (apparently one neighbor at a time, and one
chassis currently seems more prone to this than the others). Once the timeout
is reached, the neighbor signals that the session is down, and the two
re-establish it promptly. I've captured BFD packets on both chassis, and one
chassis stops sending its BFD packets, or at least they do not show up on the
wire. At the same time I can see incoming BFD packets from the neighbor, so it
does not look like an underlying networking issue.
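
For anyone who wants to run the same kind of check on their own captures, here
is a minimal sketch of how transmit gaps can be found. It assumes the capture
has already been exported as (timestamp in seconds, source IP) pairs for BFD
packets only. The function name, the input format and the threshold are
illustrative assumptions, not the exact procedure used for the captures
described above.

```python
from collections import defaultdict


def find_tx_gaps(packets, max_gap):
    """Return (src, last_seen, next_seen) for every silence longer than max_gap.

    packets: iterable of (timestamp_seconds, source_ip) tuples, in any order
             (hypothetical export format, e.g. produced from a pcap).
    max_gap: longest acceptable interval between two packets from one source,
             for example the BFD detection time in seconds.
    """
    # Group timestamps by sender so each chassis is checked independently.
    by_src = defaultdict(list)
    for ts, src in packets:
        by_src[src].append(ts)

    gaps = []
    for src, stamps in by_src.items():
        stamps.sort()
        # Compare each packet with the next one from the same sender.
        for prev, cur in zip(stamps, stamps[1:]):
            if cur - prev > max_gap:
                gaps.append((src, prev, cur))
    return gaps
```

A sender that keeps receiving but shows up in the result is the side that went
quiet, which matches the one-directional loss described above.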

There is nothing BFD-related in the logs until the session is torn down by the
neighbor. The only correlated log entries I can see right now are constant
messages like this in syslog:

```
ovs-system: deferred action limit reached, drop recirc action
```

Those seem to be caused by a constant barrage of ARP requests (500-600/s)
coming from the external network router for IP addresses that are not currently
in use. That seems to put some extra load on the ovs-vswitchd process, but
nowhere near enough to stop it from processing other packets (the ovs-vswitchd
logs don't report increased CPU usage).

openvswitch version: 2.11.0
ovn version: 20.09.90 (a build from 20.09 branch from 2020.12.07)

-- 
  Krzysztof Klimonda
  kklimo...@syntaxhighlighted.com