Hi,

Thanks everyone. We will start by updating and see how it goes from
there. If the issues continue, we will dump the OSPF traffic.

While looking into these issues I noticed, when running ospfctl sh nei,
that we had two DRs. I thought there could/should only be a single one.
Any ideas on this? Are there scenarios where this is valid? We only run
a single area.

Thanks

Richard

On Mon, 13 Apr 2020, 14:39 Stuart Henderson, <s...@spacehopper.org> wrote:

> On 2020-04-13, Claudio Jeker <cje...@diehard.n-r-g.com> wrote:
> > On Mon, Apr 13, 2020 at 02:08:31PM +0200, Remi Locherer wrote:
> >> On Mon, Apr 13, 2020 at 12:05:10PM +0100, Richard Chivers wrote:
> >> > On Mon, 13 Apr 2020, 10:18 Remi Locherer, <remi.loche...@relo.ch> wrote:
> >> >
> >> > > On Mon, Apr 13, 2020 at 08:38:31AM +0100, Richard Chivers wrote:
> >> > > > We have been having a strange issue, whereby OSPF stops
> >> > > > updating properly.
> >> > > >
> >> > > > We can see an entry for an ip route in the database but it is
> >> > > > not in the kernel routing table, and when it is the DR, other
> >> > > > routers then do not have the route at all.
> >> > > >
> >> > > > We are seeing this across multiple boxes. We have 10+ ospf
> >> > > > speakers, and seem to see the issue at different times.
> >> > > >
> >> > > > The problem starts with:
> >> > > >
> >> > > > ospfd[6960]: recv_db_description: neighbor ID x.x.x.x: seq num
> >> > > > mismatch, bad flags
> >> > >
> >> > > The neighbor sent a db desc with the master flag set differently
> >> > > than what this ospfd instance recorded before for that
> >> > > particular neighbor.
> >> > >
> >> > > See 2nd last item on page 100 of RFC 2328:
> >> > > https://tools.ietf.org/html/rfc2328#page-100
> >> >
> >> > Thanks, should the routers just recover then from this scenario
> >> > even if it was happening due to lost packets, CPU pause etc.?
> >>
> >> I think so. But it may take quite a while. It might also be a bug in
> >> ospfd or in another implementation.
>
> On my 6.6/current boxes it seems to recover fairly quickly from this
> (30 seconds or so). I've definitely seen it take a long time in the
> past though.
>
> > Since these issues happen with 5.8 and 6.4 ospfd I would suggest to
> > update to at least 6.6 (especially the 5.8). IIRC there was some
> > issue with ospfd neighbor selection that caused troubles when
> > sessions flapped. This was fixed some time ago but I doubt 5.8 has
> > that fix in.
>
> That one was fixed in 6.3.
>
> If you also run bgpd then be aware there are crashes with the version
> in 6.6 release - fixed in syspatches (and of course in snapshots), but
> one of the crashes is at startup with some configurations and it's
> hard to run syspatch if you have no routing ;) so either be ready to
> cope with that in case you run into it (e.g. pre-download the syspatch
> directory and make sure you have console access), or consider skipping
> 6.6 (go straight to a -current snapshot).
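
For reference, the check behind the "seq num mismatch, bad flags" message discussed in the quoted thread can be sketched roughly as below. This is a minimal illustration of the Exchange-state rules in RFC 2328 section 10.6, not ospfd's actual code: the Neighbor record, the function name and the returned strings are hypothetical, and duplicate-packet handling is left out.

```python
from dataclasses import dataclass

# Database Description flag bits (RFC 2328, appendix A.3.3)
DD_FLAG_MS = 0x01  # sender claims to be master
DD_FLAG_M = 0x02   # more DD packets follow
DD_FLAG_I = 0x04   # first packet of a new exchange


@dataclass
class Neighbor:
    # Hypothetical, simplified neighbor record (not ospfd's struct).
    we_are_master: bool  # outcome of the ExStart negotiation
    dd_seq_num: int      # DD sequence number stored for this neighbor
    options: int         # Options field last received from the neighbor


def check_dd_in_exchange(nbr, flags, options, seq_num):
    """Return "ok", or the reason a SeqNumberMismatch event would fire."""
    neighbor_claims_master = bool(flags & DD_FLAG_MS)
    # Exactly one side is master, so the neighbor's MS bit must be the
    # opposite of our own role.
    if neighbor_claims_master == nbr.we_are_master:
        return "bad flags"
    # The I bit is only valid while negotiating in ExStart.
    if flags & DD_FLAG_I:
        return "bad flags"
    if options != nbr.options:
        return "options changed"
    # The master expects its own sequence number echoed back; the slave
    # expects the stored number plus one.
    expected = nbr.dd_seq_num if nbr.we_are_master else nbr.dd_seq_num + 1
    if seq_num != expected:
        return "bad sequence number"
    return "ok"
```

Under these assumptions, a router that believes it is master and then receives a packet with the MS bit set from the same neighbor gets "bad flags", which matches the situation Remi describes: the neighbor's master flag differs from what was recorded earlier.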