On 2020/04/13 15:21, Richard Chivers wrote:
> Hi,
>
> Thanks everyone, we will update to start with and see how it goes from
> there. If the issues continue we will dump the ospf traffic.
>
> When we were looking at these issues I noticed when running ospfctl sh nei
> that we had two DR.

That will definitely happen with pre-6.3 versions after some flaps.

> I thought there could/should only be a single one.
>
> Any ideas on this, are there scenarios where this is valid? We only run a
> single area.
>
> Thanks
>
> Richard
>
> On Mon, 13 Apr 2020, 14:39 Stuart Henderson, <s...@spacehopper.org> wrote:
>
> On 2020-04-13, Claudio Jeker <cje...@diehard.n-r-g.com> wrote:
> > On Mon, Apr 13, 2020 at 02:08:31PM +0200, Remi Locherer wrote:
> >> On Mon, Apr 13, 2020 at 12:05:10PM +0100, Richard Chivers wrote:
> >> > On Mon, 13 Apr 2020, 10:18 Remi Locherer, <remi.loche...@relo.ch> wrote:
> >> >
> >> > > On Mon, Apr 13, 2020 at 08:38:31AM +0100, Richard Chivers wrote:
> >> > > > We have been having a strange issue, whereby OSPF stops updating
> >> > > > properly.
> >> > > >
> >> > > > We can see an entry for an ip route in the database but it is not
> >> > > > in the kernel routing table, and when it is the DR, other routers
> >> > > > then do not have the route at all.
> >> > > >
> >> > > > We are seeing this across multiple boxes. We have 10+ ospf
> >> > > > speakers, and seem to see the issue at different times.
> >> > > >
> >> > > > The problem starts with:
> >> > > >
> >> > > > ospfd[6960]: recv_db_description: neighbor ID x.x.x.x: seq num mismatch, bad flags
> >> > >
> >> > > The neighbor sent a db desc with the master flag set differently
> >> > > than what this ospfd instance recorded before for that particular
> >> > > neighbor.
> >> > >
> >> > > See 2nd last item on page 100 of RFC 2328:
> >> > > https://tools.ietf.org/html/rfc2328#page-100
> >> >
> >> > Thanks, should the routers just recover then from this scenario even
> >> > if it was happening due to lost packets, CPU pause etc.
> >>
> >> I think so. But it may take quite a while. It might also be a bug in
> >> ospfd or in another implementation.
>
> On my 6.6/current boxes it seems to recover fairly quickly from this (30
> seconds or so).
> I've definitely seen it take a long time in the past though.
>
> > Since these issues happen with 5.8 and 6.4 ospfd I would suggest to
> > update to at least 6.6 (especially the 5.8). IIRC there was some issue
> > with ospfd neighbor selection that caused troubles when sessions
> > flapped. This was fixed some time ago but I doubt 5.8 has that fix in.
>
> That one was fixed in 6.3.
>
> If you also run bgpd then be aware there are crashes with the version in
> 6.6 release - fixed in syspatches (and of course in snapshots), but one
> of the crashes is at startup with some configurations and it's hard to
> run syspatch if you have no routing ;) so either be ready to cope with
> that in case you run into it (e.g. pre-download the syspatch directory
> and make sure you have console access), or consider skipping 6.6 (go
> straight to a -current snapshot).
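
On the two-DR observation from ospfctl sh nei: a DR is elected per broadcast segment rather than per area, so DR entries on different interfaces are expected, while more than one neighbor claiming DR on the same interface is the case worth looking at. A minimal Python sketch of that check follows. It assumes the neighbor listing has whitespace-separated columns in the order ID, Pri, State, DeadTime, Address, Iface, Uptime, with states such as FULL/DR; please verify that against the output of your own ospfctl version. The function name and the sample listing are made up for illustration and are not taken from any of the routers in this thread.

```python
from collections import defaultdict

# Hypothetical sample listing; the column layout is an assumption, see above.
SAMPLE = """\
ID              Pri State        DeadTime Address         Iface     Uptime
192.0.2.1       1   FULL/DR      00:00:35 10.0.0.1        em0       01:02:03
192.0.2.2       1   FULL/DR      00:00:31 10.0.0.2        em0       00:10:00
192.0.2.3       1   FULL/BCKUP   00:00:38 10.0.1.3        em1       02:00:00
192.0.2.4       1   FULL/DR      00:00:33 10.0.1.4        em1       02:00:00
"""


def find_duplicate_drs(neighbor_output):
    """Return {iface: [neighbor IDs]} for interfaces with more than one DR neighbor."""
    drs = defaultdict(list)
    for line in neighbor_output.splitlines():
        fields = line.split()
        # Skip blank lines, the header row, and anything too short to parse.
        if len(fields) < 6 or fields[0] == "ID":
            continue
        neighbor_id, state, iface = fields[0], fields[2], fields[5]
        # The part after the slash is assumed to be the neighbor's role.
        if state.endswith("/DR"):
            drs[iface].append(neighbor_id)
    # One DR per interface is normal; only report interfaces with several.
    return {iface: ids for iface, ids in drs.items() if len(ids) > 1}


if __name__ == "__main__":
    for iface, ids in find_duplicate_drs(SAMPLE).items():
        print(f"{iface}: multiple DR neighbors: {', '.join(ids)}")
```

In the sample, em0 is reported because two neighbors claim DR there, and em1 is not because it has one DR and one backup. The check only sees neighbors, so it will not notice a neighbor claiming DR on a segment where the local router also believes it is the DR.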