On Mon, Apr 13, 2020 at 03:30:12PM +0100, Stuart Henderson wrote:
> On 2020/04/13 15:21, Richard Chivers wrote:
> > Hi,
> > 
> > Thanks everyone, we will update to start with and see how it goes from
> > there. If the issues continue we will dump the ospf traffic.
> > 
> > When we were looking at these issues I noticed, when running ospfctl sh nei,
> > that we had two DRs.
> 
> That will definitely happen with pre-6.3 versions after some flaps.

Seeing two DRs in the output of "ospfctl sh nei" is not an issue in itself.
Each broadcast or NBMA network elects its own DR, so a router attached to
several such networks can list one DR per network.

But you should not see multiple DR neighbors on the same link.
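
To check this on a box with many neighbors, something like the following
could help. It is only a rough sketch: it assumes the neighbor listing has
the columns ID, Pri, State, DeadTime, Address, Iface, Uptime and that a DR
neighbor's state ends in "/DR". The function names are made up for this
example and are not part of ospfd or ospfctl.

```python
from collections import defaultdict


def drs_per_interface(ospfctl_output):
    """Map interface name -> IDs of neighbors whose state ends in '/DR'.

    Assumed column layout (not verified against every release):
    ID Pri State DeadTime Address Iface Uptime
    """
    drs = defaultdict(list)
    for line in ospfctl_output.splitlines():
        fields = line.split()
        # Skip blank lines, short lines and the header row.
        if len(fields) < 7 or fields[0] == "ID":
            continue
        neighbor_id, state, iface = fields[0], fields[2], fields[5]
        if state.endswith("/DR"):
            drs[iface].append(neighbor_id)
    return dict(drs)


def links_with_multiple_drs(ospfctl_output):
    """Return only the interfaces on which more than one neighbor is DR."""
    return {
        iface: ids
        for iface, ids in drs_per_interface(ospfctl_output).items()
        if len(ids) > 1
    }
```

Feed it the saved text of "ospfctl sh nei"; an empty result means no
interface shows more than one DR neighbor.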

> 
> > I thought there could/should only be a single one.
> > 
> > Any ideas on this? Are there scenarios where this is valid? We only run a
> > single area.
> > 
> > Thanks
> > 
> > Richard
> > 
> > 
> > 
> > On Mon, 13 Apr 2020, 14:39 Stuart Henderson, <s...@spacehopper.org> wrote:
> > 
> >     On 2020-04-13, Claudio Jeker <cje...@diehard.n-r-g.com> wrote:
> >     > On Mon, Apr 13, 2020 at 02:08:31PM +0200, Remi Locherer wrote:
> >     >> On Mon, Apr 13, 2020 at 12:05:10PM +0100, Richard Chivers wrote:
> >     >> > On Mon, 13 Apr 2020, 10:18 Remi Locherer, <remi.loche...@relo.ch> wrote:
> >     >> > >
> >     >> > > On Mon, Apr 13, 2020 at 08:38:31AM +0100, Richard Chivers wrote:
> >     >> > > > We have been having a strange issue, whereby OSPF stops updating
> >     >> > > > properly.
> >     >> > > >
> >     >> > > > We can see an entry for an ip route in the database but it is not in the
> >     >> > > > kernel routing table, and when it is the DR, other routers then do not
> >     >> > > > have the route at all.
> >     >> > > >
> >     >> > > > We are seeing this across multiple boxes. We have 10+ ospf speakers, and
> >     >> > > > seem to see the issue at different times.
> >     >> > > >
> >     >> > > > The problem starts with:
> >     >> > > >
> >     >> > > > ospfd[6960]: recv_db_description: neighbor ID x.x.x.x: seq num mismatch, bad flags
> >     >> > >
> >     >> > > The neighbor sent a db desc with the master flag set differently than what
> >     >> > > this ospfd instance recorded before for that particular neighbor.
> >     >> > >
> >     >> > > See 2nd last item on page 100 of RFC 2328:
> >     >> > > https://tools.ietf.org/html/rfc2328#page-100
> >     >> >
> >     >> >
> >     >> > Thanks. Should the routers then just recover from this scenario, even if it
> >     >> > was caused by lost packets, a CPU pause, etc.?
> >     >>
> >     >> I think so. But it may take quite a while. It might also be a bug in ospfd
> >     >> or in another implementation.
> > 
> >     On my 6.6/current boxes it seems to recover fairly quickly from this (30
> >     seconds or so). I've definitely seen it take a long time in the past though.
> > 
> >     > Since these issues happen with 5.8 and 6.4 ospfd I would suggest updating
> >     > to at least 6.6 (especially the 5.8). IIRC there was some issue with ospfd
> >     > neighbor selection that caused trouble when sessions flapped. This was
> >     > fixed some time ago but I doubt 5.8 has that fix in.
> > 
> >     That one was fixed in 6.3.
> > 
> >     If you also run bgpd then be aware there are crashes with the version in
> >     6.6 release - fixed in syspatches (and of course in snapshots), but one
> >     of the crashes is at startup with some configurations and it's hard to
> >     run syspatch if you have no routing ;) so either be ready to cope with
> >     that in case you run into it (e.g. pre-download the syspatch directory
> >     and make sure you have console access), or consider skipping 6.6 (go
> >     straight to a -current snapshot).
> > 
> > 
> > 
> 
