On 2015-05-06, Marko Cupać <marko.cu...@mimar.rs> wrote:
> On Wed, 29 Apr 2015 11:02:09 +0200
> Marko Cupać <marko.cu...@mimar.rs> wrote:
>
>> On Tue, 28 Apr 2015 15:11:21 +0200
>> Claudio Jeker <cje...@diehard.n-r-g.com> wrote:
>> 
>> > The "fatal in RDE: peer_up: bad state" bug is fixed in 5.7 IIRC. Not
>> > sure if it was backported to 5.6. As a workaround you can disable
>> > the graceful restart capability to not trigger that code path.
>> 
>> I was intending to upgrade on Friday anyway so no problem. In the
>> meantime I updated to -stable, it's too early to say if it fixed it.
>
> I am on 5.7 release + errata patches now, and bgpd crashed again:
>
> May  6 10:06:07 bgp1 bgpd[11681]: neighbor 82.117.192.121 (sbb): sync error
> May  6 10:06:07 bgp1 bgpd[11681]: neighbor 82.117.192.121 (sbb): sending notification: Header error, synchronization error
> May  6 10:06:07 bgp1 bgpd[11681]: neighbor 82.117.192.121 (sbb): graceful restart of IPv4 unicast, keeping routes

Can you get a packet capture of TCP port 179 during a failure? 

tcpdump -i <interface> -w bgp.`date +%Y%m%d-%H%M`.pcap -s1500 tcp and port 179

It is probably best to start it from a cron script that pkills the
previous tcpdump and rotates the file, so the captures don't grow huge.
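
Something along these lines might do as a starting point. It is only a
rough sketch: the interface name, the capture directory, the number of
files kept and the function names are all placeholders I made up, so
adjust them before use.

```sh
#!/bin/sh
# Sketch of a cron-driven capture rotation. IFACE, DIR and KEEP are
# assumed defaults, not values from this thread.
IFACE="${IFACE:-em0}"
DIR="${DIR:-/var/tmp/bgpcap}"
KEEP="${KEEP:-48}"

# Build a capture file name from a timestamp (same naming as above).
capture_name() {
    printf '%s/bgp.%s.pcap\n' "$DIR" "$1"
}

# List capture files beyond the newest $KEEP. The timestamped names
# sort chronologically, so a reverse sort puts the newest first.
stale_files() {
    ls "$DIR"/bgp.*.pcap 2>/dev/null | sort -r | awk -v keep="$KEEP" 'NR > keep'
}

rotate() {
    mkdir -p "$DIR" || return 1
    # Stop the previous capture started by this script, if any.
    pkill -f "tcpdump -i $IFACE -w $DIR/bgp" || true
    # Start a fresh capture in the background with a timestamped name.
    tcpdump -i "$IFACE" -w "$(capture_name "$(date +%Y%m%d-%H%M)")" \
        -s1500 tcp and port 179 >/dev/null 2>&1 &
    # Remove captures beyond the newest $KEEP.
    stale_files | while read -r f; do rm -f "$f"; done
}

# Only act when asked to, so the file can be sourced safely.
if [ "${1:-}" = "rotate" ]; then
    rotate
fi
```

Run hourly from root's crontab it would look something like this (the
path is hypothetical):

0 * * * * /usr/local/sbin/bgpcap.sh rotate

After a failure, copy the relevant file somewhere else before it gets
pruned.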

You can review the files with 'tcpdump -nvvr [filename]', but the raw pcap
files (and time of the failure as shown in logs) are more useful for anyone
else looking into this.

> I guess bug is not solved in 5.7 release then. Maybe 5.7 stable?

No changes to bgpd in 5.7-stable. (There were some changes in -current
but they won't affect this).

> This issue is having really bad impact on my network. Both ISP links
> are up and running, but - as bgpd dies - my firewall has no routes
> which effectively stops the traffic flow with the Internet.
>
> I have contacted ISPs and asked them to check if they are sending us bad
> bgp packets. Regardless of that, I think bgpd shouldn't just shut
> itself down no matter what payload it gets?

There are two parts to this.

One is that a bad BGP message seems to be hitting the parser in bgpd.
Most likely it comes from the peer (though I haven't looked at the code
deeply enough to rule out other possibilities). Every BGP message is
supposed to start with 16 0xff bytes; the "sync error" log message is
only triggered when a message without this marker is seen. When this
happens it is correct that the *peer* is taken down, as there is some
major problem.
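
To illustrate the check (this is not bgpd's code, just a sketch), the
helper below tests whether a hex dump of a message starts with those 16
0xff bytes. The function name has_marker is made up for this example,
and it assumes you have already extracted the message bytes as hex,
e.g. by hand from tcpdump output.

```sh
# Return 0 if the hex string in $1 starts with 16 bytes of 0xff
# (32 hex digits), 1 otherwise. Spaces, colons and newlines in the
# input are ignored. has_marker is a hypothetical helper.
has_marker() {
    # 32 'f' characters, generated rather than typed to avoid miscounting.
    marker=$(printf '%032d' 0 | tr 0 f)
    hex=$(printf '%s' "$1" | tr -d ' :\n' | tr 'A-F' 'a-f')
    case "$hex" in
        "$marker"*) return 0 ;;
        *) return 1 ;;
    esac
}
```

A KEEPALIVE, for instance, is the marker followed by length 0x0013 and
type 0x04, so the marker plus "001304" passes the check, while bytes
that begin with anything else (say, the middle of an UPDATE after the
stream got out of step) do not.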

A packet trace with the right parts in it should confirm whether the
problem is with a message from the peer or internal to bgpd.

The other part is that this is causing bgpd itself to exit. That's not good.

> Any help with this would be highly appreciated.

Any idea what software (version number may be relevant too) your
neighbours are using? Or at least what hardware vendor shows up in
their MAC address?

pkg_add maclookup
arp -an | grep <their_ip_address> | maclookup
