On Tue, Apr 28, 2015 at 11:28:31AM +0200, Marko Cupać wrote:
> Hi,
>
> I have a pair of OpenBSD 5.6 firewalls that have been running releases
> happily for years (I think since 5.1). They are in CARP failover mode,
> running BGP sessions with upstream providers and filtering traffic.
>
> A few days ago I had an Internet outage (the first in years), which
> appears to have been caused by a bgpd crash. I could ping the ISP's
> interface, but then I noticed I had no routes at all (except connected
> ones) in the routing table. Next, I discovered there was no running bgpd
> process. Restarting bgpd gave me routes and Internet connectivity back.
>
> Here's an excerpt from the messages log:
>
> Apr 17 18:29:18 bgp2 bgpd[9759]: neighbor 82.117.192.121 (sbb): sync error
> Apr 17 18:29:18 bgp2 bgpd[9759]: neighbor 82.117.192.121 (sbb): sending notification: Header error, synchronization error
> Apr 17 18:29:18 bgp2 bgpd[9759]: neighbor 82.117.192.121 (sbb): graceful restart of IPv4 unicast, keeping routes
> Apr 17 18:29:18 bgp2 bgpd[24107]: neighbor 82.117.192.121 (sbb): bad nlri prefix
> Apr 17 18:29:19 bgp2 bgpd[9759]: neighbor 82.117.192.121 (sbb): sending notification: error in UPDATE message, network unacceptable
> Apr 17 18:29:51 bgp2 bgpd[9759]: neighbor 82.117.192.121 (sbb): graceful restart of IPv4 unicast, not restarted, flushing
> Apr 17 18:29:52 bgp2 bgpd[24107]: fatal in RDE: peer_up: bad state
> Apr 17 18:29:52 bgp2 bgpd[32268]: dispatch_imsg in main: pipe closed
> Apr 17 18:29:52 bgp2 bgpd[9759]: neighbor 82.117.192.121 (sbb): sending notification: Cease, administratively down
> Apr 17 18:29:52 bgp2 bgpd[9759]: neighbor 178.253.194.253 (orion): sending notification: Cease, administratively down
>
>
> Also from the daemon log at the same time:
>
> Apr 17 18:29:18 bgp2 bgpd[9759]: neighbor 82.117.192.121 (sbb): sync error
> Apr 17 18:29:18 bgp2 bgpd[9759]: neighbor 82.117.192.121 (sbb): sending notification: Header error, synchronization error
> Apr 17 18:29:18 bgp2 bgpd[9759]: neighbor 82.117.192.121 (sbb): graceful restart of IPv4 unicast, keeping routes
> Apr 17 18:29:18 bgp2 bgpd[9759]: neighbor 82.117.192.121 (sbb): state change Established -> Idle, reason: Fatal error
> Apr 17 18:29:18 bgp2 bgpd[9759]: neighbor 82.117.192.121 (sbb): state change Idle -> Connect, reason: Start
> Apr 17 18:29:18 bgp2 bgpd[32268]: incremented the demote state of group 'carp'
> Apr 17 18:29:18 bgp2 bgpd[24107]: neighbor 82.117.192.121 (sbb): bad nlri prefix
> Apr 17 18:29:18 bgp2 bgpd[9759]: neighbor 82.117.192.121 (sbb): state change Connect -> OpenSent, reason: Connection opened
> Apr 17 18:29:18 bgp2 bgpd[9759]: neighbor 82.117.192.121 (sbb): state change OpenSent -> Active, reason: Connection closed
> Apr 17 18:29:19 bgp2 bgpd[9759]: neighbor 82.117.192.121 (sbb): sending notification: error in UPDATE message, network unacceptable
> Apr 17 18:29:19 bgp2 bgpd[9759]: neighbor 82.117.192.121 (sbb): state change Active -> Idle, reason: Fatal error
> Apr 17 18:29:49 bgp2 bgpd[9759]: neighbor 82.117.192.121 (sbb): state change Idle -> Connect, reason: Start
> Apr 17 18:29:49 bgp2 bgpd[9759]: neighbor 82.117.192.121 (sbb): state change Connect -> OpenSent, reason: Connection opened
> Apr 17 18:29:51 bgp2 bgpd[9759]: neighbor 82.117.192.121 (sbb): graceful restart of IPv4 unicast, not restarted, flushing
> Apr 17 18:29:51 bgp2 bgpd[9759]: neighbor 82.117.192.121 (sbb): state change OpenSent -> OpenConfirm, reason: OPEN message received
> Apr 17 18:29:51 bgp2 bgpd[9759]: neighbor 82.117.192.121 (sbb): state change OpenConfirm -> Established, reason: KEEPALIVE message received
> Apr 17 18:29:52 bgp2 bgpd[24107]: fatal in RDE: peer_up: bad state
> Apr 17 18:29:52 bgp2 bgpd[32268]: dispatch_imsg in main: pipe closed
> Apr 17 18:29:52 bgp2 bgpd[9759]: neighbor 82.117.192.121 (sbb): sending notification: Cease, administratively down
> Apr 17 18:29:52 bgp2 bgpd[32268]: decremented the demote state of group 'carp'
> Apr 17 18:29:52 bgp2 bgpd[9759]: neighbor 82.117.192.121 (sbb): state change Established -> Idle, reason: Stop
> Apr 17 18:29:52 bgp2 bgpd[9759]: neighbor 178.253.194.253 (orion): sending notification: Cease, administratively down
> Apr 17 18:29:52 bgp2 bgpd[9759]: neighbor 178.253.194.253 (orion): state change Established -> Idle, reason: Stop
> Apr 17 18:29:52 bgp2 bgpd[9759]: session engine exiting
> Apr 17 18:29:54 bgp2 bgpd[32268]: kernel routing table 0 (Loc-RIB) decoupled
> Apr 17 18:29:55 bgp2 bgpd[32268]: Terminating
>
>
> I would be grateful if someone explained to me what happened here, and
> also what to do in order to avoid it in the future.
>

The "fatal in RDE: peer_up: bad state" bug is fixed in 5.7 IIRC. Not sure
if it was backported to 5.6. As a workaround you can disable the graceful
restart capability to not trigger that code path.

Hope that helps.
-- 
:wq Claudio
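
For reference, the workaround mentioned above might look roughly like the
following bgpd.conf fragment. This is only a sketch, assuming the installed
release supports the "announce restart" neighbor option described in
bgpd.conf(5). The neighbor address and description are taken from the logs
in this thread; the AS number is a placeholder and not from the original
post.

```
# /etc/bgpd.conf (fragment, illustrative only)
neighbor 82.117.192.121 {
	descr "sbb"
	remote-as 64496		# placeholder AS number, replace with the real one
	# Assumption: stops bgpd from announcing the graceful restart
	# capability to this peer; verify against bgpd.conf(5) for your release.
	announce restart no
}
```

The file can be checked with "bgpd -n" before loading it with
"bgpctl reload". Capabilities are negotiated when a session is opened, so
the change would presumably only take effect once the session is reset,
for example with "bgpctl neighbor sbb clear".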