Hello, We've recently begun testing using OpenBSD 4.4 with OpenBGP in our datacenter. Our initial tests have uncovered an odd issue we hope you all can help us with. I've included our configs and relevant information below.
The summary of our issue is this: 1.) Upon starting bgpd the session between the two routers goes to established and updates are passed. 2.) Keepalives aren't passed beyond the first exchange. 3.) After some time, the session goes to IDLE on both routers. 4.) The session tears down if we either issue a bgpctl command (like show summary or show neighbors) or wait 240 seconds after the initial connect. 5.) The routers then reestablish connections but they drop again. 6.) The exact same setup works fine with OpenBGP 4.3. Here's what we've found. If we modify session.c at line 405 (timeout = 240; /* loop every 240s at least */) to some number lower than our holdtime, it works. Adding debugging code to the code after that line shows us that the code doesn't get processed again after the intial setup unless the timeout value is reached or some bgpctl statement is excecuted. We've replicated this error in two different test environments. The error causes sessions to be torn down anytime a 4.4 bgpd is used. (ie 4.4 -> 4.4 and 4.4 -> 4.3). Please let me know if you need any additional information from me. Thanks so much, Marc Runkel Technical Operations Manger Untangle, Inc. The two machines in question are dcrouter1 and bgptest2: dcrouter1:/etc/bgpd.conf #macros # XO Peer XOpeer="65.46.252.33" # global configuration AS 21634 router-id 65.46.252.34 log updates network 64.2.3.0/24 holdtime min 3 holdtime 90 # neighbors and peers neighbor $XOpeer { remote-as 2828 descr XO Upstream local-address 65.46.252.34 multihop 2 } # filter out prefixes longer than 24 or shorter than 8 bits deny from any allow from any inet prefixlen 8 - 24 # do not accept a default route deny from any prefix 0.0.0.0/0 # We're in test mode, so we gotta let the test networks in (192.168.0.0/16). # filter bogus networks deny from any prefix 10.0.0.0/8 prefixlen >= 8 deny from any prefix 172.16.0.0/12 prefixlen >= 12 #deny from any prefix 192.168.0.0/16 prefixlen >= 16 deny from any prefix 169.254.0.0/16 prefixlen >= 16 deny from any prefix 192.0.2.0/24 prefixlen >= 24 deny from any prefix 224.0.0.0/4 prefixlen >= 4 deny from any prefix 240.0.0.0/4 prefixlen >= 4 -------------- END -------------- dcrouter1:/etc/hostname.em0 inet 65.46.252.34 255.255.255.252 65.46.252.35 description "XO WAN" -------------- END -------------- dcrouter1:/var/log/daemon.log (bgpd only) Jan 20 11:19:51 dcrouter1 bgpd[24217]: startup Jan 20 11:19:51 dcrouter1 bgpd[14770]: route decision engine ready Jan 20 11:19:52 dcrouter1 bgpd[5962]: listening on 0.0.0.0 Jan 20 11:19:52 dcrouter1 bgpd[5962]: listening on :: Jan 20 11:19:52 dcrouter1 bgpd[5962]: session engine ready Jan 20 11:19:52 dcrouter1 bgpd[5962]: neighbor 65.46.252.33 (XO Upstream): state change None -> Idle, reason: None Jan 20 11:19:52 dcrouter1 bgpd[5962]: neighbor 65.46.252.33 (XO Upstream): state change Idle -> Connect, reason: Start Jan 20 11:19:52 dcrouter1 bgpd[5962]: neighbor 65.46.252.33 (XO Upstream): socket error: Connection refused Jan 20 11:19:52 dcrouter1 bgpd[5962]: neighbor 65.46.252.33 (XO Upstream): state change Connect -> Active, reason: Connection open failed Jan 20 11:19:56 dcrouter1 bgpd[5962]: neighbor 65.46.252.33 (XO Upstream): state change Active -> OpenSent, reason: Connection opened Jan 20 11:19:56 dcrouter1 bgpd[5962]: neighbor 65.46.252.33 (XO Upstream): state change OpenSent -> OpenConfirm, reason: OPEN message received Jan 20 11:19:56 dcrouter1 bgpd[5962]: neighbor 65.46.252.33 (XO Upstream): state change OpenConfirm -> Established, reason: KEEPALIVE message received Jan 20 11:19:56 dcrouter1 bgpd[14770]: neighbor 65.46.252.33 (XO Upstream) AS2828: update 192.168.42.0/24 via 65.46.252.33 Jan 20 11:19:56 dcrouter1 bgpd[24217]: nexthop 65.46.252.33 now valid: directly connected Jan 20 11:20:44 dcrouter1 bgpd[5962]: neighbor 65.46.252.33 (XO Upstream): received notification: HoldTimer expired, unknown subcode 0 Jan 20 11:20:44 dcrouter1 bgpd[5962]: neighbor 65.46.252.33 (XO Upstream): state change Established -> Idle, reason: NOTIFICATION received -------------- END -------------- dcrouter1:tcpdump -vvns1500 -i em0 port 179 tcpdump: listening on em0, link-type EN10MB 11:19:52.537633 65.46.252.34.48310 > 65.46.252.33.179: S [tcp sum ok] 1642888815:1642888815(0) win 16384 <mss 1460,nop,nop,sackOK,nop,wscale 0,nop,nop,timestamp 2322120143 0> (DF) [tos 0xc0] (ttl 2, id 23223, len 64) 11:19:52.537747 65.46.252.33.179 > 65.46.252.34.48310: R [tcp sum ok] 0:0(0) ack 1642888816 win 0 (DF) (ttl 64, id 40395, len 40)11:19:56.759172 65.46.252.33.1985 > 65.46.252.34.179: S [tcp sum ok] 2516427034:2516427034(0) win 16384 <mss 1460,nop,nop,sackOK,nop,wscale 0,nop,nop,timestamp 1931362699 0> (DF) [tos 0xc0] (ttl 2, id 61323, len 64) 11:19:56.759201 65.46.252.34.179 > 65.46.252.33.1985: S [tcp sum ok] 2812695705:2812695705(0) ack 2516427035 win 16384 <mss 1460,nop,wscale 0,nop,nop,timestamp 443534217 1931362699> (DF) (ttl 255, id 42087, len 60) 11:19:56.759298 65.46.252.33.1985 > 65.46.252.34.179: . [tcp sum ok] 1:1(0) ack 1 win 16384 <nop,nop,timestamp 1931362699 443534217> (DF) [tos 0xc0] (ttl 2, id 39909, len 52) 11:19:56.759553 65.46.252.33.1985 > 65.46.252.34.179: P [tcp sum ok] 1:42(41) ack 1 win 16384 <nop,nop,timestamp 1931362699 443534217>: BGP (OPEN: Version 4, AS #2828, Holdtime 15, ID 65.46.252.33, Option length 12 (option Capabilities Advertisement, len=6) (option Capabilities Advertisement, len=2)) (DF) [tos 0xc0] (ttl 2, id 23479, len 93) 11:19:56.759661 65.46.252.34.179 > 65.46.252.33.1985: P [tcp sum ok] 1:42(41) ack 42 win 17335 <nop,nop,timestamp 443534217 1931362699>: BGP (OPEN: Version 4, AS #21634, Holdtime 90, ID 65.46.252.34, Option length 12 (option Capabilities Advertisement, len=6) (option Capabilities Advertisement, len=2)) (DF) [tos 0xc0] (ttl 2, id 9726, len 93) 11:19:56.759739 65.46.252.34.179 > 65.46.252.33.1985: P [tcp sum ok] 42:61(19) ack 42 win 17376 <nop,nop,timestamp 443534217 1931362699>: BGP (KEEPALIVE) (DF) [tos 0xc0] (ttl 2, id 45368, len 71)11:19:56.759922 65.46.252.33.1985 > 65.46.252.34.179: . [tcp sum ok] 42:42(0) ack 61 win 16365 <nop,nop,timestamp 1931362699 443534217> (DF) [tos 0xc0] (ttl 2, id 446, len 52) 11:19:56.760045 65.46.252.33.1985 > 65.46.252.34.179: P [tcp sum ok] 42:61(19) ack 61 win 16365 <nop,nop,timestamp 1931362699 443534217>: BGP (KEEPALIVE) (DF) [tos 0xc0] (ttl 2, id 60740, len 71) 11:19:56.760426 65.46.252.33.1985 > 65.46.252.34.179: P [tcp sum ok] 61:106(45) ack 61 win 16384 <nop,nop,timestamp 1931362699 443534217>: BGP (UPDATE: (Path attributes: (ORIGIN[T] IGP) (AS_PATH[T] 2828) (NEXT_HOP[T] 65.46.252.33)) (NLRI: 192.168.42.0/24)) (DF) [tos 0xc0] (ttl 2, id 26775, len 97)11:19:56.760453 65.46.252.34.179 > 65.46.252.33.1985: . [tcp sum ok] 61:61(0) ack 106 win 17331 <nop,nop,timestamp 443534217 1931362699> (DF) [tos 0xc0] (ttl 2, id 18859, len 52)11:19:56.760751 65.46.252.34.179 > 65.46.252.33.1985: P [tcp sum ok] 61:106(45) ack 106 win 17376 <nop,nop,timestamp 443534217 1931362699>: BGP (UPDATE: (Path attributes: (ORIGIN[T] IGP) (AS_PATH[T] 21634) (NEXT_HOP[T] 65.46.252.34)) (NLRI: 64.2.3.0/24)) (DF) [tos 0xc0] (ttl 2, id 31180, len 97)11:19:56.952678 65.46.252.33.1985 > 65.46.252.34.179: . [tcp sum ok] 106:106(0) ack 106 win 16384 <nop,nop,timestamp 1931362700 443534217> (DF) [tos 0xc0] (ttl 2, id 39193, len 52)11:20:44.422472 65.46.252.33.1985 > 65.46.252.34.179: P [tcp sum ok] 106:125(19) ack 106 win 16384 <nop,nop,timestamp 1931362795 443534217>: BGP (KEEPALIVE) (DF) [tos 0xc0] (ttl 2, id 24587, len 71)11:20:44.422482 65.46.252.33.1985 > 65.46.252.34.179: P [tcp sum ok] 125:146(21) ack 106 win 16384 <nop,nop,timestamp 1931362795 443534217>: BGP (NOTIFICATION: error Hold Timer Expired, subcode #0) (DF) [tos 0xc0] (ttl 2, id 48190, len 73) 11:20:44.422489 65.46.252.33.1985 > 65.46.252.34.179: F [tcp sum ok] 146:146(0) ack 106 win 16384 <nop,nop,timestamp 1931362795 443534217> (DF) [tos 0xc0] (ttl 2, id 25026, len 52) 11:20:44.422526 65.46.252.34.179 > 65.46.252.33.1985: . [tcp sum ok] 106:106(0) ack 146 win 17336 <nop,nop,timestamp 443534312 1931362795> (DF) [tos 0xc0] (ttl 2, id 15154, len 52) 11:20:44.422532 65.46.252.34.179 > 65.46.252.33.1985: . [tcp sum ok] 106:106(0) ack 147 win 17336 <nop,nop,timestamp 443534312 1931362795> (DF) [tos 0xc0] (ttl 2, id 3469, len 52) 11:20:44.422821 65.46.252.34.179 > 65.46.252.33.1985: F [tcp sum ok] 106:106(0) ack 147 win 17376 <nop,nop,timestamp 443534312 1931362795> (DF) [tos 0xc0] (ttl 2, id 2668, len 52) 11:20:44.422972 65.46.252.33.1985 > 65.46.252.34.179: . [tcp sum ok] 147:147(0) ack 107 win 16384 <nop,nop,timestamp 1931362795 443534312> (DF) [tos 0xc0] (ttl 2, id 4160, len 52) -------------- END -------------- bgptest2:/etc/bgpd.conf #macros # Untangle peer2="65.46.252.34" # global configuration AS 2828 router-id 65.46.252.33 log updates network 192.168.42.0/24 holdtime min 3 holdtime 15 # neighbors and peers neighbor $peer2 { remote-as 21634 descr Untangle Datacenter local-address 65.46.252.33 multihop 2 } # filter out prefixes longer than 24 or shorter than 8 bits deny from any allow from any inet prefixlen 8 - 24 # do not accept a default route deny from any prefix 0.0.0.0/0 # We're in test mode, so we gotta let the test networks in (192.168.0.0/16). # filter bogus networks deny from any prefix 10.0.0.0/8 prefixlen >= 8 deny from any prefix 172.16.0.0/12 prefixlen >= 12 #deny from any prefix 192.168.0.0/16 prefixlen >= 16 deny from any prefix 169.254.0.0/16 prefixlen >= 16 deny from any prefix 192.0.2.0/24 prefixlen >= 24 deny from any prefix 224.0.0.0/4 prefixlen >= 4 deny from any prefix 240.0.0.0/4 prefixlen >= 4 -------------- END -------------- bgptest2:/etc/hostname.fxp0 dhcp NONE NONE NONE inet alias 65.46.252.33 255.255.255.252 65.46.252.35 description "XO Datacenter"dhcp NONE NONE NONE -------------- END -------------- bgptest2:ifconfig -A lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> mtu 33204 groups: lo inet 127.0.0.1 netmask 0xff000000 inet6 ::1 prefixlen 128 inet6 fe80::1%lo0 prefixlen 64 scopeid 0x3 fxp0: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> mtu 1500 lladdr 00:11:11:cf:09:f4 description: XO Datacenter groups: egress media: Ethernet autoselect (100baseTX full-duplex) status: active inet6 fe80::211:11ff:fecf:9f4%fxp0 prefixlen 64 scopeid 0x1 inet 10.0.0.144 netmask 0xff000000 broadcast 10.255.255.255 inet 65.46.252.33 netmask 0xfffffffc broadcast 65.46.252.35 enc0: flags=0<> mtu 1536 -------------- END -------------- bgptest2:/var/log/daemon.log (bgpd only) Jan 20 11:19:56 bgptest2 bgpd[14506]: startup Jan 20 11:19:56 bgptest2 bgpd[17995]: route decision engine ready Jan 20 11:19:56 bgptest2 bgpd[29940]: listening on 0.0.0.0 Jan 20 11:19:56 bgptest2 bgpd[29940]: listening on :: Jan 20 11:19:56 bgptest2 bgpd[29940]: session engine ready Jan 20 11:19:56 bgptest2 bgpd[29940]: neighbor 65.46.252.34 (Untangle Datacenter): state change None -> Idle, reason: None Jan 20 11:19:56 bgptest2 bgpd[29940]: neighbor 65.46.252.34 (Untangle Datacenter): state change Idle -> Connect, reason: Start Jan 20 11:19:56 bgptest2 bgpd[29940]: neighbor 65.46.252.34 (Untangle Datacenter): state change Connect -> OpenSent, reason: Connection opened Jan 20 11:19:56 bgptest2 bgpd[29940]: neighbor 65.46.252.34 (Untangle Datacenter): state change OpenSent -> OpenConfirm, reason: OPEN message received Jan 20 11:19:56 bgptest2 bgpd[29940]: neighbor 65.46.252.34 (Untangle Datacenter): state change OpenConfirm -> Established, reason: KEEPALIVE message received Jan 20 11:19:56 bgptest2 bgpd[17995]: neighbor 65.46.252.34 (Untangle Datacenter) AS21634: update 64.2.3.0/24 via 65.46.252.34 Jan 20 11:19:56 bgptest2 bgpd[14506]: nexthop 65.46.252.34 now valid: directly connected Jan 20 11:20:44 bgptest2 bgpd[29940]: neighbor 65.46.252.34 (Untangle Datacenter): state change Established -> Idle, reason: HoldTimer expired -------------- END -------------- bgptest2:tcpdump -vvns1500 -i fxp0 port 179 tcpdump: listening on fxp0, link-type EN10MB 11:19:52.527328 65.46.252.34.48310 > 65.46.252.33.179: S [tcp sum ok] 1642888815:1642888815(0) win 16384 <mss 1460,nop,nop,sackOK,nop,wscale 0,nop,nop,timestamp 2322120143 0> (DF) [tos 0xc0] (ttl 2, id 23223, len 64) 11:19:52.527348 65.46.252.33.179 > 65.46.252.34.48310: R [tcp sum ok] 0:0(0) ack 1642888816 win 0 (DF) (ttl 64, id 40395, len 40) 11:19:56.748739 65.46.252.33.1985 > 65.46.252.34.179: S [tcp sum ok] 2516427034:2516427034(0) win 16384 <mss 1460,nop,nop,sackOK,nop,wscale 0,nop,nop,timestamp 1931362699 0> (DF) [tos 0xc0] (ttl 2, id 61323, len 64) 11:19:56.748896 65.46.252.34.179 > 65.46.252.33.1985: S [tcp sum ok] 2812695705:2812695705(0) ack 2516427035 win 16384 <mss 1460,nop,wscale 0,nop,nop,timestamp 443534217 1931362699> (DF) (ttl 255, id 42087, len 60) 11:19:56.748925 65.46.252.33.1985 > 65.46.252.34.179: . [tcp sum ok] 1:1(0) ack 1 win 16384 <nop,nop,timestamp 1931362699 443534217> (DF) [tos 0xc0] (ttl 2, id 39909, len 52) 11:19:56.749147 65.46.252.33.1985 > 65.46.252.34.179: P [tcp sum ok] 1:42(41) ack 1 win 16384 <nop,nop,timestamp 1931362699 443534217>: BGP (OPEN: Version 4, AS #2828, Holdtime 15, ID 65.46.252.33, Option length 12 (option Capabilities Advertisement, len=6) (option Capabilities Advertisement, len=2)) (DF) [tos 0xc0] (ttl 2, id 23479, len 93) 11:19:56.749358 65.46.252.34.179 > 65.46.252.33.1985: P [tcp sum ok] 1:42(41) ack 42 win 17335 <nop,nop,timestamp 443534217 1931362699>: BGP (OPEN: Version 4, AS #21634, Holdtime 90, ID 65.46.252.34, Option length 12 (option Capabilities Advertisement, len=6) (option Capabilities Advertisement, len=2)) (DF) [tos 0xc0] (ttl 2, id 9726, len 93) 11:19:56.749433 65.46.252.34.179 > 65.46.252.33.1985: P [tcp sum ok] 42:61(19) ack 42 win 17376 <nop,nop,timestamp 443534217 1931362699>: BGP (KEEPALIVE) (DF) [tos 0xc0] (ttl 2, id 45368, len 71) 11:19:56.749455 65.46.252.33.1985 > 65.46.252.34.179: . [tcp sum ok] 42:42(0) ack 61 win 16365 <nop,nop,timestamp 1931362699 443534217> (DF) [tos 0xc0] (ttl 2, id 446, len 52) 11:19:56.749617 65.46.252.33.1985 > 65.46.252.34.179: P [tcp sum ok] 42:61(19) ack 61 win 16365 <nop,nop,timestamp 1931362699 443534217>: BGP (KEEPALIVE) (DF) [tos 0xc0] (ttl 2, id 60740, len 71) 11:19:56.749956 65.46.252.33.1985 > 65.46.252.34.179: P [tcp sum ok] 61:106(45) ack 61 win 16384 <nop,nop,timestamp 1931362699 443534217>: BGP (UPDATE: (Path attributes: (ORIGIN[T] IGP) (AS_PATH[T] 2828) (NEXT_HOP[T] 65.46.252.33)) (NLRI: 192.168.42.0/24)) (DF) [tos 0xc0] (ttl 2, id 26775, len 97) 11:19:56.750146 65.46.252.34.179 > 65.46.252.33.1985: . [tcp sum ok] 61:61(0) ack 106 win 17331 <nop,nop,timestamp 443534217 1931362699> (DF) [tos 0xc0] (ttl 2, id 18859, len 52) 11:19:56.750447 65.46.252.34.179 > 65.46.252.33.1985: P [tcp sum ok] 61:106(45) ack 106 win 17376 <nop,nop,timestamp 443534217 1931362699>: BGP (UPDATE: (Path attributes: (ORIGIN[T] IGP) (AS_PATH[T] 21634) (NEXT_HOP[T] 65.46.252.34)) (NLRI: 64.2.3.0/24)) (DF) [tos 0xc0] (ttl 2, id 31180, len 97) 11:19:56.942182 65.46.252.33.1985 > 65.46.252.34.179: . [tcp sum ok] 106:106(0) ack 106 win 16384 <nop,nop,timestamp 1931362700 443534217> (DF) [tos 0xc0] (ttl 2, id 39193, len 52) 11:20:44.411990 65.46.252.33.1985 > 65.46.252.34.179: P [tcp sum ok] 106:125(19) ack 106 win 16384 <nop,nop,timestamp 1931362795 443534217>: BGP (KEEPALIVE) (DF) [tos 0xc0] (ttl 2, id 24587, len 71) 11:20:44.412013 65.46.252.33.1985 > 65.46.252.34.179: P [tcp sum ok] 125:146(21) ack 106 win 16384 <nop,nop,timestamp 1931362795 443534217>: BGP (NOTIFICATION: error Hold Timer Expired, subcode #0) (DF) [tos 0xc0] (ttl 2, id 48190, len 73) 11:20:44.412052 65.46.252.33.1985 > 65.46.252.34.179: F [tcp sum ok] 146:146(0) ack 106 win 16384 <nop,nop,timestamp 1931362795 443534217> (DF) [tos 0xc0] (ttl 2, id 25026, len 52) 11:20:44.412211 65.46.252.34.179 > 65.46.252.33.1985: . [tcp sum ok] 106:106(0) ack 146 win 17336 <nop,nop,timestamp 443534312 1931362795> (DF) [tos 0xc0] (ttl 2, id 15154, len 52) 11:20:44.412222 65.46.252.34.179 > 65.46.252.33.1985: . [tcp sum ok] 106:106(0) ack 147 win 17336 <nop,nop,timestamp 443534312 1931362795> (DF) [tos 0xc0] (ttl 2, id 3469, len 52) 11:20:44.412500 65.46.252.34.179 > 65.46.252.33.1985: F [tcp sum ok] 106:106(0) ack 147 win 17376 <nop,nop,timestamp 443534312 1931362795> (DF) [tos 0xc0] (ttl 2, id 2668, len 52) 11:20:44.412527 65.46.252.33.1985 > 65.46.252.34.179: . [tcp sum ok] 147:147(0) ack 107 win 16384 <nop,nop,timestamp 1931362795 443534312> (DF) [tos 0xc0] (ttl 2, id 4160, len 52) -------------- END -------------- bgptest2:bgpctl show neighbors @ 11:20:44 bgptest2:/etc#bgpctl show nei BGP neighbor is 65.46.252.34, remote AS 21634 Description: Untangle Datacenter BGP version 4, remote router-id 65.46.252.34 BGP state = Idle, down for 00:00:00 Last read 00:00:48, holdtime 240s, keepalive interval 80s Message statistics: Sent Received Opens 1 1 Notifications 1 0 Updates 1 1 Keepalives 2 1 Route Refresh 0 0 Total 5 3 Update statistics: Sent Received Updates 0 0 Withdraws 0 0 Last error: HoldTimer expired