Hello,

We've recently begun testing using OpenBSD 4.4 with OpenBGP in our datacenter.
Our initial tests have uncovered an odd issue we hope you all can help us
with.  I've included our configs and relevant information below.

The summary of our issue is this:

1.) Upon starting bgpd the session between the two routers goes to established
and updates are passed.
2.) Keepalives aren't passed beyond the first exchange.
3.) After some time, the session goes to IDLE on both routers.
4.) The session tears down if we either issue a bgpctl command (like show
summary or show neighbors) or wait 240 seconds after the initial connect.
5.) The routers then reestablish connections but they drop again.
6.) The exact same setup works fine with OpenBGP 4.3.

Here's what we've found.  If we modify session.c at line 405 (timeout = 240;
/* loop every 240s at least */) to some number lower than our holdtime, it
works.   Adding debugging code to the code after that line shows us that the
code doesn't get processed again after the intial setup unless the timeout
value is reached or some bgpctl statement is excecuted.

We've replicated this error in two different test environments.  The error
causes sessions to be torn down anytime a 4.4 bgpd is used. (ie 4.4 -> 4.4 and
4.4 -> 4.3).

Please let me know if you need any additional information from me.

Thanks so much,

Marc Runkel
Technical Operations Manger
Untangle, Inc.


The two machines in question are dcrouter1 and bgptest2:

dcrouter1:/etc/bgpd.conf

#macros
# XO Peer
XOpeer="65.46.252.33"

# global configuration
AS 21634
router-id 65.46.252.34
log updates
network 64.2.3.0/24
holdtime min 3
holdtime 90

# neighbors and peers
neighbor $XOpeer {
        remote-as       2828
        descr           XO Upstream
        local-address   65.46.252.34
        multihop        2
}


# filter out prefixes longer than 24 or shorter than 8 bits
deny from any
allow from any inet prefixlen 8 - 24

# do not accept a default route
deny from any prefix 0.0.0.0/0


# We're in test mode, so we gotta let the test networks in (192.168.0.0/16).

# filter bogus networks
deny from any prefix 10.0.0.0/8 prefixlen >= 8
deny from any prefix 172.16.0.0/12 prefixlen >= 12
#deny from any prefix 192.168.0.0/16 prefixlen >= 16
deny from any prefix 169.254.0.0/16 prefixlen >= 16
deny from any prefix 192.0.2.0/24 prefixlen >= 24
deny from any prefix 224.0.0.0/4 prefixlen >= 4
deny from any prefix 240.0.0.0/4 prefixlen >= 4

-------------- END --------------
dcrouter1:/etc/hostname.em0

inet 65.46.252.34 255.255.255.252 65.46.252.35 description "XO WAN"

-------------- END --------------
dcrouter1:/var/log/daemon.log (bgpd only)

Jan 20 11:19:51 dcrouter1 bgpd[24217]: startup
Jan 20 11:19:51 dcrouter1 bgpd[14770]: route decision engine ready
Jan 20 11:19:52 dcrouter1 bgpd[5962]: listening on 0.0.0.0
Jan 20 11:19:52 dcrouter1 bgpd[5962]: listening on ::
Jan 20 11:19:52 dcrouter1 bgpd[5962]: session engine ready
Jan 20 11:19:52 dcrouter1 bgpd[5962]: neighbor 65.46.252.33 (XO Upstream):
state change None -> Idle, reason: None
Jan 20 11:19:52 dcrouter1 bgpd[5962]: neighbor 65.46.252.33 (XO Upstream):
state change Idle -> Connect, reason: Start
Jan 20 11:19:52 dcrouter1 bgpd[5962]: neighbor 65.46.252.33 (XO Upstream):
socket error: Connection refused
Jan 20 11:19:52 dcrouter1 bgpd[5962]: neighbor 65.46.252.33 (XO Upstream):
state change Connect -> Active, reason: Connection open failed
Jan 20 11:19:56 dcrouter1 bgpd[5962]: neighbor 65.46.252.33 (XO Upstream):
state change Active -> OpenSent, reason: Connection opened
Jan 20 11:19:56 dcrouter1 bgpd[5962]: neighbor 65.46.252.33 (XO Upstream):
state change OpenSent -> OpenConfirm, reason: OPEN message received
Jan 20 11:19:56 dcrouter1 bgpd[5962]: neighbor 65.46.252.33 (XO Upstream):
state change OpenConfirm -> Established, reason: KEEPALIVE message received
Jan 20 11:19:56 dcrouter1 bgpd[14770]: neighbor 65.46.252.33 (XO Upstream)
AS2828: update 192.168.42.0/24 via 65.46.252.33
Jan 20 11:19:56 dcrouter1 bgpd[24217]: nexthop 65.46.252.33 now valid:
directly connected
Jan 20 11:20:44 dcrouter1 bgpd[5962]: neighbor 65.46.252.33 (XO Upstream):
received notification: HoldTimer expired, unknown subcode 0
Jan 20 11:20:44 dcrouter1 bgpd[5962]: neighbor 65.46.252.33 (XO Upstream):
state change Established -> Idle, reason: NOTIFICATION received

-------------- END --------------
dcrouter1:tcpdump -vvns1500 -i em0 port 179

tcpdump: listening on em0, link-type EN10MB
11:19:52.537633 65.46.252.34.48310 > 65.46.252.33.179: S [tcp sum ok]
1642888815:1642888815(0) win 16384 <mss 1460,nop,nop,sackOK,nop,wscale
0,nop,nop,timestamp 2322120143 0> (DF) [tos 0xc0] (ttl 2, id 23223, len 64)
11:19:52.537747 65.46.252.33.179 > 65.46.252.34.48310: R [tcp sum ok] 0:0(0)
ack 1642888816 win 0 (DF) (ttl 64, id 40395, len 40)11:19:56.759172
65.46.252.33.1985 > 65.46.252.34.179: S [tcp sum ok] 2516427034:2516427034(0)
win 16384 <mss 1460,nop,nop,sackOK,nop,wscale 0,nop,nop,timestamp 1931362699
0> (DF) [tos 0xc0] (ttl 2, id 61323, len 64)
11:19:56.759201 65.46.252.34.179 > 65.46.252.33.1985: S [tcp sum ok]
2812695705:2812695705(0) ack 2516427035 win 16384 <mss 1460,nop,wscale
0,nop,nop,timestamp 443534217 1931362699> (DF) (ttl 255, id 42087, len 60)
11:19:56.759298 65.46.252.33.1985 > 65.46.252.34.179: . [tcp sum ok] 1:1(0)
ack 1 win 16384 <nop,nop,timestamp 1931362699 443534217> (DF) [tos 0xc0] (ttl
2, id 39909, len 52)
11:19:56.759553 65.46.252.33.1985 > 65.46.252.34.179: P [tcp sum ok] 1:42(41)
ack 1 win 16384 <nop,nop,timestamp 1931362699 443534217>: BGP (OPEN: Version
4, AS #2828, Holdtime 15, ID 65.46.252.33, Option length 12 (option
Capabilities Advertisement, len=6) (option Capabilities Advertisement, len=2))
(DF) [tos 0xc0] (ttl 2, id 23479, len 93)
11:19:56.759661 65.46.252.34.179 > 65.46.252.33.1985: P [tcp sum ok] 1:42(41)
ack 42 win 17335 <nop,nop,timestamp 443534217 1931362699>: BGP (OPEN: Version
4, AS #21634, Holdtime 90, ID 65.46.252.34, Option length 12 (option
Capabilities Advertisement, len=6) (option Capabilities Advertisement, len=2))
(DF) [tos 0xc0] (ttl 2, id 9726, len 93)
11:19:56.759739 65.46.252.34.179 > 65.46.252.33.1985: P [tcp sum ok] 42:61(19)
ack 42 win 17376 <nop,nop,timestamp 443534217 1931362699>: BGP (KEEPALIVE)
(DF) [tos 0xc0] (ttl 2, id 45368, len 71)11:19:56.759922 65.46.252.33.1985 >
65.46.252.34.179: . [tcp sum ok] 42:42(0) ack 61 win 16365 <nop,nop,timestamp
1931362699 443534217> (DF) [tos 0xc0] (ttl 2, id 446, len 52)
11:19:56.760045 65.46.252.33.1985 > 65.46.252.34.179: P [tcp sum ok] 42:61(19)
ack 61 win 16365 <nop,nop,timestamp 1931362699 443534217>: BGP (KEEPALIVE)
(DF) [tos 0xc0] (ttl 2, id 60740, len 71)
11:19:56.760426 65.46.252.33.1985 > 65.46.252.34.179: P [tcp sum ok]
61:106(45) ack 61 win 16384 <nop,nop,timestamp 1931362699 443534217>: BGP
(UPDATE: (Path attributes: (ORIGIN[T] IGP)
                (AS_PATH[T] 2828)                (NEXT_HOP[T] 65.46.252.33))
                (NLRI: 192.168.42.0/24)) (DF) [tos 0xc0] (ttl 2, id 26775, len
97)11:19:56.760453 65.46.252.34.179 > 65.46.252.33.1985: . [tcp sum ok]
61:61(0) ack 106 win 17331 <nop,nop,timestamp 443534217 1931362699> (DF) [tos
0xc0] (ttl 2, id 18859, len 52)11:19:56.760751 65.46.252.34.179 >
65.46.252.33.1985: P [tcp sum ok] 61:106(45) ack 106 win 17376
<nop,nop,timestamp 443534217 1931362699>: BGP (UPDATE: (Path attributes:
(ORIGIN[T] IGP)
                (AS_PATH[T] 21634)                (NEXT_HOP[T] 65.46.252.34))
                (NLRI: 64.2.3.0/24)) (DF) [tos 0xc0] (ttl 2, id 31180, len
97)11:19:56.952678 65.46.252.33.1985 > 65.46.252.34.179: . [tcp sum ok]
106:106(0) ack 106 win 16384 <nop,nop,timestamp 1931362700 443534217> (DF)
[tos 0xc0] (ttl 2, id 39193, len 52)11:20:44.422472 65.46.252.33.1985 >
65.46.252.34.179: P [tcp sum ok] 106:125(19) ack 106 win 16384
<nop,nop,timestamp 1931362795 443534217>: BGP (KEEPALIVE) (DF) [tos 0xc0] (ttl
2, id 24587, len 71)11:20:44.422482 65.46.252.33.1985 > 65.46.252.34.179: P
[tcp sum ok] 125:146(21) ack 106 win 16384 <nop,nop,timestamp 1931362795
443534217>: BGP (NOTIFICATION: error Hold Timer Expired, subcode #0) (DF) [tos
0xc0] (ttl 2, id 48190, len 73)
11:20:44.422489 65.46.252.33.1985 > 65.46.252.34.179: F [tcp sum ok]
146:146(0) ack 106 win 16384 <nop,nop,timestamp 1931362795 443534217> (DF)
[tos 0xc0] (ttl 2, id 25026, len 52)
11:20:44.422526 65.46.252.34.179 > 65.46.252.33.1985: . [tcp sum ok]
106:106(0) ack 146 win 17336 <nop,nop,timestamp 443534312 1931362795> (DF)
[tos 0xc0] (ttl 2, id 15154, len 52)
11:20:44.422532 65.46.252.34.179 > 65.46.252.33.1985: . [tcp sum ok]
106:106(0) ack 147 win 17336 <nop,nop,timestamp 443534312 1931362795> (DF)
[tos 0xc0] (ttl 2, id 3469, len 52)
11:20:44.422821 65.46.252.34.179 > 65.46.252.33.1985: F [tcp sum ok]
106:106(0) ack 147 win 17376 <nop,nop,timestamp 443534312 1931362795> (DF)
[tos 0xc0] (ttl 2, id 2668, len 52)
11:20:44.422972 65.46.252.33.1985 > 65.46.252.34.179: . [tcp sum ok]
147:147(0) ack 107 win 16384 <nop,nop,timestamp 1931362795 443534312> (DF)
[tos 0xc0] (ttl 2, id 4160, len 52)

-------------- END --------------
bgptest2:/etc/bgpd.conf

#macros
# Untangle
peer2="65.46.252.34"
# global configuration
AS 2828
router-id 65.46.252.33
log updates
network 192.168.42.0/24
holdtime min 3
holdtime 15

# neighbors and peers

neighbor $peer2 {
        remote-as       21634
        descr           Untangle Datacenter
        local-address   65.46.252.33
        multihop        2
}

# filter out prefixes longer than 24 or shorter than 8 bits
deny from any
allow from any inet prefixlen 8 - 24

# do not accept a default route
deny from any prefix 0.0.0.0/0


# We're in test mode, so we gotta let the test networks in (192.168.0.0/16).

# filter bogus networks
deny from any prefix 10.0.0.0/8 prefixlen >= 8
deny from any prefix 172.16.0.0/12 prefixlen >= 12
#deny from any prefix 192.168.0.0/16 prefixlen >= 16
deny from any prefix 169.254.0.0/16 prefixlen >= 16
deny from any prefix 192.0.2.0/24 prefixlen >= 24
deny from any prefix 224.0.0.0/4 prefixlen >= 4
deny from any prefix 240.0.0.0/4 prefixlen >= 4

-------------- END --------------
bgptest2:/etc/hostname.fxp0

dhcp NONE NONE NONE
inet alias 65.46.252.33 255.255.255.252 65.46.252.35 description "XO
Datacenter"dhcp NONE NONE NONE

-------------- END --------------
bgptest2:ifconfig -A

lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> mtu 33204
        groups: lo
        inet 127.0.0.1 netmask 0xff000000
        inet6 ::1 prefixlen 128
        inet6 fe80::1%lo0 prefixlen 64 scopeid 0x3
fxp0: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> mtu 1500
        lladdr 00:11:11:cf:09:f4
        description: XO Datacenter
        groups: egress
        media: Ethernet autoselect (100baseTX full-duplex)
        status: active
        inet6 fe80::211:11ff:fecf:9f4%fxp0 prefixlen 64 scopeid 0x1
        inet 10.0.0.144 netmask 0xff000000 broadcast 10.255.255.255
        inet 65.46.252.33 netmask 0xfffffffc broadcast 65.46.252.35
enc0: flags=0<> mtu 1536

-------------- END --------------
bgptest2:/var/log/daemon.log (bgpd only)

Jan 20 11:19:56 bgptest2 bgpd[14506]: startup
Jan 20 11:19:56 bgptest2 bgpd[17995]: route decision engine ready
Jan 20 11:19:56 bgptest2 bgpd[29940]: listening on 0.0.0.0
Jan 20 11:19:56 bgptest2 bgpd[29940]: listening on ::
Jan 20 11:19:56 bgptest2 bgpd[29940]: session engine ready
Jan 20 11:19:56 bgptest2 bgpd[29940]: neighbor 65.46.252.34 (Untangle
Datacenter): state change None -> Idle, reason: None
Jan 20 11:19:56 bgptest2 bgpd[29940]: neighbor 65.46.252.34 (Untangle
Datacenter): state change Idle -> Connect, reason: Start
Jan 20 11:19:56 bgptest2 bgpd[29940]: neighbor 65.46.252.34 (Untangle
Datacenter): state change Connect -> OpenSent, reason: Connection opened
Jan 20 11:19:56 bgptest2 bgpd[29940]: neighbor 65.46.252.34 (Untangle
Datacenter): state change OpenSent -> OpenConfirm, reason: OPEN message
received
Jan 20 11:19:56 bgptest2 bgpd[29940]: neighbor 65.46.252.34 (Untangle
Datacenter): state change OpenConfirm -> Established, reason: KEEPALIVE
message received
Jan 20 11:19:56 bgptest2 bgpd[17995]: neighbor 65.46.252.34 (Untangle
Datacenter) AS21634: update 64.2.3.0/24 via 65.46.252.34
Jan 20 11:19:56 bgptest2 bgpd[14506]: nexthop 65.46.252.34 now valid: directly
connected
Jan 20 11:20:44 bgptest2 bgpd[29940]: neighbor 65.46.252.34 (Untangle
Datacenter): state change Established -> Idle, reason: HoldTimer expired

-------------- END --------------
bgptest2:tcpdump -vvns1500 -i fxp0 port 179

tcpdump: listening on fxp0, link-type EN10MB
11:19:52.527328 65.46.252.34.48310 > 65.46.252.33.179: S [tcp sum ok]
1642888815:1642888815(0) win 16384 <mss 1460,nop,nop,sackOK,nop,wscale
0,nop,nop,timestamp 2322120143 0> (DF) [tos 0xc0] (ttl 2, id 23223, len 64)
11:19:52.527348 65.46.252.33.179 > 65.46.252.34.48310: R [tcp sum ok] 0:0(0)
ack 1642888816 win 0 (DF) (ttl 64, id 40395, len 40)
11:19:56.748739 65.46.252.33.1985 > 65.46.252.34.179: S [tcp sum ok]
2516427034:2516427034(0) win 16384 <mss 1460,nop,nop,sackOK,nop,wscale
0,nop,nop,timestamp 1931362699 0> (DF) [tos 0xc0] (ttl 2, id 61323, len 64)
11:19:56.748896 65.46.252.34.179 > 65.46.252.33.1985: S [tcp sum ok]
2812695705:2812695705(0) ack 2516427035 win 16384 <mss 1460,nop,wscale
0,nop,nop,timestamp 443534217 1931362699> (DF) (ttl 255, id 42087, len 60)
11:19:56.748925 65.46.252.33.1985 > 65.46.252.34.179: . [tcp sum ok] 1:1(0)
ack 1 win 16384 <nop,nop,timestamp 1931362699 443534217> (DF) [tos 0xc0] (ttl
2, id 39909, len 52)
11:19:56.749147 65.46.252.33.1985 > 65.46.252.34.179: P [tcp sum ok] 1:42(41)
ack 1 win 16384 <nop,nop,timestamp 1931362699 443534217>: BGP (OPEN: Version
4, AS #2828, Holdtime 15, ID 65.46.252.33, Option length 12 (option
Capabilities Advertisement, len=6) (option Capabilities Advertisement, len=2))
(DF) [tos 0xc0] (ttl 2, id 23479, len 93)
11:19:56.749358 65.46.252.34.179 > 65.46.252.33.1985: P [tcp sum ok] 1:42(41)
ack 42 win 17335 <nop,nop,timestamp 443534217 1931362699>: BGP (OPEN: Version
4, AS #21634, Holdtime 90, ID 65.46.252.34, Option length 12 (option
Capabilities Advertisement, len=6) (option Capabilities Advertisement, len=2))
(DF) [tos 0xc0] (ttl 2, id 9726, len 93)
11:19:56.749433 65.46.252.34.179 > 65.46.252.33.1985: P [tcp sum ok] 42:61(19)
ack 42 win 17376 <nop,nop,timestamp 443534217 1931362699>: BGP (KEEPALIVE)
(DF) [tos 0xc0] (ttl 2, id 45368, len 71)
11:19:56.749455 65.46.252.33.1985 > 65.46.252.34.179: . [tcp sum ok] 42:42(0)
ack 61 win 16365 <nop,nop,timestamp 1931362699 443534217> (DF) [tos 0xc0] (ttl
2, id 446, len 52)
11:19:56.749617 65.46.252.33.1985 > 65.46.252.34.179: P [tcp sum ok] 42:61(19)
ack 61 win 16365 <nop,nop,timestamp 1931362699 443534217>: BGP (KEEPALIVE)
(DF) [tos 0xc0] (ttl 2, id 60740, len 71)
11:19:56.749956 65.46.252.33.1985 > 65.46.252.34.179: P [tcp sum ok]
61:106(45) ack 61 win 16384 <nop,nop,timestamp 1931362699 443534217>: BGP
(UPDATE: (Path attributes: (ORIGIN[T] IGP)
                (AS_PATH[T] 2828)
                (NEXT_HOP[T] 65.46.252.33))
                (NLRI: 192.168.42.0/24)) (DF) [tos 0xc0] (ttl 2, id 26775, len
97)
11:19:56.750146 65.46.252.34.179 > 65.46.252.33.1985: . [tcp sum ok] 61:61(0)
ack 106 win 17331 <nop,nop,timestamp 443534217 1931362699> (DF) [tos 0xc0]
(ttl 2, id 18859, len 52)
11:19:56.750447 65.46.252.34.179 > 65.46.252.33.1985: P [tcp sum ok]
61:106(45) ack 106 win 17376 <nop,nop,timestamp 443534217 1931362699>: BGP
(UPDATE: (Path attributes: (ORIGIN[T] IGP)
                (AS_PATH[T] 21634)
                (NEXT_HOP[T] 65.46.252.34))
                (NLRI: 64.2.3.0/24)) (DF) [tos 0xc0] (ttl 2, id 31180, len
97)
11:19:56.942182 65.46.252.33.1985 > 65.46.252.34.179: . [tcp sum ok]
106:106(0) ack 106 win 16384 <nop,nop,timestamp 1931362700 443534217> (DF)
[tos 0xc0] (ttl 2, id 39193, len 52)
11:20:44.411990 65.46.252.33.1985 > 65.46.252.34.179: P [tcp sum ok]
106:125(19) ack 106 win 16384 <nop,nop,timestamp 1931362795 443534217>: BGP
(KEEPALIVE) (DF) [tos 0xc0] (ttl 2, id 24587, len 71)
11:20:44.412013 65.46.252.33.1985 > 65.46.252.34.179: P [tcp sum ok]
125:146(21) ack 106 win 16384 <nop,nop,timestamp 1931362795 443534217>: BGP
(NOTIFICATION: error Hold Timer Expired, subcode #0) (DF) [tos 0xc0] (ttl 2,
id 48190, len 73)
11:20:44.412052 65.46.252.33.1985 > 65.46.252.34.179: F [tcp sum ok]
146:146(0) ack 106 win 16384 <nop,nop,timestamp 1931362795 443534217> (DF)
[tos 0xc0] (ttl 2, id 25026, len 52)
11:20:44.412211 65.46.252.34.179 > 65.46.252.33.1985: . [tcp sum ok]
106:106(0) ack 146 win 17336 <nop,nop,timestamp 443534312 1931362795> (DF)
[tos 0xc0] (ttl 2, id 15154, len 52)
11:20:44.412222 65.46.252.34.179 > 65.46.252.33.1985: . [tcp sum ok]
106:106(0) ack 147 win 17336 <nop,nop,timestamp 443534312 1931362795> (DF)
[tos 0xc0] (ttl 2, id 3469, len 52)
11:20:44.412500 65.46.252.34.179 > 65.46.252.33.1985: F [tcp sum ok]
106:106(0) ack 147 win 17376 <nop,nop,timestamp 443534312 1931362795> (DF)
[tos 0xc0] (ttl 2, id 2668, len 52)
11:20:44.412527 65.46.252.33.1985 > 65.46.252.34.179: . [tcp sum ok]
147:147(0) ack 107 win 16384 <nop,nop,timestamp 1931362795 443534312> (DF)
[tos 0xc0] (ttl 2, id 4160, len 52)

-------------- END --------------
bgptest2:bgpctl show neighbors @ 11:20:44

bgptest2:/etc#bgpctl show nei
BGP neighbor is 65.46.252.34, remote AS 21634
 Description: Untangle Datacenter
  BGP version 4, remote router-id 65.46.252.34
  BGP state = Idle, down for 00:00:00
  Last read 00:00:48, holdtime 240s, keepalive interval 80s

  Message statistics:
                  Sent       Received
  Opens                    1          1
  Notifications            1          0
  Updates                  1          1
  Keepalives               2          1
  Route Refresh            0          0
  Total                    5          3

  Update statistics:
                  Sent       Received
  Updates                  0          0
  Withdraws                0          0

  Last error: HoldTimer expired

Reply via email to