On Sun, May 07, 2023 at 04:55:05PM +0200, Marko Cupać wrote:
> Hi,
>
> I have a pair of CARPed firewalls which have been talking BGP to two
> upstream ISPs for years over numerous OpenBSD releases.
>
> After I upgraded to 7.3 a few days ago, as well as applied 001_bgpd
> errata patch, I noticed bgpd died on both CARP members almost at the
> same time, leaving my AS inaccessible from the Internet.
>
> I am pasting error messages, dmesg and bgpd.conf from both CARP
> members, in hope someone can help me. I redacted some sensitive info
> like public addresses and ISP names.
>
> Thank you in advance.
>
> bgp1 daemon log excerpt:
>
> May 7 12:12:57 bgp1 bgpd[36608]: neighbor 192.0.2.57 (ISP1): sending
> notification: HoldTimer expired
> May 7 12:12:57 bgp1 bgpd[36608]: neighbor 192.0.2.57 (ISP1): state change
> Established -> Idle, reason: HoldTimer expired
> May 7 12:13:27 bgp1 bgpd[36608]: neighbor 192.0.2.57 (ISP1): state change
> Idle -> Connect, reason: Start
> May 7 12:14:42 bgp1 bgpd[36608]: neighbor 192.0.2.57 (ISP1): state change
> Connect -> Active, reason: Connection open failed
> May 7 13:57:09 bgp1 bgpd[36608]: neighbor 192.0.2.57 (ISP1): state change
> Active -> OpenSent, reason: Connection opened
> May 7 13:57:09 bgp1 bgpd[36608]: neighbor 192.0.2.57 (ISP1): state change
> OpenSent -> Active, reason: Connection closed
> May 7 13:57:57 bgp1 bgpd[36608]: neighbor 192.0.2.57 (ISP1): state change
> Active -> OpenSent, reason: Connection opened
> May 7 13:57:57 bgp1 bgpd[36608]: neighbor 192.0.2.57 (ISP1): state change
> OpenSent -> OpenConfirm, reason: OPEN message received
> May 7 13:57:57 bgp1 bgpd[36608]: neighbor 192.0.2.57 (ISP1): state change
> OpenConfirm -> Established, reason: KEEPALIVE message received
> May 7 13:58:46 bgp1 bgpd[52951]: fatal in RDE: nexthop_unref: refcnt error
> May 7 13:58:46 bgp1 bgpd[90623]: peer closed imsg connection
> May 7 13:58:46 bgp1 bgpd[36608]: peer closed imsg connection
> May 7 13:58:46 bgp1 bgpd[36608]: SE: Lost connection to RDE
> May 7 13:58:46 bgp1 bgpd[90623]: main: Lost connection to RDE
> May 7 13:58:46 bgp1 bgpd[36608]: peer closed imsg connection
> May 7 13:58:46 bgp1 bgpd[36608]: SE: Lost connection to RDE control
> May 7 13:58:46 bgp1 bgpd[36608]: peer closed imsg connection
> May 7 13:58:46 bgp1 bgpd[36608]: SE: Lost connection to parent
> May 7 13:58:46 bgp1 bgpd[36608]: neighbor 192.0.2.57 (ISP1): sending
> notification: Cease, administratively down
> May 7 13:58:46 bgp1 bgpd[73254]: peer closed imsg connection
> May 7 13:58:46 bgp1 bgpd[73254]: RTR: Lost connection to RDE
> May 7 13:58:46 bgp1 bgpd[73254]: peer closed imsg connection
> May 7 13:58:46 bgp1 bgpd[36608]: neighbor 192.0.2.57 (ISP1): state change
> Established -> Idle, reason: Stop
> May 7 13:58:46 bgp1 bgpd[73254]: fatal in RTR: Lost connection to parent
> May 7 13:58:46 bgp1 bgpd[36608]: neighbor 192.0.66.121 (ISP2): sending
> notification: Cease, administratively down
> May 7 13:58:46 bgp1 bgpd[36608]: neighbor 192.0.66.121 (ISP2): state change
> Established -> Idle, reason: Stop
> May 7 13:58:46 bgp1 bgpd[36608]: session engine exiting
> May 7 13:58:55 bgp1 bgpd[90623]: kernel routing table 0 (Loc-RIB) decoupled
> May 7 13:58:55 bgp1 bgpd[90623]: terminating
>
> bgp2 daemon log excerpt:
>
> May 7 12:12:38 bgp2 bgpd[60357]: neighbor 192.0.2.57 (ISP1): sending
> notification: HoldTimer expired
> May 7 12:12:38 bgp2 bgpd[60357]: neighbor 192.0.2.57 (ISP1): state change
> Established -> Idle, reason: HoldTimer expired
> May 7 12:13:08 bgp2 bgpd[60357]: neighbor 192.0.2.57 (ISP1): state change
> Idle -> Connect, reason: Start
> May 7 12:14:23 bgp2 bgpd[60357]: neighbor 192.0.2.57 (ISP1): state change
> Connect -> Active, reason: Connection open failed
> May 7 12:37:22 bgp2 ntpd[89855]: peer 217.24.20.5 now invalid
> May 7 12:50:11 bgp2 ntpd[89855]: peer 217.24.20.5 now valid
> May 7 13:57:08 bgp2 bgpd[60357]: neighbor 192.0.2.57 (ISP1): state change
> Connect -> OpenSent, reason: Connection opened
> May 7 13:57:09 bgp2 bgpd[60357]: neighbor 192.0.2.57 (ISP1): state change
> OpenSent -> OpenConfirm, reason: OPEN message received
> May 7 13:57:09 bgp2 bgpd[60357]: neighbor 192.0.2.57 (ISP1): state change
> OpenConfirm -> Established, reason: KEEPALIVE message received
> May 7 13:57:09 bgp2 bgpd[77822]: peer closed imsg connection
> May 7 13:57:09 bgp2 bgpd[77822]: main: Lost connection to RDE
> May 7 13:57:09 bgp2 bgpd[77520]: peer closed imsg connection
> May 7 13:57:09 bgp2 bgpd[77520]: RTR: Lost connection to RDE
> May 7 13:57:09 bgp2 bgpd[77520]: peer closed imsg connection
> May 7 13:57:09 bgp2 bgpd[77520]: fatal in RTR: Lost connection to parent
> May 7 13:57:09 bgp2 bgpd[60357]: peer closed imsg connection
> May 7 13:57:09 bgp2 bgpd[60357]: SE: Lost connection to RDE
> May 7 13:57:09 bgp2 bgpd[60357]: peer closed imsg connection
> May 7 13:57:09 bgp2 bgpd[60357]: SE: Lost connection to parent
> May 7 13:57:09 bgp2 bgpd[60357]: neighbor 192.0.2.57 (ISP1): sending
> notification: Cease, administratively down
> May 7 13:57:09 bgp2 bgpd[60357]: neighbor 192.0.2.57 (ISP1): state change
> Established -> Idle, reason: Stop
> May 7 13:57:09 bgp2 bgpd[60357]: neighbor 192.0.66.121 (ISP2): sending
> notification: Cease, administratively down
> May 7 13:57:09 bgp2 bgpd[60357]: neighbor 192.0.66.121 (ISP2): state change
> Established -> Idle, reason: Stop
> May 7 13:57:09 bgp2 bgpd[60357]: session engine exiting
> May 7 13:57:17 bgp2 bgpd[77822]: kernel routing table 0 (Loc-RIB) decoupled
> May 7 13:57:18 bgp2 bgpd[77822]: route decision engine terminated; signal 11
> May 7 13:57:18 bgp2 bgpd[77822]: terminating
>

Please give the following diff a try and report back if it fixes your
issue. When the output filters are copied, the refcnt for the nexthop
in your 'match out to XYZ set nexthop BLA' rule is not properly
increased. When a peer then flaps, the counts are off, which probably
triggers both the crash and especially the fatal message.

-- 
:wq Claudio

Index: rde_filter.c
===================================================================
RCS file: /cvs/src/usr.sbin/bgpd/rde_filter.c,v
retrieving revision 1.135
diff -u -p -r1.135 rde_filter.c
--- rde_filter.c	19 Apr 2023 13:23:33 -0000	1.135
+++ rde_filter.c	7 May 2023 16:48:44 -0000
@@ -583,6 +583,12 @@ filterset_copy(struct filter_set_head *s
 		if ((t = malloc(sizeof(struct filter_set))) == NULL)
 			fatal(NULL);
 		memcpy(t, s, sizeof(struct filter_set));
+		if (t->type == ACTION_RTLABEL_ID)
+			rtlabel_ref(t->action.id);
+		else if (t->type == ACTION_PFTABLE_ID)
+			pftable_ref(t->action.id);
+		else if (t->type == ACTION_SET_NEXTHOP_REF)
+			nexthop_ref(t->action.nh_ref);
 		TAILQ_INSERT_TAIL(dest, t, entry);
 	}
 }
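
For anyone following along, here is a minimal standalone sketch of the
failure mode described above. It is not bgpd code: the structures and
helpers are simplified, hypothetical stand-ins that only borrow the
names from the diff, and nexthop_unref() here returns -1 where the log
above suggests the real one goes fatal. The take_ref argument is also
an invention of this sketch, used to switch between the old and the
patched copy behaviour.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-ins; not bgpd's real structures. */
struct nexthop {
	int	refcnt;
};

struct filter_set {
	struct nexthop	*nh_ref;
};

static struct nexthop *
nexthop_ref(struct nexthop *nh)
{
	nh->refcnt++;
	return (nh);
}

/* Returns 0 on success, -1 where the count would drop below zero. */
static int
nexthop_unref(struct nexthop *nh)
{
	if (nh->refcnt <= 0)
		return (-1);	/* assumption: the real daemon is fatal here */
	nh->refcnt--;
	return (0);
}

/* take_ref == 0 mimics the plain memcpy(), take_ref == 1 the patch. */
static struct filter_set *
filter_set_copy(const struct filter_set *s, int take_ref)
{
	struct filter_set	*t;

	if ((t = malloc(sizeof(*t))) == NULL)
		abort();
	memcpy(t, s, sizeof(*t));
	if (take_ref)
		nexthop_ref(t->nh_ref);
	return (t);
}

/* Drop the set's reference and free it; passes on the unref result. */
static int
filter_set_free(struct filter_set *s)
{
	int	rv;

	rv = nexthop_unref(s->nh_ref);
	free(s);
	return (rv);
}

/* One original plus one copy, both released; returns the error count. */
static int
simulate(int take_ref)
{
	struct nexthop		 nh = { 0 };
	struct filter_set	*orig, *copy;
	int			 errors = 0;

	if ((orig = malloc(sizeof(*orig))) == NULL)
		abort();
	orig->nh_ref = nexthop_ref(&nh);
	assert(nh.refcnt == 1);

	copy = filter_set_copy(orig, take_ref);
	if (filter_set_free(copy) != 0)	/* e.g. a peer going down */
		errors++;
	if (filter_set_free(orig) != 0)
		errors++;
	return (errors);
}
```

With these stand-ins, simulate(0) reports one refcnt error, because the
copy and the original share a single reference and the second release
finds the count already at zero. simulate(1) reports none, since the
copy took its own reference first.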