On Sat, Jul 16, 2022 at 12:41:07PM +0200, Claudio Jeker wrote: > I deployed bgpd on one of more core routers and triggered the fatal > "bad dmetric in decision process" from time to time. > > I realized after a longer debugging session that one reason this happens > is when nexthops become valid. The state change affects all prefixes at > once but then they are reevaluated one by one (see prefix_evaluate_all() > which is called by nexthop_runner()). > > I currently have no good solution for this issue. I think the problem is > that invalid prefixes are not sorted when added. There may be a similar > issue when flipping a rib from no-evaluate to evaluate in the reload code. > > For now neuter the fatalx and convert it to a log_debug() until I figured > out a proper fix.
ok. Now that the scope_id is part of struct bgpd_addr, the XXX in pt_getaddr() can go. > -- > :wq Claudio > > Index: rde_decide.c > =================================================================== > RCS file: /cvs/src/usr.sbin/bgpd/rde_decide.c,v > retrieving revision 1.95 > diff -u -p -r1.95 rde_decide.c > --- rde_decide.c 11 Jul 2022 16:46:41 -0000 1.95 > +++ rde_decide.c 16 Jul 2022 10:28:19 -0000 > @@ -331,8 +331,12 @@ prefix_set_dmetric(struct prefix *pp, st > PREFIX_DMETRIC_BEST : PREFIX_DMETRIC_INVALID; > else > np->dmetric = prefix_cmp(pp, np, &testall); > - if (np->dmetric < 0) > - fatalx("bad dmetric in decision process"); > + if (np->dmetric < 0) { > + struct bgpd_addr addr; > + pt_getaddr(np->pt, &addr); > + log_debug("bad dmetric in decision process: %s/%u", > + log_addr(&addr), np->pt->prefixlen); > + } > } > } > >