On Sat, Jul 16, 2022 at 12:41:07PM +0200, Claudio Jeker wrote:
> I deployed bgpd on one of my core routers and it triggered the fatal
> "bad dmetric in decision process" from time to time.
> 
> I realized after a longer debugging session that one reason this happens
> is when nexthops become valid. The state change affects all prefixes at
> once but then they are reevaluated one by one (see prefix_evaluate_all()
> which is called by nexthop_runner()).
> 
> I currently have no good solution for this issue. I think the problem is
> that invalid prefixes are not sorted when added. There may be a similar
> issue when flipping a rib from no-evaluate to evaluate in the reload code.
> 
> For now, neuter the fatalx() by converting it to a log_debug() until I
> figure out a proper fix.

ok.

Now that the scope_id is part of struct bgpd_addr, the XXX in
pt_getaddr() can go.
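
For anyone following along, here is a toy sketch of the ordering problem
described in the quoted text. It is not bgpd code: struct toy_nexthop,
struct toy_path, toy_cmp() and toy_count_misordered() are made-up names,
and the comparison is a stand-in for prefix_cmp() that only looks at
nexthop validity and a single preference value. It assumes, as the quoted
text suspects, that invalid entries are left in insertion order.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for illustration only; none of these exist in bgpd. */
struct toy_nexthop {
	bool valid;	/* shared by every path using this nexthop */
};

struct toy_path {
	const struct toy_nexthop *nh;
	int pref;	/* higher is better */
};

/*
 * Compare a against b: positive if a is preferred, negative if b is
 * preferred, 0 if no preference.  Invalid paths are treated as equal,
 * which models the assumption that they are not sorted when added.
 */
static int
toy_cmp(const struct toy_path *a, const struct toy_path *b)
{
	if (a->nh->valid != b->nh->valid)
		return a->nh->valid ? 1 : -1;
	if (!a->nh->valid)
		return 0;
	return (a->pref > b->pref) - (a->pref < b->pref);
}

/*
 * Walk a list that is supposed to be ordered best first and count the
 * neighbours where the earlier entry compares worse than the later one.
 * This corresponds to the "dmetric < 0" case in the diff.
 */
static size_t
toy_count_misordered(const struct toy_path *list, size_t n)
{
	size_t i, bad = 0;

	for (i = 1; i < n; i++)
		if (toy_cmp(&list[i - 1], &list[i]) < 0)
			bad++;
	return bad;
}
```

With three paths sharing one invalid nexthop and stored in the order
pref 100, 300, 200, the walk finds no misordered neighbours. Setting
valid to true on the shared nexthop changes the result of every
comparison at once without touching the list, and the same walk then
finds the pair (100, 300) out of order.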

> -- 
> :wq Claudio
> 
> Index: rde_decide.c
> ===================================================================
> RCS file: /cvs/src/usr.sbin/bgpd/rde_decide.c,v
> retrieving revision 1.95
> diff -u -p -r1.95 rde_decide.c
> --- rde_decide.c      11 Jul 2022 16:46:41 -0000      1.95
> +++ rde_decide.c      16 Jul 2022 10:28:19 -0000
> @@ -331,8 +331,12 @@ prefix_set_dmetric(struct prefix *pp, st
>                           PREFIX_DMETRIC_BEST : PREFIX_DMETRIC_INVALID;
>               else
>                       np->dmetric = prefix_cmp(pp, np, &testall);
> -             if (np->dmetric < 0)
> -                     fatalx("bad dmetric in decision process");
> +             if (np->dmetric < 0) {
> +                     struct bgpd_addr addr;
> +                     pt_getaddr(np->pt, &addr);
> +                     log_debug("bad dmetric in decision process: %s/%u",
> +                         log_addr(&addr), np->pt->prefixlen);
> +             }
>       }
>  }
>  
> 
