I finally got round to running your latest dev version. One problem that sticks out is that every RIB change results in a FIB change.

To see why this is a big deal, imagine you learn routes for network Y
via transit X, which you are connected to through routers A and B:

    Y ---> X --> A ---> host
           |             |
           ----> B -------

If Y prepends their BGP announcement, the update will not reach A and B
at the same time, and therefore the host will see the prepend from each
of them, one after the other. In the interim, it will be fooled into
thinking that there is a new best path, and modify the FIB to reflect
this, only to then receive the second update and have a tied route
again.

Running `ip monitor` you'll get to see a ton of route 'changes' which
don't look any different:

    Deleted 184.51.158.0/24  proto bird
            nexthop via 130.94.A.A  dev p1p4 weight 1
            nexthop via 130.94.B.B  dev p6p1 weight 1
    184.51.158.0/24  proto bird
            nexthop via 130.94.A.A  dev p1p4 weight 1
            nexthop via 130.94.B.B  dev p6p1 weight 1

Even when the FIB should change, you end up with sequences of deletes /
adds which mirror the order of BGP updates. One way of solving this is
to batch route changes by delaying route injection; otherwise the route
churn is too high and Linux starts doing a lot of nexthop invalidation
when you inject multiple full routing tables into the FIB.

Cheers,
- j

On Sun, Jun 7, 2015 at 6:04 PM, João Taveira Araújo <joao.tave...@gmail.com> wrote:
> In our hack around this we (Fastly) ended up adding a bgp_rte_same
> with pretty much everything you mention.
>
> One non-obvious addition is that we ended up enforcing that the
> multipath entry had the same next AS, i.e bgp_get_neighbor(new) ==
> bgp_get_neighbor(old). With nothing else to tie break, we'd end up
> getting next hops towards the same prefix over different carriers. The
> problem with this is that with a high degree of route churn we'd get
> next hop invalidation, in which case a flow going over one carrier
> would flap onto another mid-flight, which had performance implications
> for users.
>
> We ended up enforcing this in our selection policy but it should
> arguably be optional (strict mode).
>
>
> On Sun, Jun 7, 2015 at 5:43 PM, Ondrej Zajicek <santi...@crfreenet.org> wrote:
>> On Fri, May 22, 2015 at 12:13:31PM +0200, Alexander Frolkin wrote:
>>> Hi Ondrej,
>>>
>>> > > I was wondering how hard it would be to add BGP multipath support to
>>> > > BIRD, or if anyone was working on it already?
>>> > BGP multipath is one thing we are currently working on.
>>>
>>> That's great news! Do you know when it's likely to be available?
>>
>> Hi
>>
>> There is devel version of BGP multipath in our Git. Currently it allows
>> to merge routes that have the same preference, bgp_local_pref, bgp_path
>> length, bgp_origin, bgp_med (if relevant), ibgp/ebgp and igp_metric.
>>
>> As BGP multipath is non-standard, i wonder what kind of BGP multipath
>> behavior is expected by users and which options are necessary. I will
>> probably add some option to relax check for equal bgp_path length.
>>
>> --
>> Elen sila lumenn' omentielvo
>>
>> Ondrej 'Santiago' Zajicek (email: santi...@crfreenet.org)
>> OpenPGP encrypted e-mails preferred (KeyID 0x11DEADC3, wwwkeys.pgp.net)
>> "To err is human -- to blame it on a computer is even more so."
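
For illustration, the batching idea above can be sketched in C roughly
as follows. This is not BIRD code: the names (nhset, fib_batch,
batch_update, batch_flush) and the fixed table sizes are hypothetical,
and the actual kernel write is only a comment. Each RIB result is
recorded per prefix, and on flush a prefix counts as a FIB write only
if its final next-hop set differs from what is already installed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_NEXTHOPS 8   /* hypothetical limits, for the sketch only */
#define MAX_ROUTES   64

/* A multipath next-hop set, kept sorted so that order does not matter. */
struct nhset {
    size_t   n;
    uint32_t gw[MAX_NEXTHOPS];
};

struct entry {
    uint32_t     prefix;
    uint8_t      plen;
    struct nhset installed;  /* what the FIB currently holds */
    struct nhset pending;    /* latest RIB result, not yet pushed */
    bool         dirty;
};

struct fib_batch {
    size_t       n;
    struct entry e[MAX_ROUTES];
};

/* Build a sorted next-hop set (insertion sort; sets are tiny). */
struct nhset nhset_make(const uint32_t *gw, size_t n)
{
    struct nhset s = { 0 };
    assert(n <= MAX_NEXTHOPS);
    for (size_t i = 0; i < n; i++) {
        size_t j = s.n++;
        while (j > 0 && s.gw[j - 1] > gw[i]) {
            s.gw[j] = s.gw[j - 1];
            j--;
        }
        s.gw[j] = gw[i];
    }
    return s;
}

bool nhset_equal(const struct nhset *a, const struct nhset *b)
{
    return a->n == b->n &&
           memcmp(a->gw, b->gw, a->n * sizeof a->gw[0]) == 0;
}

/* Record the latest RIB result for a prefix; nothing is written yet. */
void batch_update(struct fib_batch *b, uint32_t prefix, uint8_t plen,
                  struct nhset nh)
{
    struct entry *e = NULL;
    for (size_t i = 0; i < b->n; i++)
        if (b->e[i].prefix == prefix && b->e[i].plen == plen)
            e = &b->e[i];
    if (!e) {
        assert(b->n < MAX_ROUTES);
        e = &b->e[b->n++];
        memset(e, 0, sizeof *e);
        e->prefix = prefix;
        e->plen = plen;
    }
    e->pending = nh;
    e->dirty = true;
}

/* Called from a timer: push only real differences, return write count. */
size_t batch_flush(struct fib_batch *b)
{
    size_t writes = 0;
    for (size_t i = 0; i < b->n; i++) {
        struct entry *e = &b->e[i];
        if (!e->dirty)
            continue;
        e->dirty = false;
        if (nhset_equal(&e->installed, &e->pending))
            continue;               /* transient flap, FIB untouched */
        e->installed = e->pending;  /* a real daemon would write to the kernel here */
        writes++;
    }
    return writes;
}
```

With something like this, the prepend case (tied A+B, briefly B only,
then tied again) produces no FIB write as long as both updates land
within one flush interval. How long that interval should be is a
trade-off between convergence time and churn.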