On 10 January 2017 at 19:58, Job Snijders <j...@instituut.net> wrote:
> On Tue, Jan 10, 2017 at 03:51:04AM +0100, Baldur Norddahl wrote:
>> If a transit link goes, for example because we had to reboot a router,
>> traffic is supposed to reroute to the remaining transit links.
>> Internally our network handles this fairly fast for egress traffic.
>>
>> However the problem is the ingress traffic - it can be 5 to 15 minutes
>> before everything has settled down. This is the time before everyone
>> else on the internet has processed that they will have to switch to
>> your alternate transit.
>>
>> The only solution I know of is to have redundant links to all transits.
>
> Alternatively, if you reboot a router, perhaps you could first shutdown
> the eBGP sessions, then wait 5 to 10 minutes for the traffic to drain
> away (should be visible in your NMS stats), and then proceed with the
> maintenance?
>
> Of course this only works for planned reboots, not surprise reboots.
>
> Kind regards,
>
> Job
If I tear down my eBGP sessions, the upstream router withdraws the route and the traffic just stops. Are your upstreams propagating withdrawals without actually updating their own routing tables?

I believe the problem is easiest to see by firing up an inbound mtr from a distant network, then withdrawing the route from the path it is taking. It should show either "destination unreachable" or a routing loop that "retreats" (under the right circumstances I have observed it distinctly move one hop at a time) until it finds an alternate path.

My observed convergence times for a single withdrawal, however, are under 10 seconds: that is how long it takes for all the networks on the original path to point at a new one.

My view on the problem is that if you are failing over often enough for a customer to notice and report it, you have bigger problems than convergence times.

- Mike Jones