The problem described below was the motivation for this proposed BGP improvement: http://tools.ietf.org/html/draft-ietf-idr-bgp-bestpath-selection-criteria
-----Original Message-----
From: Pete Lumbis <alum...@gmail.com>
Date: Friday, October 25, 2013 2:01 PM
To: JRC NOC <nospam-na...@jensenresearch.com>
Cc: "nanog@nanog.org" <nanog@nanog.org>
Subject: Re: BGP failure analysis and recommendations

>As a member of the support team for a vendor, I'll say this problem
>isn't entirely unheard of. The CPU is in charge of local traffic and
>the BGP session, and some sort of hardware chip or ASIC is in charge
>of moving packets through the device. If the hardware is misprogrammed
>it won't properly forward traffic while BGP thinks it's doing its job.
>This is not to be confused with a hardware failure. This is purely a
>software problem. The software is responsible for telling the hardware
>what to do, and sometimes there are bugs there, like there are bugs in
>all code.
>
>The easiest way to test this kind of issue is to have some other
>control plane that is tied to the data plane. That is, the only way to
>make sure that the peer is forwarding traffic is to make it forward
>traffic and react when it fails. You could do something like set up IP
>SLA (i.e., ping) to something in that SP network. If the ping fails
>then it sounds like your peer may have a forwarding issue and you can
>apply a policy to remove or at least not prefer that peer (in case
>it's a false positive).
>
>-Pete
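A minimal sketch of the probe-and-react loop Pete describes, written as an external Python script rather than on-box IP SLA. The probe target, the thresholds, and the reaction hook are all hypothetical stand-ins for your own environment, and the ping flags are Linux-style:

    #!/usr/bin/env python3
    # Probe-and-react sketch: ping something that should only be
    # reachable through the suspect upstream; after enough consecutive
    # failures, trigger a policy change. PROBE_TARGET and the reaction
    # below are hypothetical stand-ins.
    import subprocess
    import time

    PROBE_TARGET = "192.0.2.1"   # example address inside the SP network
    FAIL_THRESHOLD = 3           # consecutive failures before reacting
    INTERVAL_SECONDS = 10

    def probe(target):
        """Send one ICMP echo; True if the target answered."""
        result = subprocess.run(
            ["ping", "-c", "1", "-W", "2", target],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        return result.returncode == 0

    def depreference_peer():
        """Stand-in for the real reaction: push a config change that
        shuts the session or lowers local-preference for that peer."""
        print("probes failing: depreference or shut the peer")

    failures = 0
    while failures < FAIL_THRESHOLD:
        failures = 0 if probe(PROBE_TARGET) else failures + 1
        time.sleep(INTERVAL_SECONDS)
    depreference_peer()

On real routers the same loop is usually expressed on-box, e.g. as an SLA probe plus object tracking tied to routing policy, so no external script is needed. The point is only that the health check must ride the data plane rather than the BGP session itself.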
>On Wed, Oct 23, 2013 at 10:40 PM, JRC NOC
><nospam-na...@jensenresearch.com> wrote:
>
>> Hello Nanog -
>>
>> On Saturday, October 19th at about 13:00 UTC we experienced an IP
>> failure at one of our sites in the New York area. It was apparently
>> a widespread outage on the East coast, but I haven't seen it
>> discussed here.
>>
>> We are multihomed, using EBGP to three (diverse) upstream providers.
>> One provider experienced a hardware failure in a core component at
>> one POP. Regrettably, during the outage our BGP session remained
>> active and we continued receiving full routes from the affected AS,
>> and our prefixes continued to be advertised at their border.
>> However, basically none of the traffic between those prefixes over
>> that provider was delivered. The bogus routes stayed up for hours.
>> We shut down the BGP peering session when the nature of the problem
>> became clear. This was effective. I believe that all customer BGP
>> routes were similarly affected, including those belonging to some
>> large regional networks and corporations. I have raised the
>> questions below with the provider but haven't received any
>> information or advice.
>>
>> My question is why did our BGP configuration fail? I'm guessing the
>> basic answer is that the IGP and route reflectors within that
>> provider were still connected, but the forwarding paths were
>> unavailable. My BGP session basically acted like a bunch of static
>> routes, with no awareness of the failure(s) and no dynamic
>> reconfiguration of the RIB.
>>
>> Is this just an unavoidable issue with scaling large networks?
>> Is it perhaps a known side effect of MPLS?
>> Have we/they lost something important in the changeover to converged
>> multiprotocol networks?
>> Is there a better way for us edge networks to achieve IP resiliency
>> in the current environment?
>>
>> This is an operational issue. Thanks in advance for any hints about
>> what happened or better practices to reduce the impact of a routine
>> hardware fault in an upstream network.
>>
>> - Eric Jensen
>>
>>> Date: Wed, 23 Oct 2013 21:26:43 -0400
>>> To: c...@chrisjensen.org
>>> From: JRC NetOps <n...@jensenresearch.com>
>>> Subject: Fwd: BGP failure analysis and recommendations
>>>
>>>> Date: Mon, 21 Oct 2013 23:19:28 -0400
>>>> To: christopher.sm...@level3.com
>>>> From: Eric Jensen <ejen...@jensenresearch.com>
>>>> Subject: BGP failure analysis and recommendations
>>>> Cc: "Joe Budelis Fast-E.com" <j...@fast-e.com>
>>>> Bcc: n...@jensenresearch.com
>>>>
>>>> Hello Christopher Smith -
>>>>
>>>> I left you a voicemail message today. The Customer Service folks
>>>> also gave me your email address.
>>>>
>>>> We have a small but high-value multi-homed corporate network. We
>>>> operate using our AS number 17103.
>>>>
>>>> We have BGP transit circuits with Level 3, Lightpath, and at our
>>>> colo center (AS8001). The Level 3 circuit ID is BBPM9946.
>>>>
>>>> On Saturday, October 19, 2013 we had a large IP outage. I tracked
>>>> it back to our Level 3 circuit and opened a ticket (7126634). I
>>>> have copied (below) an email I sent our channel salesman with more
>>>> details about our BGP problems during your outage. Briefly, I am
>>>> very concerned that Level 3 presented routes to us that were not
>>>> actually reachable through your network, and, even worse, Level 3
>>>> kept advertising our prefixes even though your network couldn't
>>>> deliver the traffic to us for those prefixes.
>>>>
>>>> I believe that the BGP NLRI data should follow the same IP path as
>>>> the forwarded data itself. Apparently this isn't the case at
>>>> Level 3. I also believe that your MPLS backbone should have
>>>> recovered automatically from the forwarding failure, but this
>>>> didn't happen either. My only fix was to manually shut down the
>>>> BGP peering session with Level 3.
>>>>
>>>> Can you explain to me how Level 3 black-holed my routes? Can you
>>>> suggest some change to our or your BGP configuration to eliminate
>>>> this BGP failure mode?
>>>>
>>>> Just to be clear, I don't expect our circuit, or your network, to
>>>> be up all the time. But I do expect that the routes you advertise
>>>> to us and to your BGP peers actually be reachable through your
>>>> network. On Saturday this didn't happen. The routes stayed up
>>>> while the data transport was down.
>>>>
>>>> Our IPv4 BGP peering session with Level 3 remains down in the
>>>> interim. Please get back to me as soon as possible.
>>>>
>>>> - Eric Jensen
>>>> AS17103
>>>> 201-741-9509
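The change Eric asks for here is essentially what the draft linked at the top of this thread proposes: paths whose next-hop fails a data-plane reachability check should be excluded from bestpath selection. A toy Python illustration of that filtering step, with fabricated RIB entries and a stubbed-out reachability probe:

    # Toy bestpath filter in the spirit of
    # draft-ietf-idr-bgp-bestpath-selection-criteria: drop paths whose
    # next-hop fails a data-plane check before comparing attributes.
    # The prefixes, next-hops, and probe result are fabricated.

    RIB = [
        # (prefix, next_hop, local_pref)
        ("203.0.113.0/24", "198.51.100.1", 200),  # via the broken upstream
        ("203.0.113.0/24", "198.51.100.9", 100),  # via a healthy upstream
    ]

    def data_plane_ok(next_hop):
        """Stub: on a real router this would be BFD, an SLA probe, or
        an MPLS LSP liveness check toward the next-hop."""
        return next_hop != "198.51.100.1"  # pretend this one black-holes

    def best_path(paths):
        usable = [p for p in paths if data_plane_ok(p[1])]
        # Fall back to normal attribute comparison (just local-pref here).
        return max(usable, key=lambda p: p[2]) if usable else None

    print(best_path(RIB))  # ('203.0.113.0/24', '198.51.100.9', 100)

With a check like this wired into route selection, a session that stays up, as Level 3's did, stops attracting traffic as soon as the probe fails, and nobody has to shut the session by hand.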
>>>>> Date: Mon, 21 Oct 2013 22:55:35 -0400
>>>>> To: "Joe Budelis Fast-E.com" <j...@fast-e.com>
>>>>> From: Eric Jensen <ejen...@jensenresearch.com>
>>>>> Subject: Re: Fwd: Level3 Interim Response
>>>>> Bcc: n...@jensenresearch.com
>>>>>
>>>>> Hi Joe -
>>>>>
>>>>> Thanks for making the new inquiry. This was a big outage.
>>>>> Apparently Time Warner Cable and Cablevision were affected
>>>>> greatly, plus many large corporate networks, and of course all
>>>>> the single-homed Level 3 customers worldwide. My little network
>>>>> was just one more casualty.
>>>>>
>>>>> See:
>>>>> http://www.dslreports.com/forum/r28749556-Internet-Level3-Outage-
>>>>> http://online.wsj.com/news/articles/SB10001424052702304864504579145813698584246
>>>>>
>>>>> For our site, the massive outage started at about 9:00 AM
>>>>> Saturday and lasted until after 2:00 PM. I opened a ticket about
>>>>> 9:30 AM but only realized the routing issues and took down our
>>>>> BGP session about 12:00 to try to minimize the problems for our
>>>>> traffic caused by their misconfigured BGP.
>>>>>
>>>>> There can always be equipment failures and fiber cuts. That's not
>>>>> the problem. From my point of view the problem was/is that
>>>>> Level 3 kept "advertising" our prefix but couldn't deliver the
>>>>> packets to us. They did this for all their customers' prefixes,
>>>>> thereby sucking in about half the NYC-area internet traffic and
>>>>> dumping it into the Hudson River for a huge period of time. They
>>>>> also kept advertising all their BGP routes to me, thereby fooling
>>>>> my routers into sending outbound traffic to Level 3, where they
>>>>> again dumped my traffic into the Hudson.
>>>>>
>>>>> I called Level 3 customer service today and have the name of a
>>>>> network engineer to discuss options for fixing the BGP failure.
>>>>> If you get any response with an engineering contact please let me
>>>>> know.
>>>>>
>>>>> I shouldn't have to manually intervene to route around problems.
>>>>> Even sadder is the response from Level 3 explaining that they
>>>>> spent hours trying to find the problem and had to manually
>>>>> reconfigure their network, leading to saturated links and more
>>>>> problems. Their network only healed when the faulty line card was
>>>>> replaced.
>>>>>
>>>>> I had reactivated the BGP session later that night, but after
>>>>> reviewing the actual damage that we incurred, and the widespread
>>>>> nature of the failure, I have decided to leave our Level 3 BGP
>>>>> session down, at least until the engineering situation improves.
>>>>> There may not be any good way to use a Level 3 BGP session
>>>>> without risking the same "black hole" problem going forward. It's
>>>>> exactly the type of failure that BGP is specifically designed to
>>>>> deal with, but BGP was developed in the days of point-to-point
>>>>> circuits carrying IP traffic.
>>>>>
>>>>> Nowadays some networks have a new layer between the wires and IP,
>>>>> namely MPLS, and this allowed BGP to stay up while depriving the
>>>>> routers of functioning IP next-hops, which they (both the Level 3
>>>>> IP routers and the Level 3 personnel) were unaware of. Apparently
>>>>> the Level 3 IP-based BGP routers all believed they had working
>>>>> circuits edge-to-edge, but in fact their network was partitioned.
>>>>>
>>>>> MPLS must have some redundancy features, but they obviously
>>>>> weren't working on Saturday. This is a huge engineering failure.
>>>>> No large ISP could function this way for long.
>>>>>
>>>>> I can wait the 72 hours for their response. I expect it will be
>>>>> full of mealy-mouthed platitudes about how no system is foolproof
>>>>> and it will all be fine now.
>>>>> It would be more interesting to me to be in the meeting room
>>>>> where some engineer has to explain how they could lose so much
>>>>> traffic and not be able to operate a functioning, if degraded,
>>>>> network after a single line card failure. It wouldn't be the head
>>>>> of network design, because that person would already have been
>>>>> fired.
>>>>>
>>>>> Let me know if you hear anything. I will do the same.
>>>>>
>>>>> - Eric Jensen
>>>>> AS17103
>>>>> 201-741-9509
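Eric's diagnosis in these last messages is that the BGP session stayed up while the forwarding plane underneath it was partitioned. A tiny Python model makes that concrete, using an invented topology in which the keepalive path and the forwarding path cross different nodes:

    # Tiny model of control-plane/data-plane divergence. The node names
    # and topology are invented: BGP keepalives happen to ride a path
    # that survived, while customer traffic crosses the failed card.

    control_path = ["customer", "rr", "remote-edge"]          # keepalives
    data_path = ["customer", "core-linecard", "remote-edge"]  # traffic

    failed_nodes = {"core-linecard"}  # the single failed line card

    def path_up(path, failed):
        """A path works only if none of its nodes have failed."""
        return not any(node in failed for node in path)

    print("BGP session up:   ", path_up(control_path, failed_nodes))  # True
    print("traffic delivered:", path_up(data_path, failed_nodes))     # False

Because only the first check feeds BGP, the routes stay advertised while the packets go nowhere; tying something like the probes sketched earlier into routing policy is what closes that gap.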