Maybe we are idealizing these so-called tier-1 carriers, and we tier-Ns should treat them as what they really are: just another AS. Accept that they are going to fail and do our best to mitigate the impact on our own networks, i.e., more peering.

On Mon, Aug 31, 2020 at 9:54 AM Martijn Schmidt via NANOG <nanog@nanog.org> wrote:

> At this point you don't even know whether it's a human error (example:
> generating a flowspec rule for port TCP/179), a filtering issue (example:
> accepting a flowspec rule for port TCP/179), or a software issue (example:
> certain flowspec update crashes the BGP daemon). And in the third scenario
> I think that at least some portion of the blame shifts from the carrier to
> its vendors, assuming the thing that crashed was not a home-grown BGP
> implementation.
>
> With the route optimizer incidents - because let's face it, Honest
> Networker is on the money as usual
> https://honestnetworker.net/2020/08/06/as10990-routing/ - there is really
> no excuse for any tier-1 carrier, they should at the very least have strict
> prefix-list based filtering in place for customer-facing EBGP sessions. In
> those cases it's much easier to state who's not taking care of their
> proverbial lawn.
>
> Best regards,
> Martijn
>
> On 8/31/20 3:25 PM, Tom Beecher wrote:
>
> https://blog.cloudflare.com/analysis-of-todays-centurylink-level-3-outage/
>
> I definitely found Mr. Prince's writing about yesterday's events
> fascinating.
>
> Verizon makes a mistake with BGP filters that allows a secondary mistake
> from leaked "optimizer" routes to propagate, and Mr. Prince takes every
> opportunity to lob large chunks of granite about how terrible they are.
>
> L3 allows an erroneous flowspec announcement to cause massive global
> connectivity issues, and Mr. Prince shrugs and says "Incidents happen."
>
> On Mon, Aug 31, 2020 at 1:15 AM Hank Nussbacher <h...@interall.co.il>
> wrote:
>
>> On 30/08/2020 20:08, Baldur Norddahl wrote:
>>
>> https://blog.cloudflare.com/analysis-of-todays-centurylink-level-3-outage/
>>
>> Sounds like Flowspec possibly blocking tcp/179 might be the cause.
>>
>> But that is Cloudflare speculation.
>>
>> Regards,
>> Hank
>> Caveat: The views expressed above are solely my own and do not express
>> the views or opinions of my employer
>>
>> An outage is what it is. I am not worried about outages. We have multiple
>> transits to deal with that.
>>
>> It is the keep announcing prefixes after withdrawal from peers and
>> customers that is the huge problem here. That is killing all the effort and
>> money I put into having redundancy. It is sabotage of my network after I
>> cut the ties. I do not want to be a customer at an outlet who has a system
>> that will do that. Luckily we do not currently have a contract and now they
>> will have to convince me it is safe for me to make a contract with them. If
>> that is impossible I guess I won't be getting a contract with them.
>>
>> But I disagree in that it would be impossible. They need to make a good
>> report telling exactly what went wrong and how they changed the design, so
>> something like this can not happen again. The basic design of BGP is such
>> that this should not happen easily if at all. They did something unwise.
>> Did they make a route reflector based on a database or something?
>>
>> Regards,
>>
>> Baldur
>>
>> On Sun, Aug 30, 2020 at 5:13 PM Mike Bolitho <mikeboli...@gmail.com>
>> wrote:
>>
>>> Exactly. And asking that they somehow prove this won't happen again is
>>> impossible.
>>>
>>> - Mike Bolitho
>>>
>>> On Sun, Aug 30, 2020, 8:10 AM Drew Weaver <drew.wea...@thenap.com>
>>> wrote:
>>>
>>>> I’m not defending them but I am sure it isn’t intentional.
>>>>
>>>> *From:* NANOG <nanog-bounces+drew.weaver=thenap....@nanog.org> *On
>>>> Behalf Of *Baldur Norddahl
>>>> *Sent:* Sunday, August 30, 2020 9:28 AM
>>>> *To:* nanog@nanog.org
>>>> *Subject:* Re: Centurylink having a bad morning?
>>>>
>>>> How is that acceptable behaviour? I shall remember never to make a
>>>> contract with these guys until they can prove that they won't advertise my
>>>> prefixes after I pull them.
>>>> Under any circumstances.
>>>>
>>>> On Sun, Aug 30, 2020 at 15:14, Joseph Jenkins <
>>>> j...@breathe-underwater.com> wrote:
>>>>
>>>> Finally got through on their support line and spoke to level1. The only
>>>> thing the tech could say was it was an issue with BGP route reflectors and
>>>> it started about 3am (Pacific). They were still trying to isolate the issue.
>>>> I've tried failing over my circuits and no go, the traffic just dies as L3
>>>> won't stop advertising my routes.
>>>>
>>>> On Sun, Aug 30, 2020 at 5:21 AM Drew Weaver via NANOG <nanog@nanog.org>
>>>> wrote:
>>>>
>>>> Hello,
>>>>
>>>> Woke up this morning to a bunch of reports of issues with connectivity
>>>> had to shut down some Level3/CTL connections to get it to return to normal.
>>>>
>>>> As of right now their support portal won’t load:
>>>> https://www.centurylink.com/business/login/
>>>>
>>>> Just wondering what others are seeing.