There's a Reddit user claiming he works at CL who says the cause was some faulty Infinera DTN-X instances.
https://www.reddit.com/r/centurylink/comments/aa2qa4/comment/ecovgab (dunno though why the user posted that to Reddit and not here)

On 30 Dec. 2018, 20:19, Saku Ytti <s...@ytti.fi> wrote:
> Hey John,
>
> Your criticism is warranted, but it would also be addressed by an
> explanation of DCN/OOB being the source of the problem.
>
> At any rate, I am looking forward to no longer speculating and instead
> reading a post-mortem written by someone who knows how networks work.
>
> On Sun, 30 Dec 2018 at 18:28, John Von Essen <j...@essenz.com> wrote:
> >
> > One thing that is troubling when reading that URL is that it appears
> > several steps of restoration required teams to go onsite for local login,
> > etc. Granted, to troubleshoot hardware you need to be physically present
> > to pop a line card in and out, but CTL/LVL3 should have full out-of-band
> > console and power control to all core devices; we shouldn't be waiting for
> > someone to drive to a location to get console or do power cycling. And I
> > would imagine the first step in a lot of the troubleshooting was power
> > cycling and local console logs.
> >
> > -John
> >
> > On 12/30/18 10:42 AM, Mike Hammett wrote:
> >
> > It's technical enough that laypeople immediately lose interest, yet
> > completely useless to anyone who works with this stuff.
> >
> > -----
> > Mike Hammett
> > Intelligent Computing Solutions
> > http://www.ics-il.com
> >
> > Midwest-IX
> > http://www.midwest-ix.com
> >
> > ________________________________
> > From: "Saku Ytti" <s...@ytti.fi>
> > To: "nanog list" <nanog@nanog.org>
> > Sent: Sunday, December 30, 2018 7:42:49 AM
> > Subject: CenturyLink RCA?
> >
> > Apologies for the URL; I do not know the official source and I do not
> > share the URL's sentiment.
> > https://fuckingcenturylink.com/
> >
> > Can someone translate this for an IP engineer? What actually happened?
> > From my own history, I rarely recognise the problem I fixed from
> > reading the public RCA. I hope CenturyLink will do better.
> >
> > The best guess I've heard so far is:
> >
> > a) CenturyLink runs a global L2 DCN/OOB
> > b) there was a HW fault which caused an L2 loop (perhaps the HW dropped
> > BPDUs; I've had this failure mode)
> > c) the DCN had direct access to the control-plane, and the L2 loop congested
> > control-plane resources, causing it to deprovision waves
> >
> > Now of course this is entirely speculation, but it is intended to show what
> > type of explanation is acceptable and can be used to fix things.
> > Hopefully CenturyLink does come out with an IP-engineering-readable
> > explanation, so that we may use it as leverage to support work in our
> > own domains to remove such risks:
> >
> > a) do not run L2 DCN/OOB
> > b) do not connect MGMT ETH (it is unprotected access to the control-plane;
> > it cannot be protected by CoPP/lo0 filter/LPTS etc.)
> > c) do add to your RFP a scoring item for a proper OOB port (like Cisco CMP)
> > d) do fail the optical network up
> >
> > --
> > ++ytti
> >
>
> --
> ++ytti
>
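To make point b) concrete: Ethernet frames carry no TTL, so once lost BPDUs let a blocked port start forwarding, broadcast copies keep circulating until the loop is broken. The toy Python simulation below is a minimal sketch under stated assumptions, not a model of CenturyLink's DCN. The four-switch full mesh, the switch names, and the `simulate_flood` helper are all hypothetical, and it ignores link bandwidth, MAC learning and storm control. It counts broadcast copies in flight per tick for three cases: STP converged, one blocked link failed open, and all blocking lost.

```python
from collections import Counter
from itertools import combinations


def simulate_flood(links, blocked, origin, ticks):
    """Count broadcast copies in flight per tick (hypothetical toy model).

    links:   set of frozenset({u, v}) switch-to-switch links
    blocked: links that STP holds in the blocking state
    origin:  switch where a host injects a single broadcast frame

    Ethernet has no TTL, so a copy is only discarded when the receiving
    switch has no forwarding port other than the one it arrived on.
    """
    neighbors = {}
    for link in links - blocked:
        u, v = tuple(link)
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)

    # Multiset of (sender, receiver) copies currently on the wire.
    in_flight = Counter((origin, n) for n in neighbors.get(origin, ()))
    history = []
    for _ in range(ticks):
        history.append(sum(in_flight.values()))
        nxt = Counter()
        for (src, dst), copies in in_flight.items():
            # Flood out of every forwarding port except the ingress port.
            for n in neighbors[dst] - {src}:
                nxt[(dst, n)] += copies
        in_flight = nxt
    return history


# Hypothetical four-switch full mesh; assume STP roots the tree at A.
mesh = {frozenset(pair) for pair in combinations("ABCD", 2)}
converged = {frozenset(p) for p in [("B", "C"), ("B", "D"), ("C", "D")]}
b_c_failed_open = converged - {frozenset(("B", "C"))}  # BPDUs lost on B-C

print(simulate_flood(mesh, converged, "A", 6))        # → [3, 0, 0, 0, 0, 0]
print(simulate_flood(mesh, b_c_failed_open, "A", 6))  # → [3, 2, 2, 4, 2, 2]
print(simulate_flood(mesh, set(), "A", 6))            # → [3, 6, 12, 24, 48, 96]
```

With a converged tree the flood dies after one hop. With a single failed-open link the copies never die, and switch D receives two duplicates every third tick. With all blocking lost, the count doubles every tick, which is the kind of load that could plausibly swamp a control plane reachable from the DCN.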