Hey John,

Your criticism is warranted, but it would also be addressed by the explanation that DCN/OOB was the source of the problem.

At any rate, I am looking forward to stopping the speculation and reading a post-mortem written by someone who knows how networks work.

On Sun, 30 Dec 2018 at 18:28, John Von Essen <j...@essenz.com> wrote:
>
> One thing that is troubling when reading that URL is that it appears several
> steps of restoration required teams to go onsite for local login, etc.
> Granted, to troubleshoot hardware you need to be physically present to pop a
> line card in and out, but CTL/LVL3 should have full out-of-band console and
> power control to all core devices, we shouldn't be waiting for someone to
> drive to a location to get console or do power cycling. And I would imagine
> the first step to a lot of the troubleshooting was power cycling and local
> console logs.
>
> -John
>
> On 12/30/18 10:42 AM, Mike Hammett wrote:
>
> It's technical enough so that laypeople immediately lose interest, yet
> completely useless to anyone that works with this stuff.
>
> -----
> Mike Hammett
> Intelligent Computing Solutions
> http://www.ics-il.com
>
> Midwest-IX
> http://www.midwest-ix.com
>
> ________________________________
> From: "Saku Ytti" <s...@ytti.fi>
> To: "nanog list" <nanog@nanog.org>
> Sent: Sunday, December 30, 2018 7:42:49 AM
> Subject: CenturyLink RCA?
>
> Apologies for the URL, I do not know the official source and I do not
> share the URL's sentiment.
> https://fuckingcenturylink.com/
>
> Can someone translate this to IP engineer? What did actually happen?
> From my own history, I rarely recognise the problem I fixed from
> reading the public RCA. I hope CenturyLink will do better.
>
> Best guess so far that I've heard is
>
> a) CenturyLink runs global L2 DCN/OOB
> b) there was HW fault which caused L2 loop (perhaps HW dropped BPDU,
> I've had this failure mode)
> c) DCN had direct access to control-plane, and L2 congested
> control-plane resources causing it to deprovision waves
>
> Now of course this is entirely speculation, but intended to show what
> type of explanation is acceptable and can be used to fix things.
> Hopefully CenturyLink does come out with IP-engineering readable
> explanation, so that we may use it as leverage to support work in our
> own domains to remove such risks.
>
> a) do not run L2 DCN/OOB
> b) do not connect MGMT ETH (it is unprotected access to control-plane,
> it cannot be protected by CoPP/lo0 filter/LPTS, etc.)
> c) do add in your RFP scoring item for proper OOB port (Like Cisco CMP)
> d) do fail optical network up
>
> --
> ++ytti

--
++ytti