On Wed, 6 Oct 2021, Michael Thomas wrote:


On 10/6/21 3:33 PM, Jon Lewis wrote:
 On Wed, 6 Oct 2021, Michael Thomas wrote:

  People have been anycasting DNS server IPs for years (decades?). So,
 no.

 But it wasn't just their DNS subnets that were pulled, I thought. I'm
 obviously really confused. Anycast to a DNS server makes sense that
 they'd pull out if they couldn't contact the backend. But I thought that
 almost all of their routes to the backend were pulled? That is, the DFZ
 was emptied of FB routes.

 Well, as someone else said, DNS wasn't the problem...it was just one of
 the more noticeable casualties.  Whatever they did broke the network
 rather completely, and that took out all of their DNS, which broke lots of
 other things that depend on DNS.

Maybe the problem here is that two things happened and the article conflated the two: the DNS infrastructure pulled its routes from the anycast address and something else pulled all of the other routes but wasn't mentioned in the article.

From the engineering.fb.com article:

"This was the source of yesterday’s outage. During one of these routine maintenance jobs, a command was issued with the intention to assess the availability of global backbone capacity, which unintentionally took down all the connections in our backbone network, effectively disconnecting Facebook data centers globally."

If you kill the backbone and every site decides "my connectivity is hosed, suppress anycast propagation", then you simultaneously have no network and no anycast (which might otherwise still propagate to transit/peers at each, or at least some subset, of your sites). All of your internal data and communication systems that rely on both the network and working DNS suddenly stop working, so internal communications likely degraded to engineers calling or texting each other.
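
To make that failure mode concrete, here is a minimal Python sketch of the per-site decision. It is only an illustration under my own assumptions, not how Facebook actually implements it: the Site class, the backbone_up flag, and the pop-a/pop-b/pop-c names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Site:
    name: str                 # hypothetical edge site name
    backbone_up: bool = True  # can this site reach the data centers?

    def should_announce_anycast(self) -> bool:
        # Assumed health check: only advertise the DNS anycast prefix
        # while the backend is reachable over the backbone.
        return self.backbone_up


def sites_announcing(sites):
    """Return the names of sites still advertising the anycast prefix."""
    return [s.name for s in sites if s.should_announce_anycast()]


sites = [Site("pop-a"), Site("pop-b"), Site("pop-c")]

# One site loses its backbone link: the others keep serving DNS.
sites[0].backbone_up = False
partial = sites_announcing(sites)

# The whole backbone goes down: every site withdraws at the same time.
for s in sites:
    s.backbone_up = False
total = sites_announcing(sites)

print(partial)
print(total)
```

With a single-site failure the check does what you want, since the remaining sites keep announcing. When the backbone itself disappears, every site reaches the same conclusion independently and nobody is left announcing the prefix.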

From one of the earlier articles, it sounds like they don't have true out-of-band access to their routers/switches, which makes it kind of hard to fix the network, if it's no longer a network and you have no access to console or management ports.

----------------------------------------------------------------------
 Jon Lewis, MCP :)           |  I route
 StackPath, Sr. Neteng       |  therefore you are
_________ http://www.lewis.org/~jlewis/pgp for PGP public key_________
