On Wed, 6 Oct 2021, Michael Thomas wrote:
On 10/6/21 3:33 PM, Jon Lewis wrote:
On Wed, 6 Oct 2021, Michael Thomas wrote:
People have been anycasting DNS server IPs for years (decades?). So,
no.
But it wasn't just their DNS subnets that were pulled, I thought. I'm
obviously really confused. Anycast to a DNS server makes sense that
they'd pull out if they couldn't contact the backend. But I thought that
almost all of their routes to the backend were pulled? That is, the DFZ
was emptied of FB routes.
Well, as someone else said, DNS wasn't the problem...it was just one of
the more noticeable casualties. Whatever they did broke the network
rather completely, and that took out all of their DNS, which broke lots of
other things that depend on DNS.
Maybe the problem here is that two things happened and the article conflated
the two: the DNS infrastructure pulled its routes from the anycast address
and something else pulled all of the other routes but wasn't mentioned in the
article.
From the engineering.fb.com article:
"This was the source of yesterday’s outage. During one of these routine
maintenance jobs, a command was issued with the intention to assess the
availability of global backbone capacity, which unintentionally took down
all the connections in our backbone network, effectively disconnecting
Facebook data centers globally."
If you kill the backbone, and every site determines "my connectivity is
hosed, suppress anycast propagation", then you simultaneously have no
network, and no anycast (which might otherwise propagate to transit/peers
at each or at least some subset of your sites). All of your internal data
and communication systems that rely on both network and working DNS
suddenly don't work, so internal communications likely degraded to
engineers calling or texting each other.
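
To make that failure mode concrete, here is a toy sketch in Python. It is not FB's actual implementation: the site names, the Site class, and the "backbone up means announce" health check are all made up for illustration. It only shows why a per-site rule of "withdraw anycast when the backend is unreachable" withdraws everything at once when the shared backbone dies.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Site:
    name: str          # hypothetical site label
    backbone_up: bool  # can this site reach the data centers over the backbone?


def should_announce_anycast(site: Site) -> bool:
    # Assumed health check: only announce the anycast DNS prefix
    # while the site can still reach the backend.
    return site.backbone_up


def announcing_sites(sites: List[Site]) -> List[str]:
    # Names of the sites that would still announce the anycast prefix.
    return [s.name for s in sites if should_announce_anycast(s)]


sites = [Site("site-a", True), Site("site-b", True), Site("site-c", True)]
print(announcing_sites(sites))  # every site announces

# One site losing its backbone link is the case the check is designed for:
sites[0].backbone_up = False
print(announcing_sites(sites))  # the remaining sites still announce

# The backbone goes down everywhere at once:
for s in sites:
    s.backbone_up = False
print(announcing_sites(sites))  # nobody announces
```

Each site makes a locally sensible decision, but because the health check at every site depends on the same backbone, the decisions are perfectly correlated and the anycast prefix disappears from every site together.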
From one of the earlier articles, it sounds like they don't have true out
of band access to their routers/switches, which makes it kind of hard to
fix the network, if it's no longer a network and you have no access to
console or management ports.
----------------------------------------------------------------------
Jon Lewis, MCP :) | I route
StackPath, Sr. Neteng | therefore you are
_________ http://www.lewis.org/~jlewis/pgp for PGP public key_________