On Sat, Oct 9, 2021 at 11:16 AM Masataka Ohta <mo...@necom830.hpcl.titech.ac.jp> wrote:
> Bill Woodcock wrote:
>
> >> It may be that facebook uses all the four name server IP addresses
> >> in each edge node. But, it effectively kills essential redundancy
> >> of DNS to have two or more name servers (at separate locations)
> >> and the natural consequence is, as you can see, mass disaster.
> >
> > Yep.  I think we even had a NANOG talk on exactly that specific topic a
> > long time ago.
> >
> > https://www.pch.net/resources/Papers/dns-service-architecture/dns-service-architecture-v10.pdf
>
> Yes, having separate sets of anycast addresses by two or more pops
> should be fine.

To be fair, it looks like FB has 4 /32's (and 4 /128's) for their DNS
authoritatives, all from different /24's or /48's, so they should have
decent routing diversity. They could choose to announce half/half from
alternate pops, or play other games such as this. I don't know that that
would have solved any of the problems last week, nor any problems in the
future.

I think Bill's slide 30 is pretty much what FB has/had deployed:
  1) I would think the a/b cloud is really "as similar a set of paths from
     like deployments as possible"
  2) redundant pairs of servers in the same transit/network
  3) hidden masters (almost certainly these are in the depths of the FB
     datacenter network) (though this part isn't important for the
     conversation)
  4) control/sync traffic on a different topology than the customer-serving
     one

> However, if CDN provider has their own transit backbone, which is,
> seemingly, not assumed by your slides, and retail ISPs are tightly

I think it is, actually, in slide 30?
  "We need a network topology to carry control and synchronization
   traffic between the nodes"

> connected to only one pop of the CDN provider, the CDN provider

It's also not clear that FB is connecting their CDN to single points in any
provider... I'd guess there are some cases of that, but for larger networks
I would imagine there are multiple CDN deployments per network. I can't
imagine that it's safe to deploy 1 CDN node for all of 7018 or 3320... for
instance.

> may be motivated to let users access only one pop killing essential
> redundancy of DNS, which should be overengineering, which is my
> concern of the paragraph quoted by you.

It seems that the problem FB ran into was really that there wasn't either:
  "a secondary path to communicate: 'You are the last one standing, do not
   die'" (to an edge node)
or:
  "maintain a very long/less-preferred path to a core location(s) to
   maintain service in case the CDN disappears"

There are almost certainly more complexities which FB is not discussing in
their design/deployment which affected their services last week, but it
doesn't look like they were very far off on their deployment, if they need
to maintain back-end connectivity to serve customers from the CDN locales.

-chris
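P.S. For anyone who wants to eyeball that address diversity themselves, a
rough sketch (assuming dnspython is installed and your recursive resolver is
working; the output format and block sizes here are just for illustration)
that pulls the facebook.com NS set and groups each address by its covering
/24 or /48:

# Rough sketch, not FB's tooling: list facebook.com's authoritative name
# server addresses and the /24 (v4) or /48 (v6) covering each one, to show
# the routing diversity discussed above.
# Requires dnspython >= 2.0 ("pip install dnspython") and working recursion.
import ipaddress
import dns.resolver

def covering_block(addr):
    """Return the /24 (IPv4) or /48 (IPv6) network that covers addr."""
    prefix = 24 if ipaddress.ip_address(addr).version == 4 else 48
    return ipaddress.ip_network(f"{addr}/{prefix}", strict=False)

resolver = dns.resolver.Resolver()
nameservers = sorted(str(r.target) for r in resolver.resolve("facebook.com", "NS"))
for ns in nameservers:
    for rrtype in ("A", "AAAA"):
        try:
            answers = resolver.resolve(ns, rrtype)
        except (dns.resolver.NoAnswer, dns.resolver.NXDOMAIN):
            continue
        for rdata in answers:
            print(f"{ns:24} {rdata.address:40} covered by {covering_block(rdata.address)}")

Whether those covering blocks match what FB actually originates in BGP is a
separate question; this only shows the address-level spread.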
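P.P.S. And a toy model of the two fallbacks mentioned above ("last one
standing" at an edge vs. a standing less-preferred announcement from a core
site), just to make the failure mode concrete. This is my own sketch of the
idea, not anything from FB's writeup:

# Toy model (my sketch, not FB's published design) of the two fallbacks
# above. Each edge pop withdraws its anycast DNS announcement when its
# health check toward the backend fails; the question is what keeps the
# prefix reachable when *every* edge fails that check.

def reachable_origins(edge_health, last_one_standing=False, core_announces=False):
    """Return the sites still originating the anycast prefix."""
    origins = {pop for pop, healthy in edge_health.items() if healthy}
    if last_one_standing and not origins and edge_health:
        # "You are the last one standing, do not die": one edge keeps its
        # announcement up even though its own health check is failing.
        origins = {sorted(edge_health)[0]}
    if core_announces:
        # A core site maintains a very long / less-preferred path at all
        # times, so traffic falls back there only when the edges vanish.
        origins.add("core (less-preferred)")
    return origins

edges = {"pop-a": False, "pop-b": False, "pop-c": False}  # backbone gone everywhere

print("no fallback:         ", reachable_origins(edges) or "prefix withdrawn everywhere")
print("last one standing:   ", reachable_origins(edges, last_one_standing=True))
print("core less-preferred: ", reachable_origins(edges, core_announces=True))

With neither fallback in place, the anycast prefix gets withdrawn everywhere
once every edge's health check fails, which is roughly what last week looked
like from the outside.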