(I'm going to hate myself in the morning, but)

On Fri, Oct 8, 2021 at 10:22 AM Masataka Ohta <mo...@necom830.hpcl.titech.ac.jp> wrote:
> William Herrin wrote:
>
> > https://engineering.fb.com/2021/10/05/networking-traffic/outage-details/
>
> our DNS servers disable those BGP advertisements if they
> themselves can not speak to our data centers
>
> The end result was that our DNS servers became unreachable
> even though they were still operational.
>
> means their DNS servers were serving the zone, even after
> they recognize their zone data were too old, that is, expired.

that's not what this means.

I think Mr. Petach previously described this, but:

  1) dns server in pop serves some content (ttls aren't important right now)
  2) dns server uses some quagga/gated/bird/etc to announce locally:
     "Hey, foo/32 here!"
     (imagine this triggers an 'aggregate route' or 'network statement'
     (pick your vendor solution) to appear in the global table)
  3) dns server also pings a backend server set
  4) when 3 fails for X period of time, it tells quagga/bird/etc to stop
     announcing the /32

once the /32 is gone, the local pop no longer sources the aggregate (/24 or
/23 or whatever), so traffic SHOULD (externally) flow toward another copy of
the /23 or /24 or whatever.

there's not a lot of magic here... and it's not about the zone data really
at all.
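fwiw, steps 3 and 4 above boil down to a very small state machine. Here is a minimal sketch in Python, under stated assumptions: the class name AnycastGate, the fail_window parameter (the "X period of time"), and the "announce"/"withdraw" action strings are all made up for illustration, and re-announcing once the backends answer again is my assumption, not something the outage write-up spells out. It is not Facebook's code, and it deliberately does not talk to quagga/bird itself; the caller would map the returned action onto whatever the routing daemon needs.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AnycastGate:
    """Decides whether the local /32 should be announced (hypothetical helper)."""
    fail_window: float                      # "X period of time" from step 4, in seconds
    announcing: bool = True                 # step 2: we start out announcing foo/32
    failing_since: Optional[float] = None   # time of the first failure in the current run

    def observe(self, backend_ok: bool, now: float) -> Optional[str]:
        """Feed in one backend check result (step 3); return an action or None."""
        if backend_ok:
            self.failing_since = None
            if not self.announcing:
                # Assumption: come back once the backends answer again.
                self.announcing = True
                return "announce"
            return None
        if self.failing_since is None:
            self.failing_since = now
        if self.announcing and now - self.failing_since >= self.fail_window:
            # Step 4: backends unreachable for the whole window, pull the /32.
            self.announcing = False
            return "withdraw"
        return None


# Simulated (time, backend_ok) check results; no real network involved.
gate = AnycastGate(fail_window=30.0)
checks = [(0, True), (10, False), (20, False), (40, False), (50, False), (60, True)]
actions = [gate.observe(ok, t) for t, ok in checks]
print(actions)
```

Note that nothing in there looks at zone data or ttls: the only input is "can I reach the backends", which is the point. It also shows why every pop can withdraw at once if the backbone itself goes away, since each dns server fails the same check independently.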