anycasting, DNS client retry/failover

Gordon A. Lang Fri, 06 Mar 2009 13:06:41 -0800

I have just implemented DNS anycasting on our inside network, using Cisco content switches to monitor the health of the servers and to advertise an OSPF route when the back-end services are alive. I have three CSSs simultaneously advertising the same service address to the network, and clients get routed to the nearest one. It works great.
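A rough model of the setup above, for illustration only: each content switch advertises the shared service address only while its back-end keepalives pass, and routing delivers a client to the advertising node with the lowest OSPF cost. The node names (css1 to css3) and the costs are hypothetical, and equal-cost multipath is not modeled.

```python
# Hypothetical model: a node advertises the anycast address only while its
# health checks pass; a client is routed to the lowest-cost advertising node.
from typing import Dict, Optional


def nearest_advertiser(costs: Dict[str, int], healthy: Dict[str, bool]) -> Optional[str]:
    """Return the lowest-cost node currently advertising the service route.

    costs   -- assumed OSPF path cost from one client's location to each node
    healthy -- whether each node's back-end keepalives currently pass
    """
    candidates = [node for node in costs if healthy.get(node, False)]
    if not candidates:
        return None  # nobody advertises the address, so queries go unanswered
    return min(candidates, key=lambda node: costs[node])


costs = {"css1": 10, "css2": 20, "css3": 30}        # hypothetical costs
healthy = {"css1": True, "css2": True, "css3": True}
print(nearest_advertiser(costs, healthy))            # nearest node: css1
healthy["css1"] = False                              # css1 withdraws its route
print(nearest_advertiser(costs, healthy))            # traffic shifts to css2
```

The important property is that traffic only moves once a node's health check fails and its route is withdrawn, which is why the keepalive period matters so much below.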


Anyone else try this?

When I was testing, I sent 2000 queries per second from two sources simultaneously on diverse parts of the network, then began disconnecting and reconnecting cables on the content switches to see how well it all worked. No matter what I did, I could not lose more than 10 packets per link-state change (which I consider very good). But when I stopped the services on the actual servers, it took up to 5 seconds before the content switch registered the fault (because the keepalives are currently configured for every 5 seconds), and I lost thousands of queries in those few seconds.
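The loss during the detection window is roughly the query rate times how long the route stays up after the back end dies. A back-of-the-envelope sketch, assuming the fault is detected within one keepalive period (up to 5 seconds in the test above), that route withdrawal after detection is instantaneous, and reading the test load as 2000 queries per second per source, all of it reaching the failed node:

```python
def worst_case_lost_queries(query_rate_qps: float, keepalive_s: float) -> float:
    """Upper bound on queries sent to a dead back end before its route is withdrawn.

    Assumes detection takes at most one keepalive period and that the whole
    query_rate_qps is routed to the failed node during that window.
    """
    return query_rate_qps * keepalive_s


# Assumed load: two sources at 2000 queries/s each.
for period in (5, 3, 1):
    print(period, worst_case_lost_queries(2 * 2000, period))
```

This is only an upper bound, but it shows that the exposure scales linearly with the keepalive period.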

I am considering reducing the keepalive period to improve this fault response, but I'd like to get a better understanding of DNS client behavior when its queries go unanswered.

From what I recall, the typical DNS client sends a single query packet to its first-configured DNS resolver and waits 1 second for a response. If no response comes, the client sends a second query to the same resolver and waits either 1 or 2 seconds for a response, depending on whether the client is progressive or not. If there is still no response, most DNS clients ask the same resolver one last time and wait either 1 more second or 4 seconds, depending on the client. Perhaps some non-progressive DNS clients try a fourth time. If there is still no response, the DNS client starts from the beginning with the second-configured DNS resolver.
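The retry behavior recalled above can be written out as a timeline. This sketch encodes only that description (every try goes to the same resolver, the timeout stays at 1 second or doubles after each unanswered try, and it resets when the client moves to the next resolver); real stub resolvers differ, and the resolver names ns1/ns2 are placeholders.

```python
from typing import List, Tuple


def query_schedule(resolvers: List[str], tries: int, progressive: bool,
                   first_timeout: float = 1.0) -> List[Tuple[float, str, float]]:
    """Return (send_time, resolver, timeout) for every query the client sends.

    All tries go to one resolver before moving on to the next; with
    progressive=True the timeout doubles after each unanswered try.
    """
    schedule = []
    t = 0.0
    for resolver in resolvers:
        timeout = first_timeout          # start over for each resolver
        for _ in range(tries):
            schedule.append((t, resolver, timeout))
            t += timeout                 # next send happens when this try times out
            if progressive:
                timeout *= 2
    return schedule


for send_time, resolver, timeout in query_schedule(["ns1", "ns2"], tries=3, progressive=True):
    print(f"t={send_time:4.1f}s  ask {resolver}  wait {timeout:.0f}s")
```

With three progressive tries the client sends to ns1 at 0, 1 and 3 seconds and first reaches ns2 at 7 seconds; with four non-progressive tries it reaches ns2 at 4 seconds.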

If this is true, then I would think a keepalive period of 3 seconds ought to divert queries away from dead servers fast enough to satisfy the vast majority of DNS client requests before the clients fail over to the second-configured DNS resolver.
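To check the 3-second estimate against that retry schedule, this sketch finds the first try sent after the anycast route has moved off the dead server. It assumes the worst case: the query leaves at the moment the back end dies, detection takes one full keepalive period, and routing converges convergence_s seconds later (a hypothetical parameter, 0 by default).

```python
from typing import List, Optional


def retry_send_times(tries: int, progressive: bool, first_timeout: float = 1.0) -> List[float]:
    """Send times (seconds after the first query) of the tries to one resolver."""
    times, t, timeout = [], 0.0, first_timeout
    for _ in range(tries):
        times.append(t)
        t += timeout
        if progressive:
            timeout *= 2
    return times


def first_rescued_try(keepalive_s: float, tries: int, progressive: bool,
                      convergence_s: float = 0.0) -> Optional[int]:
    """Index of the first try that reaches a healthy node, or None.

    Worst case assumed: the first query is sent exactly when the back end
    dies, and traffic is rerouted keepalive_s + convergence_s seconds later.
    """
    reroute_at = keepalive_s + convergence_s
    for i, sent in enumerate(retry_send_times(tries, progressive)):
        if sent >= reroute_at:
            return i
    return None  # every try hits the dead node; client falls back to its 2nd resolver


for k in (5, 3, 2):
    print(k, first_rescued_try(k, 3, True), first_rescued_try(k, 4, False))
```

Under these assumptions a 3-second keepalive rescues only the last try (sent at t = 3 s), and only if rerouting is instantaneous; any extra detection or convergence delay pushes the client onto its second resolver. A 2-second period leaves about a second of slack in both cases.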


Any comments?

And despite what I have read about DNS clients over the years, what I have experienced in real life has left me uncertain about what really happens. Typically, prior to this anycast deployment, when our first-configured DNS resolver went down, users complained about waiting 60 to 90 seconds before their web pages would come up. That does not make sense to me, because I thought the second-configured resolver would be used within a few seconds.


Can anyone suggest why real life doesn't reflect what is written?

Thanks.

--

Gordon A. Lang


_______________________________________________
bind-users mailing list
bind-users@lists.isc.org
https://lists.isc.org/mailman/listinfo/bind-users
