Hi Latitude!

Short answer: I think a 2-second delay is not achievable in a distributed system with many globally distributed slaves and limited resources.

Long answer: it all depends on how much money and time you can invest in setting up such a service - long comments inline.

On 07.03.2018 at 07:10, Latitude wrote:
I would like to solicit constructive feedback in regards to a distributed DNS
zone hosting proof of concept I'd like to design and establish.

I must deploy a DNS system with the following requirements:
- single master server, multiple slave servers
- minimal time for name resolving for Americas, Europe and Asia
- up to millions records in a domain zone
- changes propagate in real time (master -> slaves), 2 sec max delay
- automatic slave data re-syncing on master link restore after disconnect
- API for zone records manipulation (insert, update, delete)

There is one important thing you did not mention: how often do you update the zone? Once a day? Once an hour? Once a minute? Several times per second?


So far I am considering using (free) DC/OS on Amazon Web Services with the
latest version of BIND containerized using docker on a Linux or Unix OS. Dyn
and Infoblox are also on my list of items to research but I have never used
either and I enjoy working with BIND on Linux. After all this is the BIND
Users group, but I would be interested to know if someone can make a case
for using Dyn or Infoblox in this case.

The challenges lie elsewhere. First design the distribution, then think about the OS, software, cloud provider, and so on.

Considerations/questions I have about this deployment for this Bind-Users
forum are:

1. How can I examine DNS resolution times using this platform (or other
platforms to compare with) in different geographic areas of the world
without first deploying it? I will need to have benchmark data to test
against to verify I am getting the fastest speeds possible on name
resolutions.

You cannot measure something you have not built yet. What you can do is measure what somebody else has built, then clone their setup (or build something similar), or buy their service. There are plenty of existing DNS providers. You can, for example, use RIPE Atlas to test them. There are also dnsperf.com and their enterprise service perfops.net, which will give you a rough idea of what DNS resolution times are possible.

You did not mention what your use case is. Who will do the DNS queries? a) A web browser on a standard PC, or b) a dedicated application?

For a) you have no control over the resolver used by the end user. Hence, it may be the provider's resolver, 8.8.8.8 (or similar), or the user's own. That resolver may or may not be good at choosing the best name server from the zone's NS records. In this case I would suggest Anycast - but be warned: Anycast helps performance only if you carefully choose your locations, where 'location' means the location in the network, i.e. which transit providers, which exchanges, AS path lengths, and so on. If you do the hosting yourself and choose transit providers carefully you can get great performance (e.g. Cloudflare), but it is very expensive. Choosing badly will give you bad performance.

For b) I would definitely avoid Anycast. Provision the application with the name servers known to be in its region, or make the application smart and let it probe which name server answers fastest.
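The "probe which name server answers fastest" idea can be sketched in a few lines of stdlib Python. This is only an illustration: the server addresses are placeholders from the TEST-NET ranges, and a real application would retry, validate the response, and re-probe periodically.

```python
import socket
import struct
import time

def build_query(name, qtype=1, qid=0x1234):
    """Build a minimal DNS query packet (RFC 1035) for the given name.

    qtype=1 is an A-record query; flags 0x0100 set the RD (recursion
    desired) bit and QDCOUNT=1 announces one question.
    """
    header = struct.pack(">HHHHHH", qid, 0x0100, 1, 0, 0, 0)
    # QNAME: each label is length-prefixed, terminated by a zero byte
    qname = b"".join(bytes([len(l)]) + l.encode() for l in name.split("."))
    # QTYPE, then QCLASS=1 (IN)
    return header + qname + b"\x00" + struct.pack(">HH", qtype, 1)

def probe(server, name, timeout=1.0):
    """Return the round-trip time in ms for one UDP DNS query, or None."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.settimeout(timeout)
    try:
        start = time.monotonic()
        sock.sendto(build_query(name), (server, 53))
        sock.recv(512)
        return (time.monotonic() - start) * 1000.0
    except OSError:          # timeout or unreachable network
        return None
    finally:
        sock.close()

if __name__ == "__main__":
    # Hypothetical per-region name servers -- replace with your own.
    servers = ["192.0.2.1", "198.51.100.1", "203.0.113.1"]
    results = [(s, probe(s, "example.com")) for s in servers]
    answered = [(s, t) for s, t in results if t is not None]
    if answered:
        best = min(answered, key=lambda st: st[1])
        print("fastest: %s (%.1f ms)" % best)
    else:
        print("no server answered")
```

In practice the application would cache the winner and fall back to the next-fastest server on failure.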

2. How to handle millions of records in a DNS zone, and how common is it to
have millions of records in a DNS zone?

It is probably not that common, but several TLDs have millions of records. One of our customers' zones has 25 million RRs and we do not see any problems with BIND (or NSD or Knot).

3. What API solutions for DNS zone edits currently exist or should I be
looking into?

It all depends on your setup. If you go with BIND I would suggest DNS UPDATE. If you choose some other replication technique there are other tools (see below).
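As a sketch of the DNS UPDATE route (RFC 2136): your API backend would feed a batch like the following to nsupdate. The server address and record names are only placeholders, and in practice you would authenticate the update with a TSIG key (and an `allow-update` clause on the master):

```
server 192.0.2.1
zone example.com.
update delete www.example.com. A
update add www.example.com. 5 A 192.0.2.80
send
```

Run with e.g. `nsupdate -k <keyfile> <batchfile>`; BIND applies the change, bumps the SOA serial, and sends NOTIFYs to the slaves.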

I will research more in the next day but so far I know I can manually
configure named.conf to propagate zone changes to slave servers rapidly
(aiming for 2 seconds or less) using NOTIFY messages and zone transfers, and
also configure slave servers to automatically re-synch zone data with the
master server upon reestablishing a connection. That should satisfy two of
my requirements above.

In fact it is not only NOTIFY + XFR. The full chain is:

- Applying the zone change on the master, e.g. via DNS UPDATE.

- NOTIFYing all the slaves. With lots of slaves this may take some time, and AFAIK BIND may throttle NOTIFYs.

- NOTIFYs are UDP and may get lost on the way, so there are retransmissions. And if all NOTIFYs are lost, the slave will stay out of sync until the next zone update arrives or the SOA refresh timer expires. You can work around this by having your slaves query the master for the current SOA every second (or, even more aggressively, request an IXFR every second). This will certainly generate load on the master, but you cannot have small delays without extra workload.

- IXFRing the zone. There is no guarantee that the slave will start the zone transfer immediately. Further, if all slaves request the transfer at the same time, BIND may throttle it (you can tweak these settings in BIND). Also, a single lost packet may cause the IXFR to take much longer than expected.

Also, if you have frequent updates, NOTIFYs may arrive while an XFR is still running. Only after that XFR finishes will the queued NOTIFYs be processed and a new XFR be started. I once saw a name server that aborted the running XFR on receiving a NOTIFY and started a new one, which causes problems when there are plenty of NOTIFYs.

- Applying the IXFR. With such a big zone, this may take considerable time. I think BIND is rather fast here, but other name servers like NSD/Knot may take longer.

- TTLs. Every RR has a TTL. Forcing a 2-second sync time is needless when the TTL is much larger, e.g. 5 minutes. Hence, you would have to use a TTL (and possibly also a negative TTL) of a few seconds or less. But this goes back to the earlier question of whether you control the resolver. If not (case a), low TTLs often do not work as expected, as service providers may not accept such low TTLs and substitute a higher one.
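For reference, the BIND knobs touched on above can be sketched roughly like this. Zone names, addresses, and values are only illustrative; check the ARM for your BIND version before relying on any of them:

```
// Master: raise the conservative default throttles for many slaves
options {
    notify-rate 100;          // NOTIFYs sent per second (BIND 9.9+)
    transfers-out 100;        // concurrent outbound zone transfers
    serial-query-rate 100;    // SOA refresh queries per second
};

// Slave: clamp the SOA refresh so the slave polls the master about
// once per second -- this generates the extra load discussed above
zone "example.com" {
    type slave;
    masters { 192.0.2.1; };
    file "example.com.db";
    min-refresh-time 1;
    max-refresh-time 1;
};
```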


Hence, as you see, 2 seconds to sync the slaves is quite hard to achieve, and sometimes needless if you do not control the application and the resolver.
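To illustrate the TTL point: pushing end-to-end propagation below a few seconds also requires second-level TTLs in the zone itself, roughly like this (all values are just an example):

```
$TTL 5              ; default TTL for all records: 5 seconds
@   IN SOA ns1.example.com. hostmaster.example.com. (
        2018030701  ; serial
        60          ; refresh
        15          ; retry
        604800      ; expire
        5 )         ; negative-caching TTL (RFC 2308)
    IN NS  ns1.example.com.
www IN A   192.0.2.80
```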

Any additional advice, hints, or tips for my proof of concept would be
greatly appreciated! Thanks in advance. This will be a very fun project to
design and hopefully implement.

Maybe DNS is the wrong protocol for distributing the data. There are usage scenarios where different setups work better. One example: store the zone in a database, use database replication, and run a name server with a database backend. Database replication will usually keep the zones in sync and recover from disconnects immediately. In such setups you usually also cache in the name server software (as database queries are the bottleneck), which also affects the sync time the end user experiences.
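One concrete (illustrative) instance of such a setup is PowerDNS with a replicated MySQL backend; a minimal pdns.conf sketch, with placeholder credentials:

```
launch=gmysql
gmysql-host=127.0.0.1
gmysql-dbname=pdns
gmysql-user=pdns
gmysql-password=secret
# caches inside the name server: cached answers can lag behind
# the database replication by up to this many seconds
cache-ttl=20
query-cache-ttl=20
```

Here MySQL replication handles master-to-slave sync and disconnect recovery, while the cache TTLs trade query load against how quickly a slave serves fresh data.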

Basically the setup will be a compromise between low resolution time and low sync time. If you have only one slave next to the master, a 2-second sync time is easy to achieve, but DNS resolution may take 200 ms or more for far-away end users. Having hundreds of slaves around the world (like Cloudflare) gives you excellent resolution times, but fast syncing becomes complex.

Hence, my suggestion is to consider using existing DNS providers. Ask them what sync time they can guarantee for their slaves and whether they do caching on their slaves.

I work for RcodeZero Anycast DNS (compare us on dnsperf.com) and I think we do a great job, but we guarantee only a few minutes with 40 slaves worldwide. IMO, trying to achieve 2 seconds puts heavy load on the system and is not worth the effort if you use much higher TTLs or do not control the resolvers.

regards
Klaus



bind-users mailing list
bind-users@lists.isc.org
https://lists.isc.org/mailman/listinfo/bind-users
