MontyRee wrote:
> Sorry for the non-text-based previous e-mail; sending again.
>
> Thanks for the kind and concrete answers.
> My additional questions are:
>
> - Others can use other resolvers, like Windows-based ones or other
>   BIND versions. Does this work as well as you said, without
>   exception?
>
> - From the standpoint of high availability, which is better: two
>   authoritative DNS servers, or two master DNS servers behind an L4
>   switch?
>
> So thanks again.
>
> Regards.
>
>> Subject: RE: What would be happen if one of two dns was down?
>> From: [EMAIL PROTECTED]
>> To: bind-users@isc.org
>> Date: Tue, 12 Aug 2008 10:44:02 -0500
>>
>> On Tue, 2008-08-12 at 06:42 +0000, MontyRee wrote:
>>
>>> Thanks for the kind answer. Additional questions below.
>>>
>>>>> Hello, all.
>>>>>
>>>>> I have been operating two DNS servers (primary and secondary) for
>>>>> one domain, like below:
>>>>>
>>>>> example.com IN NS ns1.example.com
>>>>> example.com IN NS ns2.example.com
>>>>>
>>>>> There was an incident in which the ns1.example.com server went
>>>>> down. As I understand it, if ns1 is down, all requests go to
>>>>> ns2.example.com.
>>>>>
>>>> Depending on what 'down' means, it could take some time before the
>>>> request is sent to ns2. So there will likely be a delay, even if
>>>> not much (it will feel like forever to some users).
>>>>
>>> By 'down' I mean a system failure, so the server can't even be
>>> pinged.
>>>
>>>>> But when the ns1.example.com server was down, some people actually
>>>>> couldn't look up the domain.
>>>>>
>>>> Sounds like a configuration issue. However, realize there is a zone
>>>> cache, and if ns2 is slaving zones from ns1 (the typical BIND
>>>> master/slave scenario) and the zone data expires, then ns2 will
>>>> refuse to trust the slaved zone it had... and thus nothing works.
>>>>
>>> Sorry, I don't understand. The master DNS server was actually down
>>> for just an hour, and the slave worked without any problem, but at
>>> that time some people could connect while others said they couldn't
>>> resolve the domain at all.
>>>
>> The slave will answer queries for the zone until the zone's expiry is
>> reached, at which point, if it cannot contact the master, the zone
>> goes effectively dead.
>>
>> I think I used some bad "terms" in my explanation. Basically, there
>> is an expiration interval for which a slave will consider its data to
>> be good. After that, it will need to hit the master.
>>
>> (I trip up on using the right words.)
>>
>> The value is often set to 2 weeks or more. But if the master is down
>> for a LONG time... you'll lose it all eventually (the slave won't
>> answer for that zone anymore).
>>
>> If you're seeing this problem after a short period of time, that's
>> likely NOT the cause, unless somebody set the expiry in the SOA to
>> something really small.
>>
>> Caching in DNS is a wonderful thing, but it can cause scenarios where
>> resolution works for one client and not for another (because of the
>> various Time To Live values and the time of the last query/cache).
>>
>> Can you give us a feel for the amount of time between the failure and
>> the problem? Is it almost immediate? If so, then it's some other kind
>> of configuration issue (unless, as I said, the zone was just totally
>> misconfigured). Can you post the SOA for the zone?
>>
>>> Does that mean DNS failover doesn't work well, and that some
>>> resolvers or some BIND versions insist on querying the downed DNS
>>> server?
>>
>> Usually the client resolver is prepared to query multiple
>> nameservers; if the first one is down, it moves on to the next, and
>> so on. Failover works fine in this style (normally). Of course, a
>> client might NOT be aware of more than one nameserver... in which
>> case there is no failover at all.
>>
>> ...
>>
>>> So thanks for your help again.
>>>
>> Did I explain it better this time?
>>
>> Let me try to explain this from a high level:
1) The NS records published for a zone are for the consumption of other
nameservers (technically, "iterative resolvers"). If one of the
nameservers listed as an NS for a zone becomes unavailable, failover to
the other NS(es) is very quick, usually so quick as to be unnoticeable
by ordinary users. Iterative resolvers also *remember* which
nameservers are down or slow, so they adapt well to failures.

2) The nameservers defined for a "stub resolver", like your typical
end-user PC, are tried *in sequence*, so if the first one is down,
there may be a delay before the second one is tried; if that one is
also down, an even longer delay before the third is tried, and so
forth. The delay is often quite noticeable, and impatient applications
may time out before a working nameserver is found. Stub resolvers
typically don't *remember* that a particular nameserver is down,
either, so in case of a failure, all queries are likely to be slow
until the failure is corrected.

3) Between masters and slaves, there is a REFRESH interval defined for
each replicated zone, which governs how often the slave checks the
master for updates, and then an EXPIRE interval after which the slave
considers the zone "bad" and will no longer give useful answers for
names in the zone. As mentioned previously in the thread, while REFRESH
is typically measured in hours, EXPIRE is typically on the order of
weeks, if not months (an example SOA record showing these timers
appears at the end of this message). If a slave can't talk to the
master for weeks, chances are it's a permanent condition, and the right
thing to do is "expire" the zone so that clients aren't given stale
information.

In enterprises with a large number of slave servers (like ours), it is
common, for redundancy, to have multiple tiers of slaves, with the
slaves at a given tier listing multiple "masters" (i.e. sources of zone
data) from higher tiers, so that even if a single intermediate "master"
dies or becomes unavailable, changes still propagate out to the edges
everywhere. Note that there is an inherent problem in having servers at
the *same* tier list each other as "masters" reciprocally or in a
circular fashion, because then slaved zones can become "immortal" (i.e.
even if they're deleted from the primary master, the slaves in that
tier keep refreshing the zone from each other indefinitely).

So, your questions are:

a) "Others can use other resolvers, like Windows-based ones or other
BIND versions. Does this work as well as you said, without exception?"

It depends on what you mean by "resolver". If you mean the "resolver"
part of a nameserver implementation like BIND, configured for iterative
resolution (i.e. based on published NS records), then failover is very
fast. If, on the other hand, you mean a "stub resolver", like a typical
end-user PC client, then the failure of the first nameserver in the
resolver list can cause noticeable delays for every query. Note that on
some platforms it's possible to tune the delays (e.g. libresolv on some
Unix/Linux platforms understands /etc/resolv.conf options that govern
timeouts and retries).

In the case of a "forwarding resolver", such as BIND configured with a
"forwarders" statement, it depends on the exact implementation. Even in
its forwarding mode, BIND, for instance, still maintains a cache, so on
that basis alone it can be expected to perform reasonably well even in
the case of failures (unless the TTLs of the records being looked up
are very low).
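For illustration, a minimal forwarding setup in named.conf might look
something like the sketch below (the 192.0.2.x addresses are
documentation placeholders, not real forwarders):

    options {
        // Send queries to these upstream resolvers first
        // (placeholder addresses; substitute your own).
        forwarders { 192.0.2.1; 192.0.2.2; };
        // "forward first" falls back to normal iterative
        // resolution if no forwarder responds.
        forward first;
    };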
Modern versions of BIND also keep track of the up/down/slow state of
their upstream forwarders, so they can adapt to failures in the same
way they do when resolving iteratively (older versions of BIND were not
as adaptive in forwarding mode, trying each forwarder in sequence, so
they degenerated to the performance level of stub resolvers plus
caching). Other implementations of forwarding resolvers may cope well
with failures, or not so well. It really depends.

b) "From the standpoint of high availability, which is better: two
authoritative DNS servers, or two master DNS servers behind an L4
switch?"

I'm not 100% sure what you mean by "L4 switch". Do you mean a
load-balancer? The Internet standards mandate at least 2 nameservers
for each zone, so you don't technically have the option of putting 2
DNS servers behind a single, load-balanced VIP. We have 2 VIPs defined
for our Internet-facing DNS zones, and each VIP has multiple
nameservers behind it. This conforms to the standards, and it not only
gives us an acceptable level of availability in the face of unplanned
outages, but also the flexibility to perform maintenance, upgrades,
etc. transparently to Internet DNS clients.

There's also the "anycast" approach, which is routing-layer-based, but
since we don't use it here, I haven't researched it at all, and I don't
have a strong background in network routing in any case, I'll defer to
others to explain how that works.

What, by the way, do you mean by two "master" DNS servers? The term
"master" is usually used in DNS in two different ways:

1) Relationally, when talking about replication (as I do above): the
master is the provider of the zone data, and the slave is the consumer.
Within a multi-level replication hierarchy, a given server might be a
"master" with respect to some servers and a "slave" with respect to
others.

2) When viewing the hierarchy as a whole, in the classic DNS
replication model (i.e. based on point-to-point AXFR/IXFR transfers),
there is really only one "master", i.e. the origin of the zone data,
whether that be a flat file, a database backend, or whatever. All other
nameservers in the hierarchy are "slaves", in that they obtain the zone
data from other nameservers rather than from a source external to DNS
itself. Sometimes the term "primary master" is used for this kind of
"master", to distinguish it from "master" in the relational sense of #1
above.

In neither sense of the term "master" do I understand how one could
have multiple "masters" behind a load-balancer, unless you're i)
talking about putting load-balancers between servers in the replication
hierarchy (in which case they're all "authoritative" anyway, and
there's no difference between the options you presented), ii) deviating
from the classic DNS replication model (e.g. Microsoft's "multi-master"
architecture for Active Directory-integrated DNS, where the backend is
a replicated LDAP database), or iii) simply using the term incorrectly.

- Kevin
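P.S. For reference, here is roughly what the SOA timers discussed in
point 3 look like in a zone file. The values are illustrative defaults,
not a recommendation:

    example.com. IN SOA ns1.example.com. hostmaster.example.com. (
                    2008081201 ; serial
                    3600       ; REFRESH: slave checks the master hourly
                    900        ; RETRY: interval after a failed refresh
                    1209600    ; EXPIRE: slave drops the zone after 2 weeks
                    86400 )    ; negative-caching TTL

With an EXPIRE like this, the slave would keep answering for the zone
through two weeks of master downtime, so an hour-long outage should
never have made it stop answering.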