On Mon, Feb 10, 2014 at 11:52:11PM +0100, Anand Buddhdev wrote:

> The zone's operator had accidentally set its serial in the future, and
> then set it back, not realising that they should have performed a serial
> roll-over.

this is the core of the problem. There might be more than one
appropriate response to a protocol violation.

> Regardless of the recovery method, I'm more interested in opinion about
> zone expiry. All the servers were able to query the master for the SOA
> record, as well as transfer from it. However, after seeing an older
> serial for an extended period, both BIND and NSD expired the zone,
> presumably because they couldn't synchronise the zone with the master.
> Knot seems to think that it's okay to serve the zone as long as it can
> query the master, even if the master's serial number is different.
>
> Is Knot's behaviour acceptable?

I see no reason to single out a particular implementation. In fact, the
more diversity we see, the more interesting parts of the DNS
specification we get into. Zone expiry hasn't been fully specified in
1034/1035 (remember the SERVFAIL vs REFUSED discussion). 1034 says

    To detect changes, secondaries just check the SERIAL field of the
    SOA for the zone. In addition to whatever other changes are made,
    the SERIAL field in the SOA of the zone is always advanced whenever
    any change is made to the zone. The advancing can be a simple
    increment, or could be based on the write date and time of the
    master file, etc. The purpose is to make it possible to determine
    which of two copies of a zone is more recent by comparing serial
    numbers. Serial number advances and comparisons use sequence space
    arithmetic, so there is a theoretic limit on how fast a zone can be
    updated, basically that old copies must die out before the serial
    number covers half of its 32 bit range. In practice, the only
    concern is that the compare operation deals properly with
    comparisons around the boundary between the most positive and most
    negative 32 bit numbers.

This was later refined with RFC 1982.

    The periodic polling of the secondary servers is controlled by
    parameters in the SOA RR for the zone, which set the minimum
    acceptable polling intervals.
    The parameters are called REFRESH, RETRY, and EXPIRE. Whenever a
    new zone is loaded in a secondary, the secondary waits REFRESH
    seconds before checking with the primary for a new serial. If this
    check cannot be completed, new checks are started every RETRY
    seconds. The check is a simple query to the primary for the SOA RR
    of the zone. If the serial field in the secondary's zone copy is
    equal to the serial returned by the primary, then no changes have
    occurred, and

Note it says "is equal to" and not "is equal to or lower than", even
though the text in the previous paragraph suggests that one half of the
2^32 space is "older" and also is explicit about the purpose: "... make
it possible to determine which of two copies of a zone is more recent".

    the REFRESH interval wait is restarted. If the secondary finds it
    impossible to perform a serial check for the EXPIRE interval, it
    must assume that its copy of the zone is obsolete an discard it.

It can be argued that it's _not_ impossible to perform the check, just
that the check found no increase (and no equality, either). There's an
edge case for serial + 2^31, though.

1035, for completeness, not contributing much, reads:

    SERIAL          The unsigned 32 bit version number of the original
                    copy of the zone. Zone transfers preserve this
                    value. This value wraps and should be compared
                    using sequence space arithmetic.

2181 adds:

    Secondary servers use the serial number in the SOA record of the
    zone to determine when it is necessary to update their local copy
    of the zone. Serial numbers are basically just 32 bit unsigned
    integers that wrap around from the biggest possible value to zero
    again. See [RFC1982] for a more rigorous definition of the serial
    number.

    [...]

    Occasionally due to editing errors, or other factors, it may be
    necessary to cause a serial number to become smaller. Never simply
    decrease the serial number.
    Secondary servers will ignore that change, and further, will
    ignore any later increments until the earlier large value is
    exceeded.

While this is descriptive rather than normative text, it can be argued
that this behaviour was expected and intended. In fact, if "change" had
been the desired indicator, the whole 'sequence space arithmetic' would
have been useless. Also, it would have led to a swing state under
certain circumstances. RFC 2136, 3.4.2.2., is the only text that
explicitly mentions the "lower" relation, but doesn't help here.

So, it can be argued that expiring the zone because its SOA serial is
higher than the one at the respective master is already a step too far.
To that extent, Knot's behaviour is protocol conformant and also in line
with behaviour warned about as early as RFC 1034 and RFC 2181. Doesn't
play nice with DNSSEC, though.

> In my opinion, BIND has done the pragmatic thing here and recovered by

I agree with that, too. However, it only worked because this server
continued the SOA checks after the expire (and remember, this proves
that the checks aren't "impossible", so there was no reason to expire in
the first place). NSD, as per your observation, not only discards the
contents of the zone but apparently does not continue the SOA checks.
That makes sense in those situations where the master has gone.
Especially when you have a server with a large number of zones sourced
from the same unreachable master, the housekeeping overhead can be
non-negligible.

Therefore, the summary response is "it depends". If I had a wish, I'd
ask that Knot not simply be adjusted to "what BIND does", because we
seem to have a difference in interpretation of the spec and there is a
need to fix that. The all too often abused 'robustness principle' isn't
enough to rule here: it isn't the server's fault to start with. Which
means this is work for someplace in the IETF.

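For illustration, the serial comparison that 1034 describes and RFC 1982
makes precise can be sketched in a few lines of Python. This is only a
sketch under my reading of RFC 1982, not what BIND, NSD or Knot actually
implement. The function names (serial_compare, refresh_outcome,
rollover_steps), the outcome labels and the example serials are all made
up for this message.

```python
SERIAL_BITS = 32
MOD = 1 << SERIAL_BITS          # 2^32, size of the serial number space
HALF = 1 << (SERIAL_BITS - 1)   # 2^31, the "undefined" distance


def serial_compare(s1, s2):
    """Compare two serials using RFC 1982 sequence space arithmetic.

    Returns -1 if s1 < s2, 0 if equal, 1 if s1 > s2, and None when the
    two values are exactly 2^31 apart (comparison undefined).
    """
    s1 %= MOD
    s2 %= MOD
    if s1 == s2:
        return 0
    forward = (s2 - s1) % MOD   # distance walking "upwards" from s1 to s2
    if forward == HALF:
        return None
    return -1 if forward < HALF else 1


def refresh_outcome(local, master):
    """Hypothetical classification of one SOA refresh check."""
    result = serial_compare(local, master)
    if result == 0:
        return "in-sync"        # the only case 1034 spells out
    if result == -1:
        return "transfer"       # master is newer, fetch the zone
    if result == 1:
        return "master-behind"  # the case discussed in this thread
    return "undefined"          # the serial + 2^31 edge case


def rollover_steps(current, target):
    """Serials to publish, in order, to get from current to target.

    Each step adds at most 2^31 - 1, the largest increment RFC 1982
    allows. Every secondary has to pick up each intermediate value
    before the next one is published.
    """
    steps = []
    remaining = (target - current) % MOD
    while remaining:
        increment = min(remaining, HALF - 1)
        current = (current + increment) % MOD
        steps.append(current)
        remaining -= increment
    return steps
```

With made-up serials, a secondary holding 2014021500 that polls a master
now offering 2014021000 gets "master-behind" from refresh_outcome: the
check itself completed, it just found neither equality nor an increase.
rollover_steps(2014021500, 2014021000) yields 4161505147 and then
2014021000, which is the roll-over the operator should have performed.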
-Peter