On Wed, Oct 20, 2021 at 11:24:47AM -0700, Wes Hardaker wrote:

> But, as Viktor indicated in his posts, we could move even lower (100
> being the next obvious step, but even lower is possible to still
> retain a reasonable percentage).  But there is of course a risk that
> we'll never get to a definitive value, and that we may annoy operators
> by constantly lowering it, so that they have to keep changing values.
>
> So, the question: what's the right FINAL value to put in the draft
> before LC?
Some observations to help the decision process:

1.  A validating resolver that prefers to SERVFAIL on all responses
    with excessive iterations, avoiding downgrades to "insecure", can
    simply ignore such NSEC3 records, and if no appropriate NSEC or
    NSEC3 records remain can then treat the reply as bogus (a minimal
    sketch of this policy appears further below).

2.  The downside of insecure downgrade is that any affected zones are
    subject to forgery of all names strictly below the zone apex, via
    fake insecure delegations (the denial of existence of DS records
    will be accepted), and also NODATA forgery for all qtypes at the
    zone apex (except NSEC3 and RRSIG).

3.  The downside of SERVFAIL for excess iterations is that if the
    target zone handles names of SMTP hosts without DANE TLSA records,
    then TLSA denial-of-existence failure will render these mail
    servers unavailable to DANE-enabled SMTP clients.  Also any
    wildcard replies that are based on non-existence proofs of the
    qname, ... will be bogus, thus e.g. wildcard A/AAAA answers are
    likely to SERVFAIL.

4.  The cost of P256 signature verification is (on a now somewhat dated
    Xeon Skylake system) ~300 times that of a SHA1 hash.  Thus south of
    150 iterations, further reductions in the iteration count offer
    only a modest benefit to validating resolvers that are also
    validating the signature (a back-of-the-envelope calculation
    appears further below).

5.  However, no signature verification applies on the authoritative
    server (perhaps a secondary that did not specifically "volunteer"
    to serve zones with a high iteration count).  Also when doing
    aggressive negative caching via previously received NSEC3 records,
    once again only SHA1 hashing is involved; the signature
    verification happened when the records were cached.

Therefore, while a softfail to insecure makes it possible to avoid
immediate pain, the SERVFAIL alternative is simpler to implement
correctly, but may require setting the bar somewhat higher.

Ideally, all zone operators would get the message, apply a realistic
threat model, and set the iteration counts to 0.  Much progress has
been made in a comparatively short time, but pockets of "resistance"
remain, with a large majority of domains in the [1-20] range, and low
but perhaps non-negligible zone counts (out of 12.46M zones) for:

    50 iterations:  ~13k
    100 iterations: ~20k (7.9k netcup.de, 2.2k nlhosting.net, 2.1k core-networks.de)
    150 iterations: ~6k  (5.8k mijnhostingpartner.nl)
    500 iterations: 101  (85 raytheon.com)

With a bit more nagging we could probably convince the small number of
operators that dominate the counts in question to make adjustments.
Otherwise, we can declare victory at either 100 or 150, and recommend
SERVFAIL above 500, but MAY SERVFAIL at the lower cutoff.

I'd like to see more responses with specific numbers, and thoughts on
whether a range in which downgrade to insecure happens is a good or bad
idea.  That is, is it always either AD=1 or SERVFAIL, or is there merit
in AD=0 for a range of values above a soft cutoff before a higher hard
cutoff is reached?

At this point, my inclination is to hardfail at 150 and avoid softfail.
Raytheon may be briefly inconvenienced, but otherwise this feels like
the most robust option, unless rough consensus is to try to set the bar
lower and softfail from there to some suitable upper bound in the 150
to 500 range.
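For observation 1, a minimal sketch of the "ignore, and go bogus only
if nothing usable remains" policy.  Purely illustrative Python: the
NSEC3 type, the 150 ceiling and the proves_denial callback are
placeholders, not any particular resolver's implementation:

    from dataclasses import dataclass
    from typing import Callable, Iterable, List

    ITERATION_CEILING = 150   # placeholder; whatever value the draft settles on

    @dataclass
    class NSEC3:
        owner: str
        iterations: int
        # salt, flags, next hashed owner and type bitmap omitted for brevity

    def usable_nsec3(records: Iterable[NSEC3],
                     ceiling: int = ITERATION_CEILING) -> List[NSEC3]:
        # Drop NSEC3 records whose iteration count exceeds the ceiling,
        # rather than failing the whole response outright.
        return [rr for rr in records if rr.iterations <= ceiling]

    def classify_denial(records: Iterable[NSEC3],
                        proves_denial: Callable[[List[NSEC3]], bool]) -> str:
        # "proves_denial" stands in for the usual RFC 5155 checks
        # (closest encloser, next closer, wildcard), applied only to
        # the records that survived the filter.
        usable = usable_nsec3(records)
        if not usable:
            return "bogus"    # nothing acceptable left to prove non-existence
        return "secure" if proves_denial(usable) else "bogus"

The difference from rejecting the response outright only shows up when
the response also carries acceptable NSEC or NSEC3 records; otherwise
both policies end in SERVFAIL.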
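And for observation 4, a back-of-the-envelope calculation.  Only the
~300x ratio above is measured; the "up to 3 hashed names per NXDOMAIN
proof" and the iterations + 1 digests per name follow from RFC 5155:

    P256_VERIFY_COST = 300    # one P256 verification ~= 300 SHA1 digests

    def nsec3_hash_cost(iterations: int, hashed_names: int = 3) -> float:
        # Hashing work for a denial proof, in units of one P256
        # signature verification.  An NXDOMAIN proof hashes up to 3
        # names (closest encloser, next closer, wildcard); each name
        # costs iterations + 1 digests.
        return hashed_names * (iterations + 1) / P256_VERIFY_COST

    # nsec3_hash_cost(150) ~= 1.5, nsec3_hash_cost(20) ~= 0.2,
    # nsec3_hash_cost(0) = 0.01: below ~150 the hashing is already a
    # modest fraction of the signature verification work.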
On Thu, Oct 21, 2021 at 01:22:25PM +0200, Peter van Dijk wrote:

> I don't know what the -right- value is, but I know what I want: 0
> iterations, empty salt, otherwise the NSEC3 gets ignored, presumably
> leading to SERVFAIL. This removes the 'insecure' window completely.
>
> > So, I'll support any push to lower the numbers.

Please be specific, even if you feel you'll land in the rough.

> Editorial nit, already hinted at above: the text currently has
> "Validating resolvers MAY return SERVFAIL when processing NSEC3
> records with iterations larger than 500." - I suggest changing this to
> "validating resolvers MAY ignore NSEC3 records with iterations larger
> than 500". That way, zones in the middle of a transition from 1000 to
> 0 iterations do not get punished. Zones at 1000, not in a transition,
> will still get SERVFAIL by virtue of the NSEC3 proof missing (because
> it is ignored).

Thanks, I think I agree.  Ignoring records with excessive counts, and
SERVFAILing only when they were the best available proof, sounds
sensible.

On Thu, Oct 21, 2021 at 02:52:47PM +0200, Matthijs Mekking wrote:

> And I suggest to change it to "larger than 150", a value that open
> source DNS vendors have been adopting over the last couple of months:
>
> https://nlnetlabs.nl/news/2021/Aug/12/unbound-1.13.2-released/
> https://blog.powerdns.com/2021/06/09/powerdns-recursor-4-4-4-and-4-5-2-released/
> https://www.knot-resolver.cz/2021-03-31-knot-resolver-5.3.1.html
> https://bind9.readthedocs.io/en/v9_16_21/notes.html#notes-for-bind-9-16-16
>
> (sorry that this is not pushing for lower numbers)

Thanks for the specific number!

On Thu, Oct 21, 2021 at 03:28:26PM +0200, Miek Gieben wrote:

> I would recommend against using a limit that happens to be in use at
> the current time, and would just use 100 (or even lower). Resolvers
> will continue to work fine and can lower their limit at their leisure.

Please be specific, or do you mean that a resolver should be free to
choose any number above 0?  (I'd suggest 1 in that case, as some
operators appear not to be confident that 0 *is* one iteration, and so
choose 1 just in case.)  That means that operators who choose non-zero
values would have to keep adjusting the numbers down as resolvers
gradually set the bar lower, and the only sane settings would then be
0 and 1.

On Thu, Oct 21, 2021 at 03:49:13PM +0200, Matthijs Mekking wrote:

> IIRC the vendors agreed on 150 for two reasons:
>
> 1. There are still a fair amount of zones using this value. Only a
> handful of zones were using above 150.

The numbers came down a lot, and publication of a lower number in the
RFC could similarly drive these down further (pretty much just
mijnhostingpartner.nl left at 150).

> 2. Resolvers could still cope with such numbers pretty confidently.

This is where I'm looking for experienced feedback from resolver
maintainers and operators.  I have deployment stats, but not
performance stats.

> I agree lower is better, but let's not pick a number randomly, but
> have data to back up that number.

I've provided the deployment numbers; others have relevant numbers on
performance impact, please share.

On Thu, Oct 21, 2021 at 07:24:21AM -0700, Paul Vixie wrote:

> >> I would recommend against using a limit that happens to be in use
> >> at the current time, and would just use 100 (or even lower).
> >> Resolvers will continue to work fine and can lower their limit at
> >> their leisure.
>
> +1.

So all resolver behaviour is fair above 0 (and then I'd suggest also
1)?  With resolver ceilings drifting down over time?  Or something
else?

-- 
    Viktor.