On Wed, Oct 20, 2021 at 11:24:47AM -0700, Wes Hardaker wrote:

> But, as Viktor indicated in his posts, we could move even lower (100
> being the next obvious step, but even lower is possible while still
> retaining a reasonable percentage).  But there is of course a risk
> that we'll never get to a definitive value, and that we may annoy
> operators by constantly lowering it so that they have to keep
> changing values.
> 
> So, the question: what's the right FINAL value to put in the draft
> before LC?

Some observations to help the decision process:

   1. A validating resolver that prefers to SERVFAIL on all responses
      with excessive iterations, avoiding downgrades to "insecure",
      can simply ignore such NSEC3 records and, if no appropriate NSEC
      or NSEC3 records remain, treat the reply as bogus (a minimal
      sketch of this behaviour follows the list).

   2. The downside of insecure downgrade is that any affected zones are
      subject to forgery of all names strictly below the zone apex, via
      fake insecure delegations (the denial of existence of DS records
      will be accepted), and also NODATA forgery for all qtypes at the
      zone apex (except NSEC3 and RRSIG).

   3. The downside of SERVFAIL for excess iterations is that if the
      target zone handles names of SMTP hosts without DANE TLSA records,
      then TLSA denial-of-existence failures will render these mail
      servers unavailable to DANE-enabled SMTP clients.

      Also, any wildcard replies that are based on non-existence proofs
      of the qname, ... will be bogus; thus e.g. wildcard A/AAAA answers
      are likely to SERVFAIL.

   4. The cost of P256 signature verification is (on a now somewhat dated
      Xeon Skylake system) ~300 times that of a SHA1 hash.  Thus south
      of 150 iterations, further reductions in the iteration count offer
      only a modest benefit to validating resolvers that are also
      validating the signature (a rough cost sketch also follows the
      list).

   5. However, no signature verification applies on the authoritative
      server (perhaps a secondary that did not specifically "volunteer"
      to serve zones with a high iteration count).

      Also, when doing aggressive negative caching via previously
      received NSEC3 records, once again only SHA1 hashing is involved;
      the signature verification happened when the records were cached.
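
To make point 1 concrete, here is a minimal Python sketch of the strict
behaviour: NSEC3 records above the resolver's cutoff are simply ignored,
and the answer is treated as bogus only when no acceptable
denial-of-existence proof remains.  The record type, the cutoff value
and the proves_denial() callback are placeholders of my own, not any
particular resolver's API.

    from dataclasses import dataclass
    from typing import Callable, List

    ITERATION_LIMIT = 150   # illustrative cutoff, not a recommendation

    @dataclass
    class NSEC3:            # hypothetical minimal view of an NSEC3 RR
        owner: str
        iterations: int

    def classify(records: List[NSEC3],
                 proves_denial: Callable[[List[NSEC3]], bool]) -> str:
        """Point 1: ignore over-limit NSEC3 records; only if no acceptable
        proof remains is the answer treated as bogus (-> SERVFAIL)."""
        usable = [rr for rr in records if rr.iterations <= ITERATION_LIMIT]
        return "SECURE" if proves_denial(usable) else "BOGUS"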
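
And to put rough numbers on points 4 and 5, a back-of-the-envelope
Python sketch using the ~300x verify-to-hash ratio measured above and
the RFC 5155 rule that "iterations" counts *additional* hash
applications:

    P256_VERIFY_COST = 300.0   # one P-256 verification, in SHA1 units (point 4)

    def name_hash_cost(iterations: int) -> float:
        # RFC 5155: the owner name is hashed iterations + 1 times.
        return iterations + 1

    for it in (0, 50, 100, 150, 500):
        ratio = name_hash_cost(it) / P256_VERIFY_COST
        print(f"{it:4d} iterations: hashing one name ~= {ratio:.3f} verifications")

    # At 150 iterations, hashing one name costs about half of one signature
    # verification for a validating resolver (point 4); authoritative servers
    # and aggressive negative caching pay only the hash cost (point 5).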

Therefore, while a softfail to insecure makes it possible to avoid
immediate pain, the SERVFAIL alternative is simpler to implement
correctly, though it may require setting the bar somewhat higher.

Ideally, all zone operators would get the message, apply a realistic
threat model, and set the iteration counts to 0.

Much progress has been made in a comparatively short time, but pockets
of "resistance" remain, with a large majority of domains in the [1-20]
range, and low but perhaps non-negligible zone counts (out of 12.46M
zones) for:

    50 iterations:  ~13k
    100 iterations: ~20k  (7.9k netcup.de, 2.2k nlhosting.net,
                           2.1k core-networks.de)
    150 iterations: ~6k   (5.8k mijnhostingpartner.nl)
    500 iterations: 101   (85 raytheon.com)

With a bit more nagging we could probably convince the small number of
operators that dominate the counts in question to make adjustments.

Otherwise, we can declare victory at either 100 or 150, recommend
SERVFAIL above 500, and allow SERVFAIL (as a MAY) at the lower cutoff.

I'd like to see more responses with specific numbers, and thoughts on
whether a range in which downgrade to insecure happens is a good or
bad idea.

That is, is it always either AD=1 or SERVFAIL, or is there merit in
AD=0 for a range of values above a soft cutoff, before a higher hard
cutoff is reached?
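
For concreteness, the soft/hard variant of the question would look
roughly like the Python sketch below; the two limits are placeholders
for whatever values the WG settles on, not a proposal:

    SOFT_LIMIT = 150    # at or below: validate normally
    HARD_LIMIT = 500    # above: refuse

    def iteration_policy(iterations: int) -> str:
        if iterations <= SOFT_LIMIT:
            return "VALIDATE"   # AD=1 when validation succeeds
        if iterations <= HARD_LIMIT:
            return "INSECURE"   # downgrade: AD=0, answer still returned
        return "SERVFAIL"       # hard fail

    # Collapsing the middle band (SOFT_LIMIT == HARD_LIMIT) gives the
    # "always either AD=1 or SERVFAIL" alternative.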

At this point, my inclination is to hardfail at 150 and avoid softfail.
Raytheon may be briefly inconvenienced, but otherwise this feels like
the most robust option, unless rough consensus is to try to set the bar
lower and softfail from there to some suitable upper bound in the 150
to 500 range.

On Thu, Oct 21, 2021 at 01:22:25PM +0200, Peter van Dijk wrote:

> I don't know what the -right- value is, but I know what I want: 0
> iterations, empty salt, otherwise the NSEC3 gets ignored, presumably
> leading to SERVFAIL. This removes the 'insecure' window completely.
> 
> So, I'll support any push to lower the numbers.

Please be specific, even if you feel you'll land in the rough.

> Editorial nit, already hinted at above: the text currently has
> "Validating resolvers MAY return SERVFAIL when processing NSEC3
> records with iterations larger than 500." - I suggest changing this to
> "validating resolvers MAY ignore NSEC3 records with iterations larger
> than 500". That way, zones in the middle of a transition from 1000 to
> 0 iterations do not get punished. Zones at 1000, not in a transition,
> will still get SERVFAIL by virtue of the NSEC3 proof missing (because
> it is ignored).

Thanks, I think I agree.  Ignoring records with iteration counts that
would otherwise lead to SERVFAIL, and failing only when no acceptable
proof remains, sounds sensible.
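
A toy Python example of the difference, with made-up records: a zone
partway through a transition from 1000 to 0 iterations whose
(simplified) response happens to carry NSEC3 records from both chains.

    LIMIT = 500
    response_nsec3 = [
        {"owner": "old.example.", "iterations": 1000},  # stale chain
        {"owner": "new.example.", "iterations": 0},     # new chain
    ]

    # "SERVFAIL on processing": any over-limit record spoils the response.
    on_sight = ("SERVFAIL" if any(rr["iterations"] > LIMIT
                                  for rr in response_nsec3) else "OK")

    # "Ignore": drop over-limit records, validate with what remains.
    usable = [rr for rr in response_nsec3 if rr["iterations"] <= LIMIT]
    ignored = "OK (validate remaining proof)" if usable else "SERVFAIL"

    print(on_sight, "|", ignored)   # SERVFAIL | OK (validate remaining proof)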

On Thu, Oct 21, 2021 at 02:52:47PM +0200, Matthijs Mekking wrote:

> And I suggest changing it to "larger than 150", a value that open
> source DNS vendors have been adopting over the last couple of months:
> 
> https://nlnetlabs.nl/news/2021/Aug/12/unbound-1.13.2-released/
> https://blog.powerdns.com/2021/06/09/powerdns-recursor-4-4-4-and-4-5-2-released/
> https://www.knot-resolver.cz/2021-03-31-knot-resolver-5.3.1.html
> https://bind9.readthedocs.io/en/v9_16_21/notes.html#notes-for-bind-9-16-16
> 
> (sorry that this is not pushing for lower numbers)

Thanks for the specific number!

On Thu, Oct 21, 2021 at 03:28:26PM +0200, Miek Gieben wrote:

> I would recommend against using a limit that happens to be in use at
> the current time, and would just use 100 (or even lower).  Resolvers
> will continue to work fine and can lower their limit at their leisure.

Please be specific.  Or do you mean that a resolver should be free to
choose any number above 0?  (I'd suggest 1 in that case, as some
operators appear not to be confident that 0 *is* one iteration, and so
choose 1 just in case.)
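
For the record, RFC 5155 defines the iterations field as the number of
*additional* hash applications, so 0 iterations still means the owner
name is hashed once.  A small Python sketch of the hash loop (the
example name and empty salt are arbitrary):

    import hashlib

    def nsec3_hash(name_wire: bytes, salt: bytes, iterations: int) -> bytes:
        """RFC 5155 NSEC3 hash: SHA-1 applied iterations + 1 times, with
        the salt appended on every application."""
        digest = hashlib.sha1(name_wire + salt).digest()
        for _ in range(iterations):
            digest = hashlib.sha1(digest + salt).digest()
        return digest

    # iterations=0 already hashes once; picking 1 "just in case" only
    # doubles the work.
    h0 = nsec3_hash(b"\x07example\x03com\x00", b"", 0)
    h1 = nsec3_hash(b"\x07example\x03com\x00", b"", 1)
    assert len(h0) == 20 and h0 != h1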

That means that operators who choose non-zero values would have to keep
adjusting the numbers down as resolvers gradually set the bar lower, and
the only sane settings would then be 0 and 1.

On Thu, Oct 21, 2021 at 03:49:13PM +0200, Matthijs Mekking wrote:
> IIRC the vendors agreed on 150 for two reasons:
> 
> 1. There are still a fair number of zones using this value. Only a
> handful of zones were using above 150.

The numbers came down a lot, and publication of a lower number in the
RFC could similarly drive these down further (pretty much just
mijnhostingpartner.nl left at 150).

> 2. Resolvers could still cope with such numbers pretty confidently.

This is where I'm looking for experienced feedback from resolver
maintainers and operators.  I have deployment stats, but not performance
stats.

> I agree lower is better, but let's not pick a number randomly, but
> have data to back up that number.

I've provided the deployment numbers; others have relevant numbers
on performance impact, so please share.

On Thu, Oct 21, 2021 at 07:24:21AM -0700, Paul Vixie wrote:

> >> I would recommend against using a limit that happens to be in use
> >> at the current time, and would just use 100 (or even lower).
> >> Resolvers will continue to work fine and can lower their limit at
> >> their leisure.
> 
> +1.

So is any resolver limit above 0 fair game (and in that case I'd
suggest also allowing 1)?  With resolver ceilings drifting down over
time?  Or something else?

-- 
    Viktor.
