TL;DR
In the light of an experiment outlined below, using a current version of
Unbound, I'm not convinced the protocol in this draft is proven to work.
Most importantly, it does not have running code for section 5.1, which
is advertised as a security measure in the discussions during this WGLC.
Based on this observation, I claim draft version -09 is not suitable for
publication as a Standards Track RFC.
Excruciating details follow below.
On 3/18/25 18:09, Ben Schwartz wrote:
Note that the security benefits of NS revalidation only apply to
"strictly revalidating" resolvers (Section 5.1). "Opportunistically
revalidating" resolvers (Section 5.2) gain no meaningful security
benefit. The draft does not seem to RECOMMEND "strict revalidation",
presumably because it has slower resolution and higher SERVFAIL rate,
and is not widely deployed.
Implicitly, at least, "opportunistic revalidation" seems to be the
RECOMMENDED behavior. This recommendation must be justified by the
benefits of opportunistic revalidation, which are operational, not
security-oriented.
Ben got me interested so I gave it a try.
I was testing on:
- Unbound 1.22.0
- configured with "harden-referral-path: yes"
- configured with ICANN DNS trust anchor
- validator module loaded
- all other DNS-options at defaults.
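For readers who want to reproduce a setup along these lines, the bullets above might translate into a config like the one rendered below. This is only a sketch, not the config actually used in the test: the listening address, the port and the trust anchor path are assumptions, and auto-trust-anchor-file is just one of several ways to configure the root trust anchor.

```python
def render_unbound_conf(anchor_file: str, port: int = 5353) -> str:
    """Render a minimal unbound.conf for a validating resolver.

    anchor_file and port are illustrative values chosen by the caller;
    everything not listed here stays at Unbound's defaults.
    """
    lines = [
        "server:",
        "    interface: 127.0.0.1",
        f"    port: {port}",
        "    do-daemonize: no",
        # Load the validator in front of the iterator.
        '    module-config: "validator iterator"',
        # Root trust anchor file (the path is an assumption).
        f'    auto-trust-anchor-file: "{anchor_file}"',
        # The option under test.
        "    harden-referral-path: yes",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_unbound_conf("/var/lib/unbound/root.key"), end="")
```

The printed text can be saved to a file and passed to unbound with -c.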
Problem #1
----------
Depending on the state of the cache, "harden-referral-path: yes" breaks
resolution of well-behaved domains when data is missing from the cache.
The extra queries seem to trigger various anti-DoS limits. Try it
yourself on a live domain:
a. Start a fresh Unbound validator: unbound -p -d -c conf
b. dig @unbound testiscorg.ch DNSKEY
;; ... status: SERVFAIL
c. ... wait 5 seconds
d. dig @unbound testiscorg.ch DNSKEY
;; ... status: NOERROR
Raising debug level with "unbound -vvvvvvvv" shows errors like this:
---
debug: request f.nic.ch. has exceeded the maximum global quota on number
of upstream queries 188
---
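To get a feeling for how extra infrastructure queries can eat into a per-request budget like the one in the log message above, here is a toy cost model. It is only a sketch: the per-level costs (one NS query plus A and AAAA lookups for every NS name) and the quota used in the example are assumptions for illustration, not Unbound's actual accounting or defaults, and DNSSEC queries (DS, DNSKEY) are left out entirely.

```python
def upstream_queries(levels: int, ns_per_zone: int, revalidate: bool) -> int:
    """Count upstream queries for one cold-cache resolution (toy model)."""
    total = 0
    for _ in range(levels):
        total += 1  # the query that yields the referral or the final answer
        if revalidate:
            total += 1                # NS query sent to the child zone
            total += 2 * ns_per_zone  # A and AAAA for every NS name
    return total


def exceeds_quota(levels: int, ns_per_zone: int, revalidate: bool,
                  quota: int) -> bool:
    """True if the modelled resolution would blow a hypothetical quota."""
    return upstream_queries(levels, ns_per_zone, revalidate) > quota


if __name__ == "__main__":
    # Hypothetical numbers: 3 delegation levels, 8 NS names per zone.
    for revalidate in (False, True):
        print(revalidate, upstream_queries(3, 8, revalidate))
```

Even in this simplified model the query count grows with the number of NS names at every level, which is the kind of multiplication the anti-DoS limits exist to cap.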
We have seen similar problems when adding anti-DoS limits into BIND, and
this protocol change is pushing the limits even harder. Even worse, if
we decide the limit is too low and raise it, we will end up with a group
of researchers complaining that we re-introduced the CVEs these limits
are meant to protect against. I can already hear the catchy name of the
new attack, "NXNSAttack - Revalidated!".
To work around this limit, I repeated the apex DS and DNSKEY queries
with delays in between until both returned NOERROR. I.e. I waited and
primed the cache to avoid hitting the anti-DoS limits, which could
otherwise skew the test results below.
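The priming procedure could be automated roughly like this. A sketch under assumptions: it shells out to dig and reads the "status:" field from its header, it treats priming as done once both queries return NOERROR in the same round, and the 5 second delay and the round limit are arbitrary choices. The function names are made up for this example.

```python
import re
import subprocess
import time

STATUS_RE = re.compile(r"status: (\w+)")


def dig_status(server: str, name: str, rrtype: str) -> str:
    """Run dig and return the RCODE text from its header line."""
    out = subprocess.run(
        ["dig", f"@{server}", name, rrtype],
        capture_output=True, text=True, check=False,
    ).stdout
    match = STATUS_RE.search(out)
    # No header at all (e.g. a timeout) is reported as UNKNOWN.
    return match.group(1) if match else "UNKNOWN"


def prime(server, name, rrtypes=("DS", "DNSKEY"), delay=5.0,
          max_rounds=10, query=dig_status, sleep=time.sleep):
    """Repeat the apex queries until all of them return NOERROR.

    Returns the number of rounds needed; raises RuntimeError if the
    cache is still not primed after max_rounds.
    """
    for round_no in range(1, max_rounds + 1):
        statuses = [query(server, name, t) for t in rrtypes]
        if all(s == "NOERROR" for s in statuses):
            return round_no
        sleep(delay)  # give the resolver time to finish its subqueries
    raise RuntimeError(f"{name} not primed after {max_rounds} rounds")


if __name__ == "__main__":
    # Assumes a resolver listening on 127.0.0.1, port 53.
    print(prime("127.0.0.1", "testiscorg.ch"))
```

The query and sleep parameters exist only so the loop can be exercised without a live resolver.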
Experiment Different Child NS IP
--------------------------------
Setup:
- Child NS contains a different name than parent NS
- New child NS name points to a different IP.
I ran tcpdump to observe DNS traffic. Results:
- First query into test zone is answered using parent-side NS.
- Subsequent recursion uses child-side NS.
Possibly good enough, assuming an opportunistic implementation (section
5.2). But it certainly makes the overall system harder to debug, as it
is unclear where a query should/will go without knowing the exact state
of the cache.
Experiment Different TTL
------------------------
Setup:
- Child NS contains same RDATA as parent NS
- Child NS TTL = 15 seconds
- Parent NS TTL = 3600 seconds
Result:
Unbound goes through a new round of query name minimization when the
child NS expires, getting a new NS from the parent. Looks good.
Experiment Different TTL+Different Child NS IP
----------------------------------------------
Setup: Combining the previous two.
- Child has shorter TTL
- Child NS points to a different NS name than the parent NS.
- Child NS name has a different IP than the parent NS.
Results:
- Short TTL from child NS causes parent NS refresh as expected.
- First query after refresh goes to parent-side NS, ignoring the new NS
name in the child.
- The second query, i.e. the one after the query which went to the
parent NS, goes to the child NS again.
Possibly good enough, assuming an opportunistic implementation, but it
certainly makes the system harder to debug.
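The query pattern seen in the experiments so far can be summarised with a small model. It is a sketch of what the packet capture showed, not Unbound's actual logic; the class name and the NS names are made up for illustration.

```python
class OpportunisticCache:
    """Toy model: parent-side NS on a miss, child-side NS afterwards."""

    def __init__(self, parent_ns: str, child_ns: str, child_ttl: int):
        self.parent_ns = parent_ns
        self.child_ns = child_ns
        self.child_ttl = child_ttl
        self.cached = None  # (ns_name, expires_at) or None

    def resolve(self, now: int) -> str:
        """Return the NS name that the query at time `now` is sent to."""
        if self.cached is not None and now < self.cached[1]:
            return self.cached[0]
        # Miss or expiry: this query follows the parent's referral, and
        # the child's NS RRset is cached with the child's TTL for later.
        self.cached = (self.child_ns, now + self.child_ttl)
        return self.parent_ns


if __name__ == "__main__":
    cache = OpportunisticCache("ns.parent-side.example.",
                               "ns.child-side.example.", child_ttl=15)
    for t in (0, 1, 10, 16, 17):
        print(t, cache.resolve(t))
```

With a 15 second child TTL, the queries at t=0 and t=16 go to the parent-side name and all others to the child-side name, so the destination of a query depends entirely on cache state.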
Experiment Bogus NS
-------------------
Setup:
- Child NS is bogus. (Added 1 to RRSIG NS inception timestamp).
Result:
- Bogus NS at zone apex triggers SERVFAIL _only_ for an explicit apex NS
query.
- Parent-side NS RRs are still used for everything else. I.e. all other
queries into the affected zone are resolved using the parent-side NS.
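The RRSIG manipulation from the setup can be reproduced on presentation-format RDATA roughly as follows. A sketch, assuming single-line RDATA with timestamps in the YYYYMMDDHHmmSS form and reading "added 1" as one second; the sample record is a placeholder, not real data. Because the inception field is covered by the signature, changing it makes the RRSIG fail validation.

```python
from datetime import datetime, timedelta

FMT = "%Y%m%d%H%M%S"


def bump_inception(rrsig_rdata: str, seconds: int = 1) -> str:
    """Shift the inception timestamp of an RRSIG RDATA string."""
    fields = rrsig_rdata.split()
    # RFC 4034 presentation order: type covered, algorithm, labels,
    # original TTL, expiration, inception, key tag, signer, signature.
    inception = datetime.strptime(fields[5], FMT) + timedelta(seconds=seconds)
    fields[5] = inception.strftime(FMT)
    return " ".join(fields)


if __name__ == "__main__":
    # Placeholder RDATA; the signature field is not a real signature.
    sample = "NS 13 2 3600 20250401000000 20250318120000 12345 example. c2ln"
    print(bump_inception(sample))
```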
Experiment Bogus A RR
---------------------
Setup:
- Child NS points to a different name than parent-NS.
- Child NS is correctly signed.
- Child NS target name has bogus RRSIG A.
E.g. the child-NS was:
@ NS newname
@ RRSIG NS ...secure...
newname A old_IP
newname RRSIG A ...bogus...
Result:
- Unbound ignored this bogus A RR for new NS name.
- Recursed using the old values obtained from parent.
Experiment evaluation
---------------------
Judging from this experiment, it seems Unbound 1.22.0 implements section
5.2, i.e. the opportunistic variant.
FTR, the development version of BIND (commit
a2042e603e3f87cad9b6e63e628ba87003b3f6eb) does the same with bogus NS
and NS name targets: a bogus NS from the child is ignored and the parent
NS is used for further resolution. An NS name target which has a bogus A
is also ignored by BIND.
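The difference between the two variants, as characterised in this thread, comes down to what happens when the child NS RRset fails validation. The sketch below is an interpretation, not text from the draft: the function and type names are invented, and the strict variant is assumed to fail resolution outright.

```python
from enum import Enum


class Status(Enum):
    SECURE = "secure"
    INSECURE = "insecure"
    BOGUS = "bogus"


class ServFail(Exception):
    """Resolution fails for names under the affected delegation."""


def pick_delegation(parent_ns, child_ns, child_status: Status, strict: bool):
    """Choose which NS set is used for further resolution."""
    if child_status is not Status.BOGUS:
        return child_ns  # revalidation succeeded, child-side data wins
    if strict:
        # Strict variant (assumed): bogus child data stops resolution.
        raise ServFail("child NS RRset failed validation")
    # Opportunistic variant: bogus data is dropped and the unsigned
    # parent-side NS set keeps being used.
    return parent_ns


if __name__ == "__main__":
    parent = ["ns.old.example."]
    child = ["ns.new.example."]
    print(pick_delegation(parent, child, Status.BOGUS, strict=False))
```

The opportunistic branch is the behaviour observed above in both Unbound and BIND: an attacker who can make the child data bogus simply keeps the resolver on the parent-side NS set.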
Conclusion
----------
To sum it up, the current running code for "harden-referral-path: yes"
gets us the worst possible outcome:
- More queries
- Random breakage for completely legit domains
- No security benefit because bogus RRs discovered during 'hardening'
are just ignored, and unsigned versions are used instead
- Based on this, I speculate it provides no or marginal privacy benefit.
An attacker can make the response for apex NS bogus, circumventing the
whole mechanism and redirecting the traffic.
The only feature actually delivered by the current implementation is,
when no attack happens, better TTL control from the child, at the cost
of making resolution less predictable.
Shumon, do you have any operational experience with this implementation?
I guess Appendix B. Implementation status could use some words about
actual implementation status and problems observed. Most importantly, it
does not say Unbound has only the opportunistic version.
Unless another implementation surfaces, that means we lack running code
for section 5.1.
I'm not convinced the protocol is proven. I think it is not in a state
for publication as a Standards Track RFC.
Perhaps we can make it Informational and add text saying that it is
known to cause the breakage described above?
If you reached this line, you owe me a dollar for doing QA on Unbound,
and I will buy you a cake for endurance ;-)
Petr Špaček
Internet Systems Consortium
--Ben Schwartz
------------------------------------------------------------------------
*From:* Willem Toorop <wil...@nlnetlabs.nl>
*Sent:* Tuesday, March 18, 2025 11:23 AM
*To:* Ondřej Surý <ond...@isc.org>
*Cc:* Peter Thomassen <pe...@desec.io>; Ralf Weber <d...@fl1ger.de>;
dnsop <dnsop@ietf.org>; dnsop-chairs <dnsop-cha...@ietf.org>
*Subject:* [DNSOP] Re: Working Group Last Call for
draft-ietf-dnsop-ns-revalidation "Delegation Revalidation by DNS Resolvers"
Op 18-03-2025 om 08:27 schreef Ondřej Surý:
On 17. 3. 2025, at 23:16, Willem Toorop <wil...@nlnetlabs.nl> wrote:
And in addition to that prevents all unsigned parts of the hijacked zone from being
rewritten. For example if com is hijacked, unsigned zones like google.com can
be redirected. Similarly if the root is hijacked all unsigned responses for the
entire DNS can be rewritten.
NS revalidation of signed delegations is the only mitigation that protects
against on-path or partly on-path attacks.
Willem,
this part caught my eye. Can we elaborate a little bit more?
0. With full 'on-path' attacker - there's no protection of unsigned zones with
or without NS revalidation. Hope we can agree on this.
If physically on path, then yes we can agree on that.
But if the attacker is on-path because it hijacked all the name server
IPs of a zone, then the attacker cannot also hijack referrals from that
zone if it is not also able to hijack the authoritative NS RRset and
addresses of that referred to zone. It can only block that delegation then.
So, what do you mean by partly on-path then?
For example if an attacker only hijacked part of the IP addresses of the
name servers serving the zone.
There's 26 IP addresses for the RZ, there's 26 IP addresses for .com and .net.
1. If the attacker sits on the 1-26 IP addresses for the .com/.net, the
unsigned zones are not protected, right? The attacker can give whatever the
GLUE they want.
Correct. And also whatever the NS set they want pointing those names to
an insecure zone for example to strengthen the attack.
But a DNSSEC validating NS revalidating resolver will detect that and
reject it.
2. If the attacker sits on 1-26 IP addresses for the RZ, this is where the NS
revalidation will possibly help for validating resolver. It will not do any
good for non-validating resolver, there's no difference as the attacker can
just either directly hijack the name by returning the data, or return own
referral.
There are some benefits for non-validating resolvers as well, but let's
leave that out for the moment. (but see Haya's paper that I referenced
before)
Now, correct me if I'm wrong – the whole NS revalidation process protects only
DNSSEC-enabled resolvers against attacks on the unsigned domains against
attackers on-path to the parent zone. Every other scenario is either directly
vulnerable or can be worked around by the attacker.
There is a bit more to it (see again Haya's paper about how it protects
non-parent centric resolvers against a whole series of other cache
poisoning attacks), but that is the strongest measure against query
redirection yes.
I get your point that this might improve the situation a little bit, but I
don't share the conclusion that this is worth the effort and the additional
complexity.
I understand that. It may be too complex to do in general, especially as
the child side delegations become less reliable deeper in the tree, but
doing it for example only at the root, as an extension to priming, would
already make a big difference, especially since a root hijack equals a
complete DNS tree hijack.