TL;DR
In the light of an experiment outlined below, using a current version of Unbound, I'm not convinced the protocol in this draft is proven to work. Most importantly, it does not have running code for section 5.1, which is advertised as a security measure in the discussions during this WGLC.

Based on this observation, I claim draft version -09 is not suitable for publication as a Standards Track RFC.

Excruciating details follow below.


On 3/18/25 18:09, Ben Schwartz wrote:
Note that the security benefits of NS revalidation only apply to "strictly revalidating" resolvers (Section 5.1).  "Opportunistically revalidating" resolvers (Section 5.2) gain no meaningful security benefit.  The draft does not seem to RECOMMEND "strict revalidation", presumably because it has slower resolution and higher SERVFAIL rate, and is not widely deployed.

Implicitly, at least, "opportunistic revalidation" seems to be the RECOMMENDED behavior.  This recommendation must be justified by the benefits of opportunistic revalidation, which are operational, not security-oriented.

Ben got me interested so I gave it a try.

I was testing on:
- Unbound 1.22.0
- configured with "harden-referral-path: yes"
- configured with ICANN DNS trust anchor
- validator module loaded
- all other DNS-options at defaults.

Problem #1
----------
Depending on the state of the cache, "harden-referral-path: yes" breaks resolution of well-behaved domains when records are missing from the cache. The extra queries seem to trigger various anti-DoS limits. Try it yourself on a live domain here:

a. Start a fresh Unbound validator: unbound -p -d -c conf

b. dig @unbound testiscorg.ch DNSKEY
;; ... status: SERVFAIL

c. ... wait 5 seconds

d. dig @unbound testiscorg.ch DNSKEY
;; ... status: NOERROR

Raising debug level with "unbound -vvvvvvvv" shows errors like this:
---
debug: request f.nic.ch. has exceeded the maximum global quota on number of upstream queries 188
---
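
To see which names hit the quota during a run, the debug log can be tallied with a few lines of Python. This is only a sketch: the pattern is copied from the single log line quoted above, so it assumes other Unbound versions word the message the same way, and the function name is made up for illustration.

```python
import re
from collections import Counter

# Pattern derived from the one sample line quoted above (assumption: the
# wording is stable across the Unbound versions you are testing).
QUOTA_RE = re.compile(
    r"request (\S+) has exceeded the maximum global quota "
    r"on number of upstream queries (\d+)"
)


def count_quota_hits(log_lines):
    """Return a Counter mapping request name -> number of quota-exceeded lines."""
    hits = Counter()
    for line in log_lines:
        match = QUOTA_RE.search(line)
        if match:
            hits[match.group(1)] += 1
    return hits
```

Feed it the lines of a captured debug log, e.g. count_quota_hits(open(path)) where path points at wherever you saved the output.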

We have seen similar problems when adding anti-DoS limits into BIND, and this protocol change is pushing the limits even harder. Even worse, if we decide the limit is too low and raise it, we will end up with a group of researchers complaining we re-introduced the CVEs these limits are meant to protect from. I can already hear the catchy name of the new attack, "NXNSAttack - Revalidated!".


To work around this limit, I've repeated apex DS and DNSKEY queries with delays in between until they both returned NOERROR. I.e. I waited and primed the cache to avoid hitting the anti-DoS limits which could skew the test results below.
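
The priming loop described above can be scripted roughly as follows. It is a sketch under assumptions: it shells out to dig and reads the "status:" field from the header line, the 5-second delay mirrors the manual wait in Problem #1, and the function names, the retry cap and the resolver address in the usage note are all made up for illustration.

```python
import re
import subprocess
import time

# dig prints the RCODE in its header line, e.g.
# ";; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 12345"
STATUS_RE = re.compile(r"status: ([A-Z]+)")


def parse_status(dig_output):
    """Extract the status name from dig output, or None if it is absent."""
    match = STATUS_RE.search(dig_output)
    return match.group(1) if match else None


def dig_status(server, name, rrtype):
    """Run dig once against `server` and return the response status."""
    result = subprocess.run(
        ["dig", f"@{server}", name, rrtype],
        capture_output=True, text=True, check=False,
    )
    return parse_status(result.stdout)


def prime(query, name, rrtypes=("DS", "DNSKEY"), delay=5.0, max_tries=10,
          sleep=time.sleep):
    """Repeat the queries until all return NOERROR.

    `query(name, rrtype)` must return a status string. Returns the number
    of rounds needed, or None if `max_tries` rounds were not enough.
    """
    for attempt in range(1, max_tries + 1):
        statuses = [query(name, rrtype) for rrtype in rrtypes]
        if all(status == "NOERROR" for status in statuses):
            return attempt
        if attempt < max_tries:
            sleep(delay)  # give the resolver time to fill its cache
    return None
```

With a resolver listening on 127.0.0.1 (an assumption), this would be called as prime(lambda n, t: dig_status("127.0.0.1", n, t), "testiscorg.ch").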


Experiment Different Child NS IP
--------------------------------
Setup:
- Child NS contains a different name than parent NS
- New child NS name points to a different IP.

I ran tcpdump to observe DNS traffic. Results:
- First query into test zone is answered using parent-side NS.
- Subsequent recursion uses child-side NS.

Possibly good enough, assuming opportunistic implementation (section 5.2). But for sure it makes the overall system harder to debug as it is unclear where the query should/will go without knowing exact state of the cache.


Experiment Different TTL
------------------------
Setup:
- Child NS contains same RDATA as parent NS
- Child NS TTL = 15 seconds
- Parent NS TTL = 3600 seconds

Result:
Unbound goes through a new round of query name minimization when the child NS expires, getting a new NS from the parent. Looks good.


Experiment Different TTL+Different Child NS IP
----------------------------------------------
Setup: Combining the previous two.
- Child has shorter TTL
- Child NS points to a different NS name than the parent NS.
- Child NS name has a different IP than the parent NS.

Results:
- Short TTL from child NS causes parent NS refresh as expected.
- First query after refresh goes to parent-side NS, ignoring the new NS name in the child.
- Second query after the one which went to parent-NS goes to child-NS again.

Possibly good enough, assuming opportunistic implementation, but for sure it makes system harder to debug.
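
To illustrate why the destination of a query depends on cache state, here is a toy model of the pattern seen in the three experiments above. It is not Unbound's logic, only a simplification of the observed sequence: the first query after a cache miss or child-NS expiry uses the parent-side NS, and later queries use the child-side NS until the child TTL runs out. The class and the example names are invented.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToyDelegationCache:
    """Toy model of the observed behaviour; not taken from any resolver."""
    parent_ns: str
    child_ns: str
    child_ttl: float
    expires_at: Optional[float] = None  # None: no child-side NS cached yet

    def server_for_query(self, now):
        """Return the NS name a query sent at time `now` would go to."""
        if self.expires_at is not None and now < self.expires_at:
            return self.child_ns
        # Cache miss or expired child NS: this query follows the parent-side
        # referral, and the child-side NS learned along the way is cached.
        self.expires_at = now + self.child_ttl
        return self.parent_ns
```

With child_ttl=15, queries at t = 0, 1, 14, 15 and 16 seconds go to parent, child, child, parent and child respectively, so the same question can land on different servers purely depending on timing.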


Experiment Bogus NS
-------------------
Setup:
- Child NS is bogus. (Added 1 to RRSIG NS inception timestamp).

Result:
- Bogus NS at zone apex triggers SERVFAIL _only_ for an explicit apex NS query.
- Parent-side NS RRs are still used for everything else. I.e. all other queries into the affected zone are resolved using the parent-side NS.


Experiment Bogus A RR
---------------------
Setup:
- Child NS points to a different name than parent-NS.
- Child NS is correctly signed.
- Child NS target name has bogus RRSIG A.
E.g. the child-NS was:
@ NS newname
@ RRSIG NS ...secure...
newname A old_IP
newname RRSIG A ...bogus...

Result:
- Unbound ignored this bogus A RR for new NS name.
- Recursed using the old values obtained from parent.


Experiment evaluation
---------------------
Judging from this experiment, it seems Unbound 1.22.0 implements section 5.2, i.e. opportunistic variant.

FTR development version of BIND (commit a2042e603e3f87cad9b6e63e628ba87003b3f6eb) does the same with bogus NS and NS name targets: Bogus NS from child is ignored and parent NS is used for further resolution. NS name target which has bogus A is also ignored by BIND.
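
The difference between the two variants, as it matters for the bogus-data experiments, can be sketched as a single decision. The opportunistic branch mirrors what was observed above. The strict branch is an assumption about how a section 5.1 implementation would react (refuse to resolve rather than fall back); it is not taken from running code, and all names here are illustrative.

```python
from enum import Enum


class Validation(Enum):
    SECURE = "secure"
    INSECURE = "insecure"
    BOGUS = "bogus"


def choose_delegation(parent_ns, child_ns, child_status, strict):
    """Return the NS set to use for resolution, or None to signal SERVFAIL."""
    if child_status is Validation.BOGUS:
        # Opportunistic (observed): ignore the bogus child data and keep
        # using the unsigned parent-side NS.
        # Strict (assumed): treat the delegation as unusable.
        return None if strict else parent_ns
    # Secure or insecure child data replaces the parent-side NS.
    return child_ns
```

In this model an attacker who can make the child-side answer bogus gets a denial of service against a strict resolver, but keeps an opportunistic resolver on the unsigned parent-side data.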


Conclusion
----------
To sum it up, the current running code for "harden-referral-path: yes" gets us the worst possible outcome:
- More queries
- Random breakage for completely legit domains
- No security benefit because bogus RRs discovered during 'hardening' are just ignored, and unsigned versions are used instead
- Based on this, I speculate it provides no or marginal privacy benefit.
An attacker can make the response for apex NS bogus, circumventing the whole mechanism and redirecting the traffic.

The only feature actually delivered by the current implementation is, when no attack happens, better TTL control from the child, at the cost of making resolution less predictable.


Shumon, do you have an operational experience with this implementation?


I guess Appendix B. Implementation status could use some words about actual implementation status and problems observed. Most importantly, it does not say Unbound has only the opportunistic version.

Unless another implementation surfaces, that means we lack running code for section 5.1.

I'm not convinced the protocol is proven. I think it is not in a state for publication as a Standards Track RFC.


Perhaps we can make it Informational and add text that it is known to cause breakage described above?

If you reached this line, you owe me a dollar for doing QA on Unbound, and I will buy you a cake for endurance ;-)

Petr Špaček
Internet Systems Consortium




--Ben Schwartz
------------------------------------------------------------------------
*From:* Willem Toorop <wil...@nlnetlabs.nl>
*Sent:* Tuesday, March 18, 2025 11:23 AM
*To:* Ondřej Surý <ond...@isc.org>
*Cc:* Peter Thomassen <pe...@desec.io>; Ralf Weber <d...@fl1ger.de>; dnsop <dnsop@ietf.org>; dnsop-chairs <dnsop-cha...@ietf.org>
*Subject:* [DNSOP] Re: Working Group Last Call for draft-ietf-dnsop-ns-revalidation "Delegation Revalidation by DNS Resolvers"

Op 18-03-2025 om 08:27 schreef Ondřej Surý:
On 17. 3. 2025, at 23:16, Willem Toorop <wil...@nlnetlabs.nl> wrote:

And in addition to that prevents all unsigned parts of the hijacked zone to be 
rewritten. For example if com is hijacked, unsigned zones like google.com can 
be redirected. Similarly if the root is hijacked all unsigned responses for the 
entire DNS can be rewritten.
NS revalidation of signed delegations is the only mitigation that protects 
against on-path or partly on-path attacks.
Willem,

this part caught my eye. Can we elaborate a little bit more?

0. With full 'on-path' attacker - there's no protection of unsigned zones with 
or without NS revalidation. Hope we can agree on this.

If physically on path, then yes we can agree on that.

But if the attacker is on-path because it hijacked all the name server
IPs of a zone, then the attacker cannot also hijack referrals from that
zone if it is not also able to hijack the authoritative NS RRset and
addresses of that referred to zone. It can only block that delegation then.

So, what do you mean by partly on-path then?
For example if an attacker only hijacked part of the IP addresses of the
name servers serving the zone.
There's 26 IP addresses for the RZ, there's 26 IP addresses for .com and .net.

1. If the attacker sits on the 1-26 IP addresses for the .com/.net, the 
unsigned zones are not protected, right? The attacker can give whatever the 
GLUE they want.

Correct. And also whatever the NS set they want pointing those names to
an insecure zone for example to strengthen the attack.

But a DNSSEC validating NS revalidating resolver will detect that and
reject it.

2. If the attacker sits on 1-26 IP addresses for the RZ, this is where the NS 
revalidation will possibly help for validating resolver. It will not do any 
good for non-validating resolver, there's no difference as the attacker can 
just either directly hijack the name by returning the data, or return own 
referral.
There are some benefits for non-validating resolvers as well, but let's
leave that out for the moment. (but see Haya's paper that I referenced
before)
Now, correct me if I'm wrong – the whole NS revalidation process protects only 
DNSSEC-enabled resolvers against attacks on the unsigned domains against 
attackers on-path to the parent zone. Every other scenario is either directly 
vulnerable or can be worked around by the attacker.
There is a bit more to it (see again Haya's paper about how it protects
non-parent centric resolvers against a whole series of other cache
poisoning attacks), but that is the strongest measure against query
redirection yes.
I get your point that this might improve the situation a little bit, but I 
don't share the conclusion that this is worth the effort and the additional 
complexity.

I understand that. It may be too complex to do in general, especially as
the child side delegations become less reliable deeper in the tree, but
doing it for example only at the root, as an extension to priming, would
already make a big difference, especially since a root hijack equals a
complete DNS tree hijack.

_______________________________________________
DNSOP mailing list -- dnsop@ietf.org
To unsubscribe send an email to dnsop-le...@ietf.org
