Re: Outgoing DANE not working

Viktor Dukhovni Mon, 13 Apr 2020 14:42:51 -0700

On Mon, Apr 13, 2020 at 03:35:05PM -0400, Rich Felker wrote:

> > It is also not uncommon for applications that use SRV records to
> > encounter large RRsets (e.g. Windows Domain controller lists for
> > large Active-Directory domains in MIT Kerberos or Heimdal).
> 
> The justification here has always been that a number of clients are in
> positions where they can't perform tcp queries, e.g. their nameservers
> only support udp and possibly only support rfc1035.


The TC bit and TCP support are in RFC1035.  TCP is a required DNS
feature, it is NOT optional.  If nameservers fail to support TCP they're
broken.  The stub resolvers in BSD libraries and glibc do TCP, and don't
seem to have any real difficulties.  I'm inclined to say that the above
design decision is not evidence-based.

> Of course such an environment is incompatible with validating dnssec,
> but from the perspective of the domain defining the records, having so
> many/such long records (not counting signatures) that they can't be
> delivered to such clients without truncation means the domain has
> accessibility problems.

DNS supports large RRsets, and has had TC=1 for those for ~4 decades.
The issue is not specifically a DNSSEC issue.

> Fallback to tcp on TC would also yield very bad performance for users
> who are not running a local nameserver whenever looking up names with
> ridiculous numbers of A/AAAA records, where the truncated response
> certainly suffices (except in your example of FCrDNS).

Your local nameserver has already done the TCP failover and paid the
cost of obtaining the full RRset, your stub resolver is just failing to
give it the opportunity to return the full data to you.  The performance
cost is low, and such records are a minority.  Correctness trumps
performance where I come from.  Cutting corners for performance and
violating requirements is not acceptable.
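
As a rough illustration of what honouring TC=1 involves in a stub
resolver, here is a minimal sketch in Python (not the C a libc resolver
would use).  The function names are made up for this example; the only
facts relied on are the position of the TC bit in the header flags word
and the two-byte length prefix that RFC1035 specifies for DNS over TCP:

```python
import struct

TC_MASK = 0x0200  # TC bit within the 16-bit header flags word


def is_truncated(reply: bytes) -> bool:
    """Return True if the DNS reply header has TC=1."""
    if len(reply) < 12:
        raise ValueError("DNS message shorter than the 12-byte header")
    (flags,) = struct.unpack_from("!H", reply, 2)
    return bool(flags & TC_MASK)


def tcp_frame(query: bytes) -> bytes:
    """Prefix a DNS message with the 2-byte length used over TCP."""
    return struct.pack("!H", len(query)) + query


def next_step(reply: bytes) -> str:
    """Decide what to do with a UDP reply (hypothetical helper)."""
    return "retry-over-tcp" if is_truncated(reply) else "use-udp-answer"
```

The same serialized query can be reused for the TCP retry; only the
length prefix from tcp_frame() is added.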

> It's possible that some of these choices can be revisited over time,
> but they were made for good reasons, not at random.

They may be deliberate, but I rather disagree about the quality of the
reasons.

> > But some applications need to see the AD bit returned by the local
> > resolver in order to distinguish between validated and non-validated
> > results.  Recursive Nameservers (BIND, Unbound, ...) will only set
> > (when appropriate) the AD bit in replies if it is set in the incoming
> > query.  The AD bit is part of the standard DNS header:
> 
> Is the AD bit valid as part of a query?

Absolutely, and indeed it is required in order to solicit the AD bit
in return.  And e.g. dig(1) sets the AD bit in requests by default,
and you need to use "dig +noad" to turn it off!

> I couldn't find where this is documented, and it's almost certainly
> not supported (possibly rejected/dropped) by servers that aren't aware
> of it.

That is not the case.  In order for DNS to be extensible, servers are
required to ignore previously reserved flag bits, so that they can
later be assigned.

> >     The basic DNS header flags word is a mixture of flag bits and numbers,
> >     <https://tools.ietf.org/html/rfc2535#section-6.1>:
> >     
> >      +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
> >      |QR|   Opcode  |AA|TC|RD|RA| Z|AD|CD|   RCODE   |
> >      +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
> > 
> > and a one-line change in the musl-libc stub resolver can set the AD bit
> > when the target resolver is local (127.0.0.0/8 or ::1/128).
> 
> I'm confused whether you're saying it should be set in the outgoing
> query or forged in the response.

Set in the outgoing query, which solicits the actual value in the
reply.  Here's a normal query with "AD=1" in the outgoing request:

    $ dig +noedns +noall +comment +ans -t soa ietf.org
    ;; Got answer:
    ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 29386
    ;; flags: qr rd ra ad; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0

    ;; ANSWER SECTION:
    ietf.org.               1699    IN      SOA     ns0.amsl.com. glen.amsl.com. 1200000458 1800 1800 604800 1800

The AD bit is set in the reply, since ietf.org is signed.  Below is
the same query with "AD=0" in the outgoing request:

    $ dig +noad +noedns +noall +comment +ans -t soa ietf.org
    ;; Got answer:
    ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 65503
    ;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0

    ;; ANSWER SECTION:
    ietf.org.               1693    IN      SOA     ns0.amsl.com. glen.amsl.com. 1200000458 1800 1800 604800 1800

This time there is no AD bit in the reply.  Implementors of stub resolvers
need to read many RFCs or consult experts who have:

    https://tools.ietf.org/html/rfc6840#section-5.7
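
For illustration, the flags that dig prints can be recovered from the
header flags word shown in the diagram quoted earlier.  This is only a
sketch in Python; flag_names() is a hypothetical helper, and the bit
masks follow the layout in that diagram:

```python
import struct

# Single-bit flags in the 16-bit header flags word, in the order dig prints them.
FLAG_BITS = [
    ("qr", 0x8000),
    ("aa", 0x0400),
    ("tc", 0x0200),
    ("rd", 0x0100),
    ("ra", 0x0080),
    ("ad", 0x0020),
    ("cd", 0x0010),
]


def flag_names(msg: bytes) -> list:
    """Return the names of the flag bits set in a DNS message header."""
    if len(msg) < 12:
        raise ValueError("DNS message shorter than the 12-byte header")
    (flags,) = struct.unpack_from("!H", msg, 2)
    return [name for name, mask in FLAG_BITS if flags & mask]
```

A flags word of 0x81a0 corresponds to "qr rd ra ad" as in the first
reply above, and 0x8180 to "qr rd ra" as in the second.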

> If the former, I don't see why it would be done conditional on
> being a local resolver (and also local need not be 127.0.0.1 or ::1;
> it can be public address of localhost or a lot of other things, e.g. a
> tunnel out of a container to the actual host, depending on network
> setup).

Because the AD bit from a non-local resolver is not trustworthy.  One
might imagine resolver configurations in which one can indicate that the
network path to a range of non-local IP addresses (perhaps IPSEC or
other secure link) is tamper-resistant, but as a default it may make
sense to ignore the AD bit from remote IPs.

Not ignoring it is no worse than the situation that Postfix is in today,
where we don't know whether the AD bit returned by libresolv is
trustworthy or not, and just document the requirement for a local
resolver, and hope that users who want DANE security pay attention to
the docs.

However, I am suggesting that ignoring non-local AD bits would in fact
resolve that issue.  A more complete implementation would have a
configurable whitelist of "trusted" resolvers.
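
A sketch of what such a policy could look like, in Python for brevity.
The names effective_ad() and DEFAULT_TRUSTED are invented for this
example, and the default list (loopback only) is just the assumption
discussed above, not an existing libc interface:

```python
import ipaddress
import struct

AD_MASK = 0x0020  # AD bit within the 16-bit header flags word

# Assumed default: only loopback resolvers are trusted.
DEFAULT_TRUSTED = [
    ipaddress.ip_network("127.0.0.0/8"),
    ipaddress.ip_network("::1/128"),
]


def resolver_trusted(server: str, trusted=DEFAULT_TRUSTED) -> bool:
    """Return True if the resolver address falls in a trusted network."""
    addr = ipaddress.ip_address(server)
    return any(addr in net for net in trusted)


def effective_ad(server: str, reply: bytes, trusted=DEFAULT_TRUSTED) -> bool:
    """Report AD=1 to the application only for trusted resolvers."""
    if len(reply) < 12:
        raise ValueError("DNS message shorter than the 12-byte header")
    (flags,) = struct.unpack_from("!H", reply, 2)
    return bool(flags & AD_MASK) and resolver_trusted(server, trusted)
```

An administrator who trusts the path to a remote resolver would extend
the list, for example with the network of an IPsec-protected link.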

> Is the idea just that you assume as local one would support it
> whereas for a remote one it might be unknown?

No, in order to get the AD bit in a reply, you need to set it in
the request.  Modern resolvers do not return the AD bit otherwise.

> I don't think this kind of policy decision belongs in the stub
> resolver; for instance it would break in the other direction if you
> implemented nameserver on 127.0.0.1 (e.g. just to avoid needing a
> resolv.conf file) by an iptables rule to redirect to the real
> nameserver.

That's one way of signalling that you trust the path to the resolver.

> I think just adding a resolv.conf option for using the AD bit might be
> appropriate. One issue that makes this more complicated though is how
> the API is factored.

You can safely set it unconditionally, or just for the loopback
resolvers (to help remove an AD-bit MiTM footgun).  No known resolvers
will object to the AD bit in queries.

> res_mkquery in theory doesn't/shouldn't depend on
> the particular nameservers, but should just serialize a query that can
> be used with any server (e.g. my implementation of host(1) does this
> to send to the server you give it on the command line). But the choice
> of configuration is specific to the configured nameservers.

You can inject the AD bit just before sending the packet to a particular
server.
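
In other words, the serialized query stays server-independent and the
send path patches a copy of it.  A minimal sketch in Python; with_ad()
is a hypothetical helper, and whether "enable" is decided per server or
globally is left to the caller:

```python
def with_ad(query: bytes, enable: bool = True) -> bytes:
    """Return a copy of a serialized DNS query with the AD bit set.

    The original buffer is left untouched so it can still be sent
    unmodified to other servers.
    """
    if len(query) < 12:
        raise ValueError("DNS message shorter than the 12-byte header")
    if not enable:
        return query
    patched = bytearray(query)
    patched[3] |= 0x20  # AD is bit 5 of the low-order flags byte
    return bytes(patched)
```

In C the equivalent is a single OR on the fourth byte of the packet
buffer.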

> > Sorry, we actually need to know which records were validated in
> > signed domains, and which are "insecure" responses from unsigned
> > domains.  That's what the AD bit is for, and you're not setting
> > it in requests, and so it does not come back in the response.
> 
> Can you describe why?

I can, but you can just read RFC 7672 if you like; I've already
explained it there.  Bottom line, it is needed.

> Is it only for the sake of not using TLSA
> records in unsigned domains? That kind of policy can be implemented at
> the resolver level

It cannot and should not be implemented at the resolver level.

-- 
    Viktor.
