Thanks for the comments! Some thoughts inline.

On Sat, Dec 21, 2024 at 8:59 AM Ilari Liusvaara <ilariliusva...@welho.com> wrote:
> Some issues I have been thinking about (this all concentrates on server certificates):
>
> 1) Certificate chain mismatches between services on the same server:
>
> Trust Anchor IDs uses ServiceMode HTTPS/SVCB records to store list of available roots in DNS. These are logically per (logical) server.
>
> However, the available roots may differ between two services that are pointed to the same server via AliasMode HTTPS/SVCB records. Especially if considering intermediates as roots for intermediate elision (e.g., Let's Encrypt randomly issues off two intermediates).
>
> Furthermore, ECH adds another implicit service to every server in order to update ECH keys or to disable ECH.

To make sure I follow, the comment is that the right HTTPS/SVCB record for a given route is specific not just to the hosting provider but also to the particular origin that provider hosts? And that, if this happened to be your first SvcParam that varied like this, you might have gotten away with a single SVCB record before and would now hit some friction?

That's true, but I think it's inevitable from the design of HTTPS/SVCB, and it really can apply to any SvcParam. I know one ECH deployment ran into issues because they had a mix of TLS-1.3-capable and TLS-1.2-only hosts (there was an option for customers to turn 1.3 off). If you advertise ECH keys for the 1.2-only hosts, connections would break, or at least need the recovery flow to repair themselves. Under the design folks picked for HTTPS/SVCB, the right way to do this is for your 1.3-capable and 1.2-only hosts to serve different records.

draft-ietf-tls-key-share-prediction has a similar property. If you have, or later wish to have, any capability for different hosts to vary their named group preferences, you need to be able to install different records. (In my experience, it is very common for folks to ask for this.)

My understanding from CDN operators is that they generally prefer to CNAME or AliasMode to origin.cdn.example for precisely this reason, because anything in the record that can vary by customer would trip this. (If dnsop wanted a richer inter-SVCB delegation model, I guess AliasMode records could have had SvcParams themselves, with the resolver stitching them together at query time. That wasn't the design they picked, so I think origin.cdn.example is the conclusion.)
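For concreteness, here is roughly the shape I have in mind. This is only a sketch: the names and IDs are made up, and the tls-trust-anchors parameter is my reading of the draft's SvcParam, so don't treat the exact syntax as authoritative.

    ; Customer origins delegate to the CDN via AliasMode; anything that can
    ; vary per customer lives on the CDN-controlled target names.
    app.customer-a.example.  HTTPS 0 a.origin.cdn.example.
    app.customer-b.example.  HTTPS 0 b.origin.cdn.example.

    ; ServiceMode records on the CDN's names can then differ per origin,
    ; e.g. different ECH configs or different available trust anchors.
    a.origin.cdn.example.    HTTPS 1 . alpn=h2 tls-trust-anchors=32473.1,32473.2
    b.origin.cdn.example.    HTTPS 1 . alpn=h2 tls-trust-anchors=32473.3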
Of course, if you get it wrong, this only matters for the DNS part of the flow. You still have several chances to get this right:

- When nothing matches, you can send some default chain based on what works in most clients, because if that doesn't work...
- The retry flow will also repair this. (More on this later.)

Indeed, if you have an efficient chain that works for "most" clients and are willing to send older ones through the retry flow, you don't necessarily even need the DNS thing. Though I do think the DNS thing is very valuable to avoid sacrificing the cases that aren't "most" clients, and to give everyone a bit more breathing room if they need to support many, many clients. I definitely hear from this list a lot that not everything is a browser, so I think it's important that our designs not assume every application is browser-like in its PKI. More on those below.

> 2) Load-balancing certificate flip-flop:
>
> If server is load balanced, reconnects may go to different server, which might be out of sync with the previous server.
>
> As consequence, root lists from previous connection might not be correct for reconnection (especially intermediates).
>
> Just because some service has one IPv4 and one IPv6 address does not mean there might not be plenty of flip-flop with updates.
>
> 3) Connections racing with certificate changes:
>
> Even without load balancing, the root lists might change between reconnections (especially intermediates). Servers usually withdraw certificates suddenly with no grace period.
>
> In server software capable of hot-reloading certificates, such races could even occur between sending HRR and client retrying.
>
> 4) How to repair mismatches:
>
> I think that the only feasible way to repair mismatches on server side would be to use HRR (regardless of any potential bad interactions with ECH).
>
> I think altering the TLS state machine for adding a new optional pair of flights would be far too big a change. And since reconnects can not be transparent, those are no-go for many uses. Including ones that would benefit a lot from trust anchor negotiation.
>
> I don't think there is a fourth way. And reconnects can pile up exponentially:
>
> - Client tries with bad ECH, bad trust anchors, server fixes trust anchors.
> - Client tries with bad ECH, good trust anchors, server fixes ECH.
> - Client tries with good ECH, bad trust anchors, server fixes trust anchors (for different service!).
> - Client tries with good ECH, good trust anchors. This succeeds.

I think these issues are largely the same trade-off as ECH w.r.t. whether to retry on the same or different connections. For ECH, the WG did not want to build an in-handshake retry, as recovery should be rare. The thinking with this initial draft was similar, so while the interaction is not ideal, it requires both meant-to-be-rare events to go wrong at the same time.

That's not to say that's the only design. Even back in the Vancouver presentation, we were already talking about maybe doing an in-protocol retry instead. More broadly, I think there is a space of solutions that are all variations on the general approach here:

- Extend HRR to do the retry in-connection, so that retries are less expensive. (This interacts a bit with some feedback we've gotten, that the retry is useful for repairing more complex edge cases, so something to keep in mind when designing this.)
- Reduce the need for retry by robustly getting the DNS bits to the client (perhaps we should get good at draft-ietf-tls-wkech-like designs).
- Reduce the need for retry by having servers make a decent guess on mismatch, e.g. targeting what works well for common clients, safe in the knowledge that *it's still okay if the guess is wrong*. That distinction makes a world of difference, because it means we don't need to hit 100% to still succeed. (Rough sketch of this below.)
- Reduce the need for retry by, as you suggest, tuning how the client responds to the DNS record, to try to recover even some mismatch cases.
- Reduce (remove?) the need for retry by encoding the trust anchor list more compactly. (Ultimately, all of these designs are just compressions of the standard certificate_authorities extension.)

All of these are things that I (and my coauthors!) am very, very eager to iterate on and explore with the working group. But I think these kinds of things are best done within the auspices of the working group, after adoption, so they can better reflect the group's consensus. I hope we can move on to this more productive (and, frankly, more interesting) part soon, now that we have a clear interim result to build on.
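To make the "decent guess" bullet concrete, here is a rough server-side sketch in Python. Everything in it is hypothetical (the data structures, the ID strings, the file names); the only point is that a miss falls back to a default chain rather than failing, and the retry flow covers whatever the guess gets wrong.

    # Sketch only: hypothetical server-side chain selection. "Trust anchor
    # IDs" here are just opaque strings; none of these names come from the
    # draft or any real TLS library.

    # Chains the server has provisioned, keyed by the trust anchor ID each
    # one satisfies (including intermediate IDs for elided chains).
    available_chains = {
        "32473.1": ["ee.pem", "intermediate1.pem"],    # long chain to root 1
        "32473.1.2": ["ee.pem"],                       # elided chain, intermediate known
        "32473.2": ["ee2.pem", "intermediate2.pem"],   # long chain to root 2
    }

    # The chain we believe works for "most" clients; used when nothing matches.
    default_chain = available_chains["32473.1"]

    def select_chain(client_trust_anchor_ids):
        """Pick a certificate chain for this ClientHello.

        client_trust_anchor_ids is the (possibly empty) list of IDs the
        client offered. On a miss we fall back to the default guess; if
        the guess is wrong, the retry flow can still repair things.
        """
        for anchor_id in client_trust_anchor_ids:
            if anchor_id in available_chains:
                return available_chains[anchor_id]
        return default_chain

    # A client that offered root 2 gets the matching chain; a client that
    # offered nothing we recognize gets the default guess.
    print(select_chain(["32473.2"]))   # ['ee2.pem', 'intermediate2.pem']
    print(select_chain(["99999.9"]))   # default chain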
What's in the draft was just the simplest thing to describe, as a hopefully useful starting point for the WG to iterate from.

One minor point:

> In server software capable of hot-reloading certificates, such races could even occur between sending HRR and client retrying.

I think, if you're hot-reloading certificates, or any other config, you really should make sure that any in-flight connections still act on the old config once you've made a decision based on it. Otherwise you might have sent some messages based on one config and then try to complete the handshake based on the other. (E.g. if you HRR to x25519 but then hot-reload to a P-256-only configuration, you should still finish that pending handshake at x25519, because anything else will be invalid anyway.)
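In case it helps, here is a minimal sketch of the pattern I mean, in Python rather than any particular server's API: the handshake grabs a snapshot of the config when it starts, and a hot reload only changes what later handshakes see.

    # Sketch of pinning config across hot reloads; hypothetical types, not
    # any real TLS library's API. The point is only that an in-flight
    # handshake keeps using the snapshot it started with, even if reload()
    # runs midway through.
    import threading

    class TLSConfig:
        def __init__(self, chains, groups):
            self.chains = chains   # provisioned certificate chains
            self.groups = groups   # named groups we are willing to HRR to

    class Server:
        def __init__(self, config):
            self._lock = threading.Lock()
            self._config = config

        def reload(self, new_config):
            # Hot reload: only handshakes that start after this see new_config.
            with self._lock:
                self._config = new_config

        def start_handshake(self):
            # Snapshot taken once, when the first ClientHello is processed.
            with self._lock:
                return self._config

    # The handshake that began before reload() still finishes against the
    # old snapshot, so an HRR to x25519 stays consistent with the keys and
    # chains we eventually send.
    server = Server(TLSConfig(chains=["old-chain.pem"], groups=["x25519"]))
    snapshot = server.start_handshake()            # ClientHello arrives, maybe HRR
    server.reload(TLSConfig(chains=["new-chain.pem"], groups=["secp256r1"]))
    assert snapshot.groups == ["x25519"]           # finish handshake on old config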
> Then reconnects also bypass some TLS 1.3 downgrade protections. The client has to be very careful not to introduce downgrade attacks in the process. Furthermore, no authentication of any kind is possible here, so the previous connection might have been to an attacker.

I don't think this meaningfully impacts downgrade protection. We *already* don't have downgrade protection for anything in the authentication parts of TLS (certificate, signature algorithm, transcript hash) because the downgrade signal itself depends on them. For example, if a server had a compromised certificate and a strong certificate, and the client accepts both, the attacker can just use the compromised credential to sign a CertificateVerify message with a transcript that says "yes, this is the certificate I would have picked in this case" and move on. Likewise, if you were trying to exploit some kind of weak hash in the signature algorithm, you could send a different ClientHello to get the server to respond with something that collides with the transcript you present to the client, etc.

(To get downgrade protection across certificates, we would need something like the weaker certificate including a non-critical extension that says "this other certificate also exists, so if you see this extension and understand this OID, reject this". That would certainly be an interesting thing to design, but it's orthogonal to all this work.)

As an aside: this also isn't specific to the retry mechanic. The DNS flow with today's broadly untrusted DNS has all the same properties. The retry flow is ultimately just making the client act as if it saw that particular DNS response.

> 5) Client filtering of IDs:
>
> Intermediate elision presumably requires including Trust Anchor IDs for intermediates, and that presents some special considerations.
>
> I presume that intermediate IDs should not be included without having a root ID the intermediate chains to.
>
> And where CA used is known to have unpredictable/backup issuers, or rotate issuers, it could be useful to include related issuers even if the server does not signal those (it is just a few dozen bytes at most, I think the 5-set Let's Encrypt uses is exceptionally large).

I think this is already mostly covered by this text:

> If doing so, the client MAY send a subset of this intersection to meet size constraints, but SHOULD offer multiple options. This reduces the chance of a reconnection if, for example, the first option in the intersection uses a signature algorithm that the client doesn't support, or if the TLS server and DNS configuration are out of sync.

https://www.ietf.org/archive/id/draft-beck-tls-trust-anchor-ids-03.html#name-client-behavior

Though certainly there is plenty of room to iterate with the working group on the right things for the client to offer. I personally favor designs where the client logic doesn't need to know the relationship between intermediate and root IDs (PKIs can have sprawling structures, so this may not be well-defined). I think that's not necessary to get most of what you describe. Since the server will already have long and short chains provisioned, the DNS record will contain both intermediate and root IDs. The client will then see both when filtering, so the "SHOULD offer multiple" text already captures it. We won't get the "related issuers" case (could be interesting to add), but the root will at least match the long chain.

> Then some ideas:
>
> - If Trust Anchor IDs fails to negotiate a certificate, it should just be ignored. That is, fall back to whatever processing would happen without the extension.
>
> Server should not explicitly fail unless it requires trust anchor negotiation, and is just plain incompatible with clients that do not support it.

Yup! That is very much the intent here. It's phrased in a way to align with Section 4.4.2.2 of RFC 8446, though I certainly could have gotten it wrong! We got feedback that folks wanted it to coexist cleanly with certificate_authorities, so I tried to write the text fitting the current setup. But, as with anything else, the text can change to whatever the working group consensus is when adopted.

> - Send all the issuer chains in every (extended) ACME response, or have dedicated endpoint to download the issuer chain bundle per issuer.
>
> This avoids having ACME client combine multiple chains from alternatives. Which is not as easy as it sounds because the EE certificates can differ, which can lead to really bizarre edge cases.
>
> Having bundle of issuer chains also fits a lot better with the configuration model of a lot of existing TLS server software.

The feedback we got from Aaron Gable (CC'd) was to not tie multiple issuances to a single ACME order (e.g. imagine if you need to renew one but not the other), which suggests, at least at the ACME protocol level, going in the other direction. That's why we've kept the single-chain MIME type and then further pared the ACME text in this draft down to just the shared-EE case. Definitely the more general cases also need to be solved, but given that, it seemed better to tackle them separately.

Beyond that, what gets sent on the wire in ACME doesn't necessarily determine the interface between the ACME client and the TLS server software. If some single-file multi-chain bundle is more convenient for a lot of server operators, the ACME client can always stitch that together as it wishes (rough sketch below). I think, for this draft, it makes sense to focus on what happens in the TLS protocol, rather than the ACME client <-> TLS server channel. That's typically what's in scope for a TLS document. After all, RFC 8446 already has a notion of certificate selection, across many axes, and is quite content doing so without saying anything at all about how to provision multiple certificates.

Of course, above all, like everything else, if the WG feels a different ACME direction makes sense (probably in consultation with ACME experts---I'm certainly not one!), that can certainly change. Based on my current understanding of things, and feedback from ACME experts, I still think what's in draft-03 is right, but we'll see.
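And, for what it's worth, the stitching I mentioned above is only a few lines on the ACME client side. This is just an illustration with made-up file names; neither ACME nor the draft specifies any of it.

    # Sketch: an ACME client that downloaded one single-chain PEM per
    # issuance (per the existing MIME type) and stitches them into one
    # multi-chain bundle for server software that wants a single file.
    def write_bundle(chain_paths, bundle_path):
        with open(bundle_path, "w") as bundle:
            for path in chain_paths:
                with open(path) as chain:
                    bundle.write(chain.read().rstrip("\n") + "\n")

    # e.g. write_bundle(["chain-root1.pem", "chain-root2.pem"], "bundle.pem")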
Thanks again for the comments!

David