Thanks for the comments! Some thoughts inline.

On Sat, Dec 21, 2024 at 8:59 AM Ilari Liusvaara <ilariliusva...@welho.com> wrote:
> Some issues I have been thinking about (this all concentrates on server certificates):
>
> 1) Certificate chain mismatches between services on the same server:
>
> Trust Anchor IDs uses ServiceMode HTTPS/SVCB records to store list of available roots in DNS. These are logically per (logical) server.
>
> However, the available roots may differ between two services that are pointed to the same server via AliasMode HTTPS/SVCB records. Especially if considering intermediates as roots for intermediate elision (e.g., Let's Encrypt randomly issues off two intermediates).
>
> Furthermore, ECH adds another implicit service to every server in order to update ECH keys or to disable ECH.

To make sure I follow, the comment is that the right HTTPS/SVCB record for a given route is specific not just to the hosting provider but also to the particular origin that provider hosts? And that, if this happened to be your first SvcParam that varied like this, you might have gotten away with a single SVCB record before and would now hit some friction?

That's true, but I think it's inevitable from the design of HTTPS/SVCB, and it really can apply to any SvcParam. I know one ECH deployment ran into issues because they had a mix of TLS-1.3-capable and TLS-1.2-only hosts (there was an option for customers to turn 1.3 off). If you advertise ECH keys for the 1.2-only hosts, connections would break, or at least need the recovery flow to repair themselves. Under the design folks picked for HTTPS/SVCB, the right way to do this is for your 1.3-capable and 1.2-only hosts to serve different records.

draft-ietf-tls-key-share-prediction has a similar property. If you have, or later wish to have, any capability for different hosts to vary their named group preferences, you need to be able to install different records. (In my experience, it is very common for folks to ask for this.)

My understanding from CDN operators is that they generally prefer to CNAME or AliasMode to origin.cdn.example for precisely this reason, because anything in the record that can vary by customer would trip this. (If dnsop wanted a richer inter-SVCB delegation model, I guess AliasMode records could have had SvcParams themselves, with the resolver stitching them together at query time. That wasn't the design they picked, so I think origin.cdn.example is the conclusion.)
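For concreteness, here is roughly the shape I have in mind. This is only a sketch: the names and IDs are made up, and the tls-trust-anchors parameter is my reading of the draft's SvcParam, so don't treat the exact syntax as authoritative.

    ; Customer origins delegate to the CDN via AliasMode; anything that can
    ; vary per customer lives on the CDN-controlled target names.
    app.customer-a.example.  HTTPS 0 a.origin.cdn.example.
    app.customer-b.example.  HTTPS 0 b.origin.cdn.example.

    ; ServiceMode records on the CDN's names can then differ per origin,
    ; e.g. different ECH configs or different available trust anchors.
    a.origin.cdn.example.    HTTPS 1 . alpn=h2 tls-trust-anchors=32473.1,32473.2
    b.origin.cdn.example.    HTTPS 1 . alpn=h2 tls-trust-anchors=32473.3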
Of course, if you get it wrong, this only matters for the DNS part of the flow. You still have several chances to get this right:

- When nothing matches, you can send some default chain based on what works in most clients, because if that doesn't work...
- The retry flow will also repair this. (More on this later.)

Indeed, if you have an efficient chain that works for "most" clients and are willing to send older ones through the retry flow, you don't necessarily even need the DNS thing. Though I do think the DNS thing is very valuable to avoid sacrificing the cases that aren't "most" clients, and to give everyone a bit more breathing room if they need to support many, many clients. I definitely hear from this list a lot that not everything is a browser, so I think it's important that our designs not assume every application is browser-like in its PKI. More on those below.

> 2) Load-balancing certificate flip-flop:
>
> If server is load balanced, reconnects may go to different server, which might be out of sync with the previous server.
>
> As consequence, root lists from previous connection might not be correct for reconnection (especially intermediates).
>
> Just because some service has one IPv4 and one IPv6 address does not mean there might not be plenty of flip-flop with updates.
>
> 3) Connections racing with certificate changes:
>
> Even without load balancing, the root lists might change between reconnections (especially intermediates). Servers usually withdraw certificates suddenly with no grace period.
>
> In server software capable of hot-reloading certificates, such races could even occur between sending HRR and client retrying.
>
> 4) How to repair mismatches:
>
> I think that the only feasible way to repair mismatches on server side would be to use HRR (regardless of any potential bad interactions with ECH).
>
> I think altering the TLS state machine for adding a new optional pair of flights would be far too big a change. And since reconnects can not be transparent, those are no-go for many uses. Including ones that would benefit a lot from trust anchor negotiation.
>
> I don't think there is a fourth way. And reconnects can pile up exponentially:
>
> - Client tries with bad ECH, bad trust anchors, server fixes trust anchors.
> - Client tries with bad ECH, good trust anchors, server fixes ECH.
> - Client tries with good ECH, bad trust anchors, server fixes trust anchors (for different service!).
> - Client tries with good ECH, good trust anchors. This succeeds.

I think these issues are largely the same trade-off as ECH w.r.t. whether to retry on the same or different connections. For ECH, the WG did not want to build an in-handshake retry, as recovery should be rare. The thinking with this initial draft was similar, so while the interaction is not ideal, it requires both meant-to-be-rare events to go wrong at the same time.

That's not to say that's the only design. Even back in the Vancouver presentation, we were already talking about maybe doing an in-protocol retry instead. More broadly, I think there is a space of solutions that are all variations on the general approach here:

- Extend HRR to do the retry in-connection, so that retries are less expensive. (This interacts a bit with some feedback we've gotten, that the retry is useful for repairing more complex edge cases, so something to keep in mind when designing this.)
- Reduce the need for retry by robustly getting the DNS bits to the client (perhaps we should get good at draft-ietf-tls-wkech-like designs).
- Reduce the need for retry by having servers make a decent guess on mismatch, e.g. targeting what works well for common clients, safe in the knowledge that *it's still okay if the guess is wrong*. That distinction makes a world of difference, because it means we don't need to hit 100% to still succeed. (Rough sketch of this below.)
- Reduce the need for retry by, as you suggest, tuning how the client responds to the DNS record, to try to recover even some mismatch cases.
- Reduce (remove?) the need for retry by encoding the trust anchor list more compactly. (Ultimately, all of these designs are just compressions of the standard certificate_authorities extension.)

All of these are things that I (and my coauthors!) am very, very eager to iterate on and explore with the working group. But I think these kinds of things are best done within the auspices of the working group, after adoption, so they can better reflect the group's consensus. I hope we can move on to this more productive (and, frankly, more interesting) part soon, now that we have a clear interim result to build on.
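To make the "decent guess" bullet concrete, here is a rough server-side sketch in Python. Everything in it is hypothetical (the data structures, the ID strings, the file names); the only point is that a miss falls back to a default chain rather than failing, and the retry flow covers whatever the guess gets wrong.

    # Sketch only: hypothetical server-side chain selection. "Trust anchor
    # IDs" here are just opaque strings; none of these names come from the
    # draft or any real TLS library.

    # Chains the server has provisioned, keyed by the trust anchor ID each
    # one satisfies (including intermediate IDs for elided chains).
    available_chains = {
        "32473.1": ["ee.pem", "intermediate1.pem"],    # long chain to root 1
        "32473.1.2": ["ee.pem"],                       # elided chain, intermediate known
        "32473.2": ["ee2.pem", "intermediate2.pem"],   # long chain to root 2
    }

    # The chain we believe works for "most" clients; used when nothing matches.
    default_chain = available_chains["32473.1"]

    def select_chain(client_trust_anchor_ids):
        """Pick a certificate chain for this ClientHello.

        client_trust_anchor_ids is the (possibly empty) list of IDs the
        client offered. On a miss we fall back to the default guess; if
        the guess is wrong, the retry flow can still repair things.
        """
        for anchor_id in client_trust_anchor_ids:
            if anchor_id in available_chains:
                return available_chains[anchor_id]
        return default_chain

    # A client that offered root 2 gets the matching chain; a client that
    # offered nothing we recognize gets the default guess.
    print(select_chain(["32473.2"]))   # ['ee2.pem', 'intermediate2.pem']
    print(select_chain(["99999.9"]))   # default chain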
What's in the draft was just the simplest thing to describe, as a hopefully useful starting point for the WG to iterate from.

One minor point:

> In server software capable of hot-reloading certificates, such races could even occur between sending HRR and client retrying.

I think, if you're hot-reloading certificates, or any other config, you really should make sure that any in-flight connections still act on the old config once you've made a decision based on it. Otherwise you might have sent some messages based on one config and then try to complete the handshake based on the other. (E.g. if you HRR to x25519 but then hot-reload to a P-256-only configuration, you should still finish that pending handshake at x25519, because anything else will be invalid anyway.)
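In case it helps, here is a minimal sketch of the pattern I mean, in Python rather than any particular server's API: the handshake grabs a snapshot of the config when it starts, and a hot reload only changes what later handshakes see.

    # Sketch of pinning config across hot reloads; hypothetical types, not
    # any real TLS library's API. The point is only that an in-flight
    # handshake keeps using the snapshot it started with, even if reload()
    # runs midway through.
    import threading

    class TLSConfig:
        def __init__(self, chains, groups):
            self.chains = chains   # provisioned certificate chains
            self.groups = groups   # named groups we are willing to HRR to

    class Server:
        def __init__(self, config):
            self._lock = threading.Lock()
            self._config = config

        def reload(self, new_config):
            # Hot reload: only handshakes that start after this see new_config.
            with self._lock:
                self._config = new_config

        def start_handshake(self):
            # Snapshot taken once, when the first ClientHello is processed.
            with self._lock:
                return self._config

    # The handshake that began before reload() still finishes against the
    # old snapshot, so an HRR to x25519 stays consistent with the keys and
    # chains we eventually send.
    server = Server(TLSConfig(chains=["old-chain.pem"], groups=["x25519"]))
    snapshot = server.start_handshake()            # ClientHello arrives, maybe HRR
    server.reload(TLSConfig(chains=["new-chain.pem"], groups=["secp256r1"]))
    assert snapshot.groups == ["x25519"]           # finish handshake on old config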
> Then reconnects also bypass some TLS 1.3 downgrade protections. The client has to be very careful not to introduce downgrade attacks in the process. Furthermore, no authentication of any kind is possible here, so the previous connection might have been to an attacker.

I don't think this meaningfully impacts downgrade protection. We *already* don't have downgrade protection for anything in the authentication parts of TLS (certificate, signature algorithm, transcript hash) because the downgrade signal itself depends on them. For example, if a server had a compromised certificate and a strong certificate, and the client accepts both, the attacker can just use the compromised credential to sign a CertificateVerify message with a transcript that says "yes, this is the certificate I would have picked in this case" and move on. Likewise, if you were trying to exploit some kind of weak hash in the signature algorithm, you could send a different ClientHello to get the server to respond with something that collides with the transcript you present to the client, etc.

(To get downgrade protection across certificates, we would need something like the weaker certificate including a non-critical extension that says "this other certificate also exists, so if you see this extension and understand this OID, reject this". That would certainly be an interesting thing to design, but it's orthogonal to all this work.)

As an aside: this also isn't specific to the retry mechanic. The DNS flow with today's broadly untrusted DNS has all the same properties. The retry flow is ultimately just making the client act as if it saw that particular DNS response.

> 5) Client filtering of IDs:
>
> Intermediate elision presumably requires including Trust Anchor IDs for intermediates, and that presents some special considerations.
>
> I presume that intermediate IDs should not be included without having a root ID the intermediate chains to.
>
> And where CA used is known to have unpredictable/backup issuers, or rotate issuers, it could be useful to include related issuers even if the server does not signal those (it is just a few dozen bytes at most, I think the 5-set Let's Encrypt uses is exceptionally large).

I think this is already mostly covered by this text:

> If doing so, the client MAY send a subset of this intersection to meet size constraints, but SHOULD offer multiple options. This reduces the chance of a reconnection if, for example, the first option in the intersection uses a signature algorithm that the client doesn't support, or if the TLS server and DNS configuration are out of sync.

https://www.ietf.org/archive/id/draft-beck-tls-trust-anchor-ids-03.html#name-client-behavior

Though certainly there is plenty of room to iterate with the working group on the right things for the client to offer. I personally favor designs where the client logic doesn't need to know the relationship between intermediate and root IDs (PKIs can have sprawling structures, so this may not be well-defined). I think that's not necessary to get most of what you describe. Since the server will already have long and short chains provisioned, the DNS record will contain both intermediate and root IDs. The client will then see both when filtering, so the "SHOULD offer multiple" text already captures it. We won't get the "related issuers" case (could be interesting to add), but the root will at least match the long chain.

> Then some ideas:
>
> - If Trust Anchor IDs fails to negotiate a certificate, it should just be ignored. That is, fall back to whatever processing would happen without the extension.
>
> Server should not explicitly fail unless it requires trust anchor negotiation, and is just plain incompatible with clients that do not support it.

Yup! That is very much the intent here. It's phrased in a way to align with Section 4.4.2.2 of RFC 8446, though I certainly could have gotten it wrong! We got feedback that folks wanted it to coexist cleanly with certificate_authorities, so I tried to write the text fitting the current setup. But, as with anything else, the text can change to whatever the working group consensus is when adopted.

> - Send all the issuer chains in every (extended) ACME response, or have dedicated endpoint to download the issuer chain bundle per issuer.
>
> This avoids having ACME client combine multiple chains from alternatives. Which is not as easy as it sounds because the EE certificates can differ, which can lead to really bizarre edge cases.
>
> Having bundle of issuer chains also fits a lot better with the configuration model of a lot of existing TLS server software.

The feedback we got from Aaron Gable (CC'd) was to not tie multiple issuances to a single ACME order (e.g. imagine if you need to renew one but not the other), which suggests, at least at the ACME protocol level, going in the other direction. That's why we've kept the single-chain MIME type and then further pared the ACME text in this draft down to just the shared-EE case. Definitely the more general cases also need to be solved, but given that, it seemed better to tackle them separately.

Beyond that, what gets sent on the wire in ACME doesn't necessarily determine the interface between the ACME client and the TLS server software. If some single-file multi-chain bundle is more convenient for a lot of server operators, the ACME client can always stitch that together as it wishes (rough sketch below). I think, for this draft, it makes sense to focus on what happens in the TLS protocol, rather than the ACME client <-> TLS server channel. That's typically what's in scope for a TLS document. After all, RFC 8446 already has a notion of certificate selection, across many axes, and is quite content doing so without saying anything at all about how to provision multiple certificates.

Of course, above all, like everything else, if the WG feels a different ACME direction makes sense (probably in consultation with ACME experts---I'm certainly not one!), that can certainly change. Based on my current understanding of things, and feedback from ACME experts, I still think what's in draft-03 is right, but we'll see.
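And, for what it's worth, the stitching I mentioned above is only a few lines on the ACME client side. This is just an illustration with made-up file names; neither ACME nor the draft specifies any of it.

    # Sketch: an ACME client that downloaded one single-chain PEM per
    # issuance (per the existing MIME type) and stitches them into one
    # multi-chain bundle for server software that wants a single file.
    def write_bundle(chain_paths, bundle_path):
        with open(bundle_path, "w") as bundle:
            for path in chain_paths:
                with open(path) as chain:
                    bundle.write(chain.read().rstrip("\n") + "\n")

    # e.g. write_bundle(["chain-root1.pem", "chain-root2.pem"], "bundle.pem")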
Thanks again for the comments!

David