[TLS] Re: [EXTERNAL] Re: DTLS 1.3 bis
Hiya,

Given David's presentation and subsequent list discussion, it seems extraordinarily clear that a bis document is needed here;-)

On 17/11/2024 12:54, David Benjamin wrote:
> A thought: This is now a protocol change, but what if we defined a "oops" extension that simply adds a dummy post-Finished handshake message that protrudes into epoch 3? I.e., if negotiated, the client and server flights actually look like this:

Another thought: it looks like at least some of these issues may be coming up now because our formal analyses of (D)TLS mostly covered the security of the protocol and not the correctness of the protocol. If that is true, and if it turns out we need to change DTLS to handle the issues found, then maybe it'd be worthwhile trying to see if we can find some people to try to do formal analyses of the protocol with a view to proving things about correctness?

I'm not suggesting making this a requirement, btw, nor a thing to be mandated via any fatty process. But it's interesting that the not-quite-unwanted sibling of the IETF protocol that has had by far the most investment in formal analyses shows such deficiencies.

Cheers, S.

___ TLS mailing list -- tls@ietf.org To unsubscribe send an email to tls-le...@ietf.org
[TLS] Re: [EXTERNAL] Re: DTLS 1.3 bis
On Wed, Nov 13, 2024 at 01:39:43PM -0500, David Benjamin wrote: > > Not to say that every implementor would have noticed every issue (I'm sure > I overlooked some issues too), but I think DTLS's biggest challenge has > always been the relatively little attention it receives compared to TLS. - When can the server drop epoch 2 (handshake) receive keys? Suppose the client's 2nd flight makes it through, but the ACK is lost. This causes the client to re-transmit the flight. The re-transmission happens with epoch 2. So the server needs epoch 2 receive keys in order to ACK the re-transmit. And this ACK could get lost as well. So the server needs to keep epoch 2 receive keys until the client considers its 2nd flight complete. However, the server does not seem to have a means to determine when this has happened. If the server did not send CertificateRequest, then NewSessionTicket is unordered w.r.t. the client's 2nd flight. And even if the server did send CR, NST is not considered an implicit ACK for the client's 2nd flight. Is there some prohibition on the client sending post-handshake messages before considering the handshake complete? If not, one can't use PS messages as an indicator, and the client might not send PS messages anyway. - Single epoch, multiple prepare for next epoch messages. What does it mean for a single epoch to contain multiple messages that prepare for the next epoch? Does that prepare one epoch or multiple epochs? Doing multiple might cause issues with epoch reconstruction. AFAICT, sending multiple KeyUpdates in one epoch is not forbidden (the spec requires an ACK, not an actual epoch bump in between). And in future extensions, there might be more message types that prepare epoch bumps (e.g., some Extended Key Update messages). The interactions between those (and regular KeyUpdate) might not be simple. I think there should be a requirement that in each epoch, there is at most one message that prepares for the next epoch, and that all application data epochs except the last have exactly one.
And with restriction that retransmissions must occur on the same epoch (why is that there for post-handshake messages?), the message that prepares for the next epoch must always be the last in its flight. > This is exacerbated by the kinds of things we need attention on. While the > security, cryptography, and handshake bits (this WG's forte), more-or-less > carry over as-is, it picks up a whole mess of transport-related concerns > that just don't apply to TLS. I remember that when developing HTTP/2, the HTTP WG people had a joint session with Transport Area folks about transport aspects of the protocol. HTTP/2 does not need to deal with loss or reordering. > And then there's also a wide range of possible implementations, depending > on the simplifying assumptions you make (e.g. refusing to have multiple > outgoing post-handshake flights active at once). That, in turn, means that > a reader might not have bothered thinking about the more complex case, if > they didn't mean to implement it. (On my end, I don't expect we'll > implement everything in here either!) And even if some case is implemented, it might still be subtly broken (or completely broken if it is not actually used). E.g., Long time ago, I saw a TLS client with totally broken KeyUpdate handling. And this was MTI feature. Or subtle issues in ACK implementation exposing endpoint to DoS or being abused as an amplifier. > While I think this WG's analysis (formal and otherwises) are mostly on > security properties, the issues I found are mostly making sure the protocol > can make forward progress under packet loss/reordering. But also whether > the text sufficiently defines the protocol at all. 
For example, it's quite > common for DTLS implementations to take these simplifying assumptions, but > all that actually needs to be written down as allowed behavior, because it > means that, when we analyze the protocol, receivers must accommodate a > sender that, say, artificially block sending one flight on the ACK of > another one. And then there is stuff that works only because senders are being conservative, not because the protocol requires it. E.g., a server that sends fragments from multiple messages at once (I don't see anything prohibiting that), and a client that does not implement full out-of-order receive buffering (I don't see anything requiring that). The result is a guaranteed deadlock in the handshake, even with zero packet loss or reordering. > The remedy for all that is, well, more eyes on it, which we get by having > the WG take on a bis document. :-) Beyond that, whether we need > implementers, formal analysis, or just people reading and reasoning through > the draft, I think we just welcome anyone who is interested in doing that > work and go forth. All three sources of feedback ultimately involve a human > reading the document and trying to understand what it's trying to say > anyway, which I think is the biggest gap here. Once we even know what our > protocol is, if there are
[TLS] Re: TLS 1.3, Raw Public Keys, and Misbinding Attacks
Hi Mohit, > Coming back to this. I'd disagree with the assertion that when using the > raw public key mode, the public key is the identity. We don't open a > connection to a key - we open a connection to a domain name or to an IP > address unless of course we are a HIPster and use Host Identity > Protocol (HIP) such that the key and the address is strongly intertwined. > I think your statement applies to some use cases and not to others. Especially for device communication, it is also common to use a rather "private" deployment with credentials (PSK, RPK) provisioned ahead of time. The provisioning is frequently done "out-of-band" and the trust is based on that procedure. On the client side, I also can't see that the client's certificate is related to a "domain-name"; at least, in my opinion, it's not a "public" domain name. With that, please keep RFC7250 "as it is" and, if you really insist, introduce a new certificate type, which may then be tailored to the use case you have in mind. br Achim On 18.11.24 at 07:25, Mohit Sethi wrote: Hi Hannes, all, Coming back to this. I'd disagree with the assertion that when using the raw public key mode, the public key is the identity. We don't open a connection to a key - we open a connection to a domain name or to an IP address unless of course we are a HIPster and use Host Identity Protocol (HIP) such that the key and the address is strongly intertwined. John is right here, if we don't include the server identity (e.g.: domain name) in the handshake or verify it separately, then misbinding is possible. We modeled TLS RPK with Proverif and found that misbinding is possible: https://arxiv.org/pdf/2411.09770. The model detects misbinding in both cases: i) where the received public key is verified via DANE, and ii) where the received public key is verified from a list of pre-configured keys.
In fact, the existence of misbinding of TLS RPK can easily tested in the real-world with OpenSSL using the following command (version 3.2.0 and up): openssl s_client -connect msguru.eu:25 -dane_tlsa_domain "msguru.eu" -dane_tlsa_rrdata "3 1 1 F4D9CF3B4E251085A4F3193DAAF3A5141CD95C7109D33C971C3F8F7CEC48CD1B" -starttls smtp -enable_server_rpk The above command results in a successful TLS handshake as is evident from the output: Server-to-client raw public key negotiated Server raw public key Public-Key: (2048 bit) Modulus: 00:c8:eb:ec:64:97:5d:aa:b6:99:06:68:13:8d:76: ff:31:06:77:fa:30:d0:a8:91:8e:90:fa:d5:77:7d: ad:0c:a3:5d:20:23:ee:b9:c7:23:5e:e4:3f:60:cd: 6e:e6:2d:84:16:8e:03:ab:5b:a9:b3:ce:38:16:2d: 6b:82:8f:22:ab:2c:23:19:7d:30:57:95:10:80:fe: d4:50:e5:c5:e3:c0:78:dc:86:31:87:aa:46:c8:95: 3f:4a:8c:eb:21:58:f3:3b:c4:c9:1d:a4:53:cc:0e: 79:ae:3c:92:d3:ac:9f:6f:34:5d:b6:78:92:29:27: 70:a7:14:4e:26:ed:76:aa:81:ea:27:79:37:68:3c: 20:4e:11:8a:30:c3:ff:93:c9:ee:24:a4:29:2a:44: bf:40:c2:1e:bd:cb:f7:1d:c6:f2:81:16:14:73:a8: 88:09:10:bc:95:56:62:17:8c:db:55:ce:14:b0:70: d0:69:54:84:20:5e:b7:35:74:91:8d:1c:c0:3d:95: be:41:c0:6e:d4:34:6c:eb:25:7d:fd:c9:45:9c:e6: e6:9e:07:dd:28:22:70:34:7d:80:8d:43:6f:26:88: 80:81:8c:02:95:dc:6f:3e:8f:ee:c1:df:95:a0:b8: 58:78:15:bf:47:67:c7:b4:07:22:3e:ca:04:5e:3f: 01:f7 Exponent: 65537 (0x10001) --- SSL handshake has read 1066 bytes and written 444 bytes Verification: OK DANE TLSA 3 1 1 ...09d33c971c3f8f7cec48cd1b matched the peer raw public key --- However, there is no server msguru.eu listening on port 25. Instead you are connected to Viktor's mail server at mx1.imrryr.org which supports server authentication with RPKs and has a DANE record published: https://www.nslookup.io/domains/_25._tcp.mx1.imrryr.org/dns-records/tlsa/. Thankfully, most ISPs block outbound port 25 and therefore Viktor's mail server is not suddenly going to see a massive spurt in traffic. 
The fact that someone can publish a different MX record as their own and that the SNI can be used to detect such a situation was already pointed out by Viktor in his email: https://mailarchive.ietf.org/arch/msg/tls/ey_rNTC8Um1OMD5cxjkpZ1OyInQ/. The lesson here is the same countermeasure as for all misbinding attacks: be explicit about the identities and check them. We have created a pull request for 8446bis adding a reference to misbinding attacks and countermeasures when using RPK. The goal was to keep the text to a minimum: https://github.com/tlswg/tls13-spec/pull/1366 Feel free to modify the pull request and use it! We welcome any further discussion. PS: We have some other results we are working on and will be happy to present them together at one of the upcoming IETF meetings (likely 123 in Madrid). On 4/16/24 12:30, Tschofenig, Hannes wrote: Hi John, I missed this email exchange and I largely agree with what has been said by others before. I disagree with your conclusion since the “identity” in the raw public key case is the public key.
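For readers following the OpenSSL example above: a TLSA record with parameters "3 1 1" (DANE-EE usage, SubjectPublicKeyInfo selector, SHA-256 matching type) is matched by hashing the peer's DER-encoded SubjectPublicKeyInfo. The Python sketch below is only an illustration under stated assumptions: the function names are hypothetical, dummy bytes stand in for a real key, and it is not how OpenSSL implements the check. It shows why the match succeeds regardless of which name the client dialled, and what an explicit name check (here, a hypothetical server-side SNI comparison) would add. Whether such a check is worthwhile for a given protocol is exactly what this thread debates.

```python
import hashlib


def tlsa_311_matches(spki_der: bytes, rrdata_hex: str) -> bool:
    """Compare SHA-256 of the DER-encoded SPKI with the digest from a '3 1 1' TLSA record."""
    return hashlib.sha256(spki_der).hexdigest() == rrdata_hex.strip().lower()


def server_serves_name(sni: str, served_names: frozenset) -> bool:
    """Hypothetical server-side check: is the requested SNI one of this server's own names?"""
    return sni.rstrip(".").lower() in served_names


# Assumption: dummy bytes stand in for the victim server's real DER-encoded SPKI.
victim_spki = b"placeholder SPKI bytes of the victim server"

# Anyone who can see the public key can compute this digest and publish it
# in a TLSA record under a name they control.
published_rrdata = hashlib.sha256(victim_spki).hexdigest().upper()

# The key presented by the victim matches the record; the dialled name plays no part.
key_matches = tlsa_311_matches(victim_spki, published_rrdata)

# An explicit identity check is what tells the two names apart.
name_ok = server_serves_name("msguru.eu", frozenset({"mx1.imrryr.org"}))
```

In this sketch `key_matches` is true while `name_ok` is false, which mirrors the situation in the example: the key verifies, but the name the client asked for is not one the responding server claims.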
[TLS] Re: TLS 1.3, Raw Public Keys, and Misbinding Attacks
On Mon, Nov 18, 2024 at 08:25:12AM +0200, Mohit Sethi wrote: > The model detects misbinding in both cases: i) where the received > public key is verified via DANE, and ii) where the received public key > is verified from a list of pre-configured keys. If the preconfigured key is correctly bound to the intended server, it is unclear what "rebinding" or other problem you have in mind. As for client certificates vs. client RPKs there's again no issue. Client identities supplied by 3rd-party CAs have little value in most cases; rather, in the rare case that client certificates are used at all, the relying party typically also controls client cert issuance and binding of public keys to names. In such cases, one can dispense with reliance on stale certificates and instead look up the public key in the current name binding database, which should be more up to date. No client identity other than the public key is necessary in such cases: the public key is an index into a privately maintained ACL, and 3rd-party CAs are not trusted to assert client entitlement. Yes, one can imagine scenarios where certificates are some sort of "government-issued id" and the service provided is to a "legal person", as identified by said government, rather than to a registered customer. Such services that delegate user authentication to government-issued ids in the form of certificates, can of course choose to not use RPKs (which are typically not enabled by default anyway). > In fact, the existence of misbinding of TLS RPK can easily tested in the > real-world with OpenSSL using the following command (version 3.2.0 and up): > > > openssl s_client -connect msguru.eu:25 -dane_tlsa_domain "msguru.eu" > > -dane_tlsa_rrdata "3 1 1 > > F4D9CF3B4E251085A4F3193DAAF3A5141CD95C7109D33C971C3F8F7CEC48CD1B" > > -starttls smtp -enable_server_rpk > > The above command results in a successful TLS handshake as is evident from > the output: > [...] > However, there is no server msguru.eu listening on port 25.
Instead you are > connected to Viktor's mail server at mx1.imrryr.org which supports server > authentication with RPKs and has a DANE record published: > https://www.nslookup.io/domains/_25._tcp.mx1.imrryr.org/dns-records/tlsa/. See also second block of comments below. Note that most SMTP deliveries with STARTTLS are unauthenticated opportunistic TLS, so no RPK is required to perform "misbinding", just point your MX record hostname, or IP address of your MX host somewhere else, and you're set (to achieve nothing in particular). > Thankfully, most ISPs block outbound port 25 and therefore Viktor's mail > server is not suddenly going to see a massive spurt in traffic. There are plenty of connections trying in vain to brute force SASL logins on ports 587 and 465. And nothing would be gained by making "cross origin" requests to my MX hosts that could be made directly instead. > The fact that someone can publish a different MX record as their own > and that the SNI can be used to detect such situation was already > pointed out by Viktor in his email: > https://mailarchive.ietf.org/arch/msg/tls/ey_rNTC8Um1OMD5cxjkpZ1OyInQ/. It seems you've not entirely understood that post. Detecting unexpected SNI is perhaps appropriate in HTTPS (though the "Host:" header would perhaps be a more easily inspected signal). In the case of SMTP there is little reason to bother, because there are no cross-origin issues to guard against, and MX records already support redirection, no "rebinding" needed. Quoting from that post: Note, that, for example, with SMTP the simplest way to direct traffic to someone else's MX host is to publish MX records for one's own domain that specify that MX host. So "misbinding" attacks are not "interesting" in this context.
Furthermore, because there are no "cross-origin" issues in SMTP, there is nothing to be gained by misleading a client that it is connected to a service endpoint for which one can control the expected public key binding, when in fact it is connecting to a "victim" service endpoint. And of course how clients learn the association between an endpoint and the expected raw public key is a rather separate matter from whether public keys or certificates happen to be used. The public key might be pre-shared out of band over a pre-existing bilateral trusted channel between client and server, and proof of possession could be part of that exchange if desired and useful. > The lesson here is the same countermeasure for all misbinding attack - be > explicit about the identities and check them. We have created a pull request > for 8446bis adding a reference to misbinding attacks and countermeasures > when using RPK. The goal was to keep the text to a minimum: > > https://github.com/tlswg/tls13-spec/pull/1366 The "lesson" has a specific scope. There is no problem with RPKs in SMTP, and TLS is not synonymous with web browsing over HTTPS. Not even all HTTPS traffi
[TLS] Re: [EXTERNAL] Re: DTLS 1.3 bis
On Sun, Nov 17, 2024 at 07:54:17AM -0500, David Benjamin wrote: > On Sat, Nov 16, 2024 at 10:40 AM Ilari Liusvaara > wrote: > > > On Wed, Nov 13, 2024 at 01:39:43PM -0500, David Benjamin wrote: > > A thought: This is now a protocol change, but what if we defined a "oops" > extension that simply adds a dummy post-Finished handshake message that > protrudes into epoch 3? I.e., if negotiated, the client and server flights > actually look like this: > > CH --> > <-- SH {EE..Finished} [Oops] > {Finished} [Oops] --> > <-- [ACK] > > I think if you combine that with the "ACKing epoch 3 implicitly ACKs all of > epoch 2" rule, this problem might be resolved? All retransmits by the > client are now guaranteed to contain at least one byte of Oops, because a > fully-acked Oops implies an acked Finished. That means the server need only > retain epoch 3, because as long as it can ACK the Oops, the client will get > the message. I don't think retaining epoch 3 is improvement over retaining epoch 2. However, I think that the requirement that all prior flights must be complete before stepping epoch helps here: It allows the server to drop epoch 2 upon decrypting epoch 4 record. Even without the extra message. > > - Single epoch, multiple prepare for next epoch messages. > > > > What does it mean for a single epoch to contain multiple messages that > > prepare for the next epoch? Does that prepare one epoch or multiple > > epochs? Doing multiple might cause issues with epoch reconstruction. > > > > AFAICT, sending multiple KeyUpdates in one epoch is not forbidden (the > > spec requires ACK, not actual epoch bump in between). > > > > I believe it's forbidden by this text. 
But I suspect this was on accident > because it's not just to facilitate epoch reconstruction: > > > In order to facilitate epoch reconstruction (Section 4.2.2), > implementations MUST NOT send records with the new keys or send a new > KeyUpdate until the previous KeyUpdate has been acknowledged (this avoids > having too many epochs in active use). > https://www.rfc-editor.org/rfc/rfc9147.html#section-8-1 I found that text, but I think it still allows peer to send KeyUpdate, get ACK, and then send another KeyUpdate in the same epoch... > In my attempt to fix the other KeyUpdate brokenness, I said that nothing > may follow a KeyUpdate in that epoch, which I think captures this a bit > more directly. I think that works? Except... > https://www.rfc-editor.org/errata/eid8047 Yeah, that should do it (modulo other KeyUpdate-like messages). > > And in future extensions, there might be more message types that prepare > > epoch bumps (e.g., some Extended Key Update messages). The interactions > > between those (and regular KeyUpdate) might not be simple. > > > > I think there should be requirement that in each epoch, there is at > > most one message that prepares for the next epoch, and that all > > application data epoch except the last have exactly one. > > > > And with restriction that retransmissions must occur on the same epoch > > (why is that there for post-handshake messages?), the message that > > prepares for the next epoch must always be the last in its flight. > > > > Extended Key Update is potentially extra fun because it's a multi-flight > transaction. What happens if you start an EKU flow but then, partway > through it, the peer sends a plain KeyUpdate? What if one side starts an > EKU flow and, at the same time, the other side sends KeyUpdate with > key_update_requested? Will the EKU-sending peer know not to confuse itself? > Or maybe we can design EKU such that it still works out, because the next > epoch hasn't been prepared yet? 
EKU doesn't exist yet, but something we'll > have to reason through when we get there. EKU flow is defined to block ordinary KeyUpdate. So ordinary KeyUpdate partway through is not allowed. The crossed case will not trigger reciprocal KeyUpdate (the EKU transaction will update keys). However, that restriction might not be necessary. I can come up with design that should work as long as there can not be multiple prepare for next epoch in a single epoch, nor multi-flight deadlocks. Basic idea is to have 2nd and 3rd flights prepare for epoch change (update send keys in TLS), and sender of 2nd flight save the KEM shared secret for processing the received 3rd flight. Then there is 4th message for case where the peer lost the initiator election. -Ilari
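The constraint discussed in this message (at most one message per epoch that prepares the next epoch, with nothing following it in that epoch) can be sketched as a small checker. This is a minimal Python illustration of the rule as proposed in the thread, not text from RFC 9147; the `Sent` record, the `PREPARES_NEXT_EPOCH` set, and the function name are hypothetical, and the input is assumed to be the ordered list of distinct handshake messages a sender originates (retransmissions are not modelled).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sent:
    epoch: int      # epoch in which the handshake message was first sent
    msg_type: str   # e.g. "KeyUpdate", "NewSessionTicket"


# Assumption: only KeyUpdate prepares the next epoch today; a future
# KeyUpdate-like message (e.g. from Extended Key Update) would be added here.
PREPARES_NEXT_EPOCH = {"KeyUpdate"}


def epoch_discipline_violations(sent):
    """Return indices of messages sent in an epoch after that epoch's
    'prepare next epoch' message, including a second such message."""
    closed = set()       # epochs that already contain a prepare message
    violations = []
    for index, msg in enumerate(sent):
        if msg.epoch in closed:
            violations.append(index)
        if msg.msg_type in PREPARES_NEXT_EPOCH:
            closed.add(msg.epoch)
    return violations


# One KeyUpdate per epoch, always last in its epoch: no violations.
well_behaved = [
    Sent(3, "NewSessionTicket"),
    Sent(3, "KeyUpdate"),
    Sent(4, "KeyUpdate"),
]

# KeyUpdate, ACK, then another KeyUpdate in the same epoch: flagged.
double_update = [
    Sent(3, "KeyUpdate"),
    Sent(3, "KeyUpdate"),
]
```

With this rule each epoch ends with exactly one epoch-changing message, so a receiver never has to ask whether two KeyUpdates in one epoch prepare one new epoch or two.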
[TLS] Re: [EXTERNAL] Re: DTLS 1.3 bis
On Sun, Nov 17, 2024 at 12:05 PM Ilari Liusvaara wrote: > On Sun, Nov 17, 2024 at 07:54:17AM -0500, David Benjamin wrote: > > On Sat, Nov 16, 2024 at 10:40 AM Ilari Liusvaara < > ilariliusva...@welho.com> > > wrote: > > > > > On Wed, Nov 13, 2024 at 01:39:43PM -0500, David Benjamin wrote: > > > > A thought: This is now a protocol change, but what if we defined a "oops" > > extension that simply adds a dummy post-Finished handshake message that > > protrudes into epoch 3? I.e., if negotiated, the client and server > flights > > actually look like this: > > > > CH --> > > <-- SH {EE..Finished} [Oops] > > {Finished} [Oops] --> > > <-- [ACK] > > > > I think if you combine that with the "ACKing epoch 3 implicitly ACKs all > of > > epoch 2" rule, this problem might be resolved? All retransmits by the > > client are now guaranteed to contain at least one byte of Oops, because a > > fully-acked Oops implies an acked Finished. That means the server need > only > > retain epoch 3, because as long as it can ACK the Oops, the client will > get > > the message. > > I don't think retaining epoch 3 is improvement over retaining epoch 2. > I was thinking that, until the server decrypts epoch 4, the server will already naturally be retaining epoch 3 for application data anyway. And once the server decrypts epoch 4, it knows the client has received the ACK and it is safe to stop responding to those retransmits. I.e. we're not going out of our way to retain epoch 3, just following the natural progression of epochs. The KeyUpdate rule then generalizes: You retain epoch N-1 until you receive epoch N. Once you receive epoch N, you can freely drop N-1. > However, I think that the requirement that all prior flights must be > complete before stepping epoch helps here: It allows the server to drop > epoch 2 upon decrypting epoch 4 record. Even without the extra message. 
> Having the protocol observe the KeyUpdate rule definitely helps, but a connection may last quite a while before a KeyUpdate. (If there is one at all; as you note, KeyUpdates aren't particularly well-exercised[*].) The server needs to retain epoch 2 until it guesses that the ACK probably got through. Or maybe it just gives up and special-cases and retains epoch 2 indefinitely until a KeyUpdate, I dunno. Seems kind of silly. [*] Early in the days of TLS 1.3, we tried to make Chrome trigger a KeyUpdate shortly after the handshake. We immediately hit compatibility issues because some servers could not handle it. In hindsight, doing that at the very start would have been prudent, before TLS 1.3 was deployed at all, but sadly I don't have a time machine. > > > - Single epoch, multiple prepare for next epoch messages. > > > > > > What does it mean for a single epoch to contain multiple messages that > > > prepare for the next epoch? Does that prepare one epoch or multiple > > > epochs? Doing multiple might cause issues with epoch reconstruction. > > > > > > AFAICT, sending multiple KeyUpdates in one epoch is not forbidden (the > > > spec requires ACK, not actual epoch bump in between). > > > > > > > I believe it's forbidden by this text. But I suspect this was on accident > > because it's not just to facilitate epoch reconstruction: > > > > > In order to facilitate epoch reconstruction (Section 4.2.2), > > implementations MUST NOT send records with the new keys or send a new > > KeyUpdate until the previous KeyUpdate has been acknowledged (this avoids > > having too many epochs in active use). > > https://www.rfc-editor.org/rfc/rfc9147.html#section-8-1 > > I found that text, but I think it still allows peer to send KeyUpdate, > get ACK, and then send another KeyUpdate in the same epoch... > > > > In my attempt to fix the other KeyUpdate brokenness, I said that nothing > > may follow a KeyUpdate in that epoch, which I think captures this a bit > > more directly.
I think that works? Except... > > https://www.rfc-editor.org/errata/eid8047 > > Yeah, that should do it (modulo other KeyUpdate-like messages). > > > > > And in future extensions, there might be more message types that > prepare > > > epoch bumps (e.g., some Extended Key Update messages). The interactions > > > between those (and regular KeyUpdate) might not be simple. > > > > > > I think there should be requirement that in each epoch, there is at > > > most one message that prepares for the next epoch, and that all > > > application data epoch except the last have exactly one. > > > > > > And with restriction that retransmissions must occur on the same epoch > > > (why is that there for post-handshake messages?), the message that > > > prepares for the next epoch must always be the last in its flight. > > > > > > > Extended Key Update is potentially extra fun because it's a multi-flight > > transaction. What happens if you start an EKU flow but then, partway > > through it, the peer sends a plain KeyUpdate? What if one side starts an > > EKU flow and, at the same time, the other side sends KeyUpdate with
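The generalized retention rule stated earlier in this message ("You retain epoch N-1 until you receive epoch N. Once you receive epoch N, you can freely drop N-1.") can be sketched as follows. This is a hedged Python illustration of the rule as proposed in the thread under the "Oops" idea, not current RFC 9147 behaviour; the class and method names are hypothetical, and "decrypted" is assumed to mean a record that was successfully authenticated under that epoch's keys.

```python
class ReadEpochKeys:
    """Tracks which read epochs still have keys, per the proposed rule."""

    def __init__(self):
        self._live = set()

    def install(self, epoch: int) -> None:
        # Called when keys for a new read epoch become available.
        self._live.add(epoch)

    def can_decrypt(self, epoch: int) -> bool:
        return epoch in self._live

    def on_record_decrypted(self, epoch: int) -> None:
        # A valid record in epoch N shows the peer has moved on to N,
        # so every older read epoch can be dropped.
        if epoch not in self._live:
            raise KeyError(f"no read keys for epoch {epoch}")
        self._live = {e for e in self._live if e >= epoch}


keys = ReadEpochKeys()
keys.install(3)
keys.install(4)

# A retransmission in epoch 3 can still be processed (and ACKed) before
# anything arrives in epoch 4.
keys.on_record_decrypted(3)
still_have_3 = keys.can_decrypt(3)

# The first valid epoch 4 record releases epoch 3.
keys.on_record_decrypted(4)
dropped_3 = not keys.can_decrypt(3)
```

The point of the sketch is that retention is driven by what the peer has demonstrably sent rather than by a timer, which is the contrast with the MSL-based text being discussed.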
[TLS] Re: [EXTERNAL] Re: DTLS 1.3 bis
On Sat, Nov 16, 2024 at 10:40 AM Ilari Liusvaara wrote: > On Wed, Nov 13, 2024 at 01:39:43PM -0500, David Benjamin wrote: > > > > Not to say that every implementor would have noticed every issue (I'm > sure > > I overlooked some issues too), but I think DTLS's biggest challenge has > > always been the relatively little attention it receives compared to TLS. > > - When can the server drop epoch 2 (handshake) receive keys? > > Suppose client 2nd flight makes it through, but the ACK is lost. This > causes the client to re-transmit the flight. The re-transmission happens > with epoch 2. So the server needs epoch 2 receive keys in order to ACK > the re-transmit. And this ACK could get lost as well. > > So the server needs to keep epoch 2 receive keys until client considers > its 2nd flight complete. However, the server does not seem to have means > to determine when this has happened. > > If the server did not send CertificateRequest, then NewSessionTicket is > unordered w.r.t. client 2nd flight. And even if the server did send CR, > then NST is not considered implicit ACK for client 2nd flight. > > Is there some prohibition on client sending post-handshake messages > before considering handshake complete? If no, one can't use PS messges > as an indicator, and the client might not send PS messages anyway. > Aww, yuck! Well, that proves my parenthetical. I'd missed that one. I mean, the spec does have an answer, but it's incredibly unsatisfying, because it's based on time rather than packet loss. > In addition, for at least twice the default MSL defined for [RFC0793], when in the FINISHED state, the server MUST respond to retransmission of the client's final flight with a retransmit of its ACK. 
https://www.rfc-editor.org/rfc/rfc9147.html#section-5.8.1-9 In particular, this means my "best guess" in slide 8 here is not sufficient and you actually are *required* to carry a past read epoch, in just one case: https://datatracker.ietf.org/meeting/121/materials/slides-121-tls-13-dtls-13-details-00 That means this text here is wrong, because it suggests this is optional: > Implementations SHOULD discard records from earlier epochs but MAY choose to retain keying material from previous epochs for up to the default MSL specified for TCP [RFC0793] to allow for packet reordering. https://www.rfc-editor.org/rfc/rfc9147.html#section-4.2.1-1 Another problem with it being based on time is that we might have moved arbitrarily far in the connection by then. Maybe the RTT is suuuper fast and actually we've done KeyUpdate 100x by then. By a strict reading of that text, you're still obligated to retain epoch 2, even though you're on epoch 100. Now you need to retain epochs arbitrarily far apart! Fortunately, this is actually impossible because if the server ACKs KeyUpdate, the client should know the server has received its final flight. But there is nothing in the spec that says "if you receive an ACK for a message in epoch N, everything in epochs < N is ACKed". (Note that < here is evaluated according to our partially-ordered set because epoch 1 is weird. Though epoch 1 should contain no handshake messages, so it's kinda moot.) Moreover, if we apply the fix to KeyUpdate, the client will not start sending at the new epoch until the final flight is ACKed too and everything is caught up. But, impossible as all this is, the spec text does not account for it being impossible. We also have a near miss for even more complexity. Suppose the handshake's final flight actually spanned both epochs 0 and 2 instead of just 2. The server receives both but, for whatever reason, the ACK for epoch 0 didn't get through. (Re-ACKing past records is optional in the spec.
Also things might fall out of bounded ACK buffers eventually.) The client will not consider the final flight to be ACKed until *all* records are through, which means the server would need to retain epoch 0. Fortunately, the final flight doesn't look like this and we don't need to worry about it. Though it's further evidence that we should add the implicit ACK condition above. There's a related, but less crucial, problem with the server's final flight. At what point can the client discard read epoch 2? Consider: CH --> <-- SH {EE..Finished} <-- [0.5-RTT App Data] {Finished} -/-> (lost) Now, the client will retransmit Finished and eventually repair this, but the server has a retransmit timer too. Since the server can't tell which side was lost, it will retransmit SH {EE..Finished}. The client is expected to use that to drive retransmitting Finished: > 3. The implementation reads a retransmitted flight from the peer when none of the messages that it sent in response to that flight have been acknowledged: the implementation transitions to the SENDING state, where it retransmits the flight, adjusts and re-arms the retransmit timer, and returns to the WAITING state. The rationale here is that the receipt of a duplicate message is the likely result of timer expiry on the peer and there
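The implicit ACK condition suggested in this message ("if you receive an ACK for a message in epoch N, everything in epochs < N is ACKed") could look like the sender-side bookkeeping below. This is a simplified Python sketch of the proposal, not something RFC 9147 specifies: the class name is hypothetical, outstanding data is tracked per record as (epoch, sequence number) rather than per handshake fragment, and the special ordering of epoch 1 mentioned above is ignored on the assumption that it carries no handshake messages.

```python
class UnackedRecords:
    """Sender-side view of handshake records awaiting acknowledgement."""

    def __init__(self):
        self._outstanding = set()   # {(epoch, sequence_number)}

    def sent(self, epoch: int, seq: int) -> None:
        self._outstanding.add((epoch, seq))

    def acked(self, epoch: int, seq: int) -> None:
        if (epoch, seq) not in self._outstanding:
            return  # unknown or duplicate ACK: nothing to learn from it
        # Explicitly ACKed record, plus the proposed implicit rule:
        # everything in earlier epochs is treated as ACKed too.
        self._outstanding = {
            (e, s)
            for (e, s) in self._outstanding
            if e >= epoch and (e, s) != (epoch, seq)
        }

    def needs_retransmit(self):
        return sorted(self._outstanding)


flight = UnackedRecords()
flight.sent(2, 0)
flight.sent(2, 1)
flight.sent(3, 0)
flight.sent(3, 1)

# One ACK for an epoch 3 record clears both epoch 2 records as well.
flight.acked(3, 0)
remaining = flight.needs_retransmit()
```

Under this rule the peer never needs the older epoch's keys merely to re-ACK a retransmission, because the sender stops retransmitting the older records as soon as any newer record is acknowledged.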