[TLS] Re: [EXTERNAL] Re: DTLS 1.3 bis

2024-11-17 Thread Stephen Farrell


Hiya,

Given David's presentation and subsequent list discussion, it seems
extraordinarily clear that a bis document is needed here;-)

On 17/11/2024 12:54, David Benjamin wrote:

> A thought: This is now a protocol change, but what if we defined a "oops"
> extension that simply adds a dummy post-Finished handshake message that
> protrudes into epoch 3? I.e., if negotiated, the client and server flights
> actually look like this:


Another thought: it looks like at least some of these issues may be
coming up now because our formal analyses of (D)TLS mostly covered
the security of the protocol and not the correctness of the protocol.

If that is true, and if it turns out we need to change DTLS to handle
the issues found, then maybe it'd be worthwhile trying to see if we
can find some people to try to do formal analyses of the protocol with a
view to proving things about correctness?

I'm not suggesting making this a requirement, btw, nor a thing to be
mandated via any fatty process. But it's interesting that the not-
quite-unwanted sibling of the IETF protocol that has had by far the
most investment in formal analyses shows such deficiencies.

Cheers,
S.




___
TLS mailing list -- tls@ietf.org
To unsubscribe send an email to tls-le...@ietf.org


[TLS] Re: [EXTERNAL] Re: DTLS 1.3 bis

2024-11-17 Thread Ilari Liusvaara
On Wed, Nov 13, 2024 at 01:39:43PM -0500, David Benjamin wrote:
> 
> Not to say that every implementor would have noticed every issue (I'm sure
> I overlooked some issues too), but I think DTLS's biggest challenge has
> always been the relatively little attention it receives compared to TLS.

- When can the server drop epoch 2 (handshake) receive keys?

Suppose the client's 2nd flight makes it through, but the ACK is lost. This
causes the client to re-transmit the flight. The re-transmission happens
with epoch 2. So the server needs epoch 2 receive keys in order to ACK
the re-transmit. And this ACK could get lost as well.

So the server needs to keep epoch 2 receive keys until the client considers
its 2nd flight complete. However, the server does not seem to have a means
to determine when this has happened.

If the server did not send CertificateRequest, then NewSessionTicket is
unordered w.r.t. client 2nd flight. And even if the server did send CR,
then NST is not considered implicit ACK for client 2nd flight.

Is there some prohibition on the client sending post-handshake messages
before considering the handshake complete? If not, one can't use PS messages
as an indicator, and the client might not send PS messages anyway.


- Single epoch, multiple prepare for next epoch messages.

What does it mean for a single epoch to contain multiple messages that
prepare for the next epoch? Does that prepare one epoch or multiple
epochs? Doing multiple might cause issues with epoch reconstruction.

AFAICT, sending multiple KeyUpdates in one epoch is not forbidden (the
spec requires ACK, not actual epoch bump in between).


And in future extensions, there might be more message types that prepare
epoch bumps (e.g., some Extended Key Update messages). The interactions
between those (and regular KeyUpdate) might not be simple.

I think there should be a requirement that in each epoch, there is at
most one message that prepares for the next epoch, and that all
application data epochs except the last have exactly one.

And with the restriction that retransmissions must occur on the same epoch
(why is that there for post-handshake messages?), the message that
prepares for the next epoch must always be the last in its flight.
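
To make the proposed rule concrete, here is a minimal sketch in Python. It
is an illustration only: the `check_epoch` helper and the `EPOCH_PREPARING`
set are hypothetical names, "ExtendedKeyUpdate" stands in for any future
KeyUpdate-like message, and the sketch simplifies "last in its flight" to
"last handshake message sent in the epoch".

```python
# Hypothetical set of message types that prepare the next epoch.
# "ExtendedKeyUpdate" is a placeholder, not a defined message.
EPOCH_PREPARING = {"KeyUpdate", "ExtendedKeyUpdate"}


def check_epoch(messages, is_last_epoch):
    """Return the rule violations for one application data epoch.

    messages: handshake message type names, in the order they were sent.
    is_last_epoch: True if no later epoch exists on this connection.
    """
    problems = []
    preparing = [i for i, m in enumerate(messages) if m in EPOCH_PREPARING]
    if len(preparing) > 1:
        problems.append("more than one message prepares the next epoch")
    if preparing and preparing[0] != len(messages) - 1:
        problems.append("epoch-preparing message is not the last one sent")
    if not preparing and not is_last_epoch:
        problems.append("non-final epoch has no epoch-preparing message")
    return problems
```

An empty list means the epoch satisfies all three conditions; a sender
could run such a check before queuing a new post-handshake message.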


> This is exacerbated by the kinds of things we need attention on. While the
> security, cryptography, and handshake bits (this WG's forte), more-or-less
> carry over as-is, it picks up a whole mess of transport-related concerns
> that just don't apply to TLS. 

I remember that when developing HTTP/2, the HTTP WG people had a joint
session with Transport Area folks about transport aspects of the
protocol. HTTP/2 does not need to deal with loss or reordering.


> And then there's also a wide range of possible implementations, depending
> on the simplifying assumptions you make (e.g. refusing to have multiple
> outgoing post-handshake flights active at once). That, in turn, means that
> a reader might not have bothered thinking about the more complex case, if
> they didn't mean to implement it. (On my end, I don't expect we'll
> implement everything in here either!)

And even if some case is implemented, it might still be subtly broken
(or completely broken if it is not actually used).

E.g., a long time ago, I saw a TLS client with totally broken KeyUpdate
handling. And this was an MTI feature.

Or subtle issues in ACK implementation exposing endpoint to DoS or
being abused as an amplifier.


> While I think this WG's analyses (formal and otherwise) are mostly on
> security properties, the issues I found are mostly making sure the protocol
> can make forward progress under packet loss/reordering. But also whether
> the text sufficiently defines the protocol at all. For example, it's quite
> common for DTLS implementations to take these simplifying assumptions, but
> all that actually needs to be written down as allowed behavior, because it
> means that, when we analyze the protocol, receivers must accommodate a
> sender that, say, artificially blocks sending one flight on the ACK of
> another one.

And then there is stuff that works only because senders are being
conservative, not because the protocol requires it.

E.g., a server that interleaves fragments from multiple messages (I don't
see anything prohibiting that), and a client that does not implement full
out-of-order receive buffering (I don't see anything requiring that).
The result is a guaranteed deadlock in the handshake, even with zero
packet loss or reordering.


> The remedy for all that is, well, more eyes on it, which we get by having
> the WG take on a bis document. :-) Beyond that, whether we need
> implementers, formal analysis, or just people reading and reasoning through
> the draft, I think we just welcome anyone who is interested in doing that
> work and go forth. All three sources of feedback ultimately involve a human
> reading the document and trying to understand what it's trying to say
> anyway, which I think is the biggest gap here. Once we even know what our
> protocol is, if there are 

[TLS] Re: TLS 1.3, Raw Public Keys, and Misbinding Attacks

2024-11-17 Thread Achim Kraus

Hi Mohit,

> Coming back to this. I'd disagree with the assertion that when using the
> raw public key mode, the public key is the identity. We don't open a
> connection to a key - we open a connection to a domain name or to an IP
> address, unless of course we are a HIPster and use Host Identity
> Protocol (HIP) such that the key and the address are strongly intertwined.
>

I think your statement applies to some use cases and not to others.

Especially for device communication, it is also common to use a rather
"private" deployment with pre-provisioned credentials (PSK, RPK).
The provisioning is frequently done "out-of-band", and the trust is
based on that procedure.
On the client side, I also can't see that the client's certificate is
related to a "domain name"; at least, in my opinion, it's not a
"public" domain name.

With that, please keep RFC 7250 "as it is" and, if you really insist,
introduce a new certificate type, which may then be trimmed to the
use case you have in mind.

br
Achim




On 18.11.24 at 07:25, Mohit Sethi wrote:

Hi Hannes, all,

Coming back to this. I'd disagree with the assertion that when using the
raw public key mode, the public key is the identity. We don't open a
connection to a key - we open a connection to a domain name or to an IP
address, unless of course we are a HIPster and use Host Identity
Protocol (HIP) such that the key and the address are strongly intertwined.

John is right here, if we don't include the server identity (e.g.:
domain name) in the handshake or verify it separately, then misbinding
is possible. We modeled TLS RPK with ProVerif and found that misbinding
is possible: https://arxiv.org/pdf/2411.09770. The model detects
misbinding in both cases: i) where the received public key is verified
via DANE, and ii) where the received public key is verified from a list
of pre-configured keys.

In fact, the existence of misbinding of TLS RPK can easily be tested in the
real world with OpenSSL using the following command (version 3.2.0 and up):


openssl s_client -connect msguru.eu:25 -dane_tlsa_domain "msguru.eu" \
  -dane_tlsa_rrdata "3 1 1 F4D9CF3B4E251085A4F3193DAAF3A5141CD95C7109D33C971C3F8F7CEC48CD1B" \
  -starttls smtp -enable_server_rpk


The above command results in a successful TLS handshake as is evident
from the output:


Server-to-client raw public key negotiated
Server raw public key
  Public-Key: (2048 bit)
  Modulus:
  00:c8:eb:ec:64:97:5d:aa:b6:99:06:68:13:8d:76:
  ff:31:06:77:fa:30:d0:a8:91:8e:90:fa:d5:77:7d:
  ad:0c:a3:5d:20:23:ee:b9:c7:23:5e:e4:3f:60:cd:
  6e:e6:2d:84:16:8e:03:ab:5b:a9:b3:ce:38:16:2d:
  6b:82:8f:22:ab:2c:23:19:7d:30:57:95:10:80:fe:
  d4:50:e5:c5:e3:c0:78:dc:86:31:87:aa:46:c8:95:
  3f:4a:8c:eb:21:58:f3:3b:c4:c9:1d:a4:53:cc:0e:
  79:ae:3c:92:d3:ac:9f:6f:34:5d:b6:78:92:29:27:
  70:a7:14:4e:26:ed:76:aa:81:ea:27:79:37:68:3c:
  20:4e:11:8a:30:c3:ff:93:c9:ee:24:a4:29:2a:44:
  bf:40:c2:1e:bd:cb:f7:1d:c6:f2:81:16:14:73:a8:
  88:09:10:bc:95:56:62:17:8c:db:55:ce:14:b0:70:
  d0:69:54:84:20:5e:b7:35:74:91:8d:1c:c0:3d:95:
  be:41:c0:6e:d4:34:6c:eb:25:7d:fd:c9:45:9c:e6:
  e6:9e:07:dd:28:22:70:34:7d:80:8d:43:6f:26:88:
  80:81:8c:02:95:dc:6f:3e:8f:ee:c1:df:95:a0:b8:
  58:78:15:bf:47:67:c7:b4:07:22:3e:ca:04:5e:3f:
  01:f7
  Exponent: 65537 (0x10001)
---
SSL handshake has read 1066 bytes and written 444 bytes
Verification: OK
DANE TLSA 3 1 1 ...09d33c971c3f8f7cec48cd1b matched the peer raw public key
---
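
For reference, a "3 1 1" TLSA record (DANE-EE usage, SPKI selector,
SHA2-256 matching type) is matched by hashing the peer's DER-encoded
SubjectPublicKeyInfo. A minimal sketch in Python, where
`tlsa_311_matches` is a hypothetical helper name and not an OpenSSL API:

```python
import hashlib


def tlsa_311_matches(spki_der: bytes, tlsa_hex: str) -> bool:
    """Check a DANE-EE(3) SPKI(1) SHA2-256(1) association.

    spki_der: DER-encoded SubjectPublicKeyInfo presented by the peer
              (with RPK this is the entire certificate payload).
    tlsa_hex: the association data from the TLSA record, as hex.
    """
    digest = hashlib.sha256(spki_der).hexdigest()
    return digest == tlsa_hex.strip().lower()
```

In this sketch the domain name is not an input to the comparison; only
the key and the digest are.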

However, there is no server msguru.eu listening on port 25. Instead you
are connected to Viktor's mail server at mx1.imrryr.org which supports
server authentication with RPKs and has a DANE record published:
https://www.nslookup.io/domains/_25._tcp.mx1.imrryr.org/dns-records/tlsa/.
Thankfully, most ISPs block outbound port 25 and therefore Viktor's mail
server is not suddenly going to see a massive spurt in traffic. The fact
that someone can publish a different MX record as their own and that the
SNI can be used to detect such a situation was already pointed out by
Viktor in his email:
https://mailarchive.ietf.org/arch/msg/tls/ey_rNTC8Um1OMD5cxjkpZ1OyInQ/.

The lesson here is the same countermeasure as for all misbinding attacks -
be explicit about the identities and check them. We have created a pull
request for 8446bis adding a reference to misbinding attacks and
countermeasures when using RPK. The goal was to keep the text to a minimum:

https://github.com/tlswg/tls13-spec/pull/1366

Feel free to modify the pull request and use! We welcome any further
discussion.
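
One way to read "be explicit about the identities and check them" is for
the server to compare the name the client asked for (for example, the SNI
value) against the names it actually serves. A minimal sketch in Python
under that assumption; `server_accepts` and `own_names` are hypothetical,
and this is not the text of the pull request:

```python
def normalize(name: str) -> str:
    """Lower-case a DNS name and drop a trailing dot for comparison."""
    return name.rstrip(".").lower()


def server_accepts(sni, own_names) -> bool:
    """Return True only if the client explicitly named this server.

    Rejecting a missing SNI is a policy choice made for this sketch.
    """
    if sni is None:
        return False
    return normalize(sni) in {normalize(n) for n in own_names}


# Hostnames taken from the example above.
print(server_accepts("mx1.imrryr.org", ["mx1.imrryr.org"]))  # True
print(server_accepts("msguru.eu", ["mx1.imrryr.org"]))       # False
```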

PS: We have some other results we are working on and will be happy to
present them together at one of the upcoming IETF meetings (likely 123
in Madrid).

On 4/16/24 12:30, Tschofenig, Hannes wrote:


Hi John,

I missed this email exchange and I largely agree with what has been
said by others before.

I disagree with your conclusion since the “identity” in the raw public
key case is the public key.


[TLS] Re: TLS 1.3, Raw Public Keys, and Misbinding Attacks

2024-11-17 Thread Viktor Dukhovni
On Mon, Nov 18, 2024 at 08:25:12AM +0200, Mohit Sethi wrote:

> The model detects misbinding in both cases: i) where the received
> public key is verified via DANE, and ii) where the received public key
> is verified from a list of pre-configured keys.

If the preconfigured key is correctly bound to the intended server, it
is unclear what "rebinding" or other problem you have in mind.  As
for client certificates vs. client RPKs there's again no issue.

Client identities supplied by 3rd-party CAs have little value in most
cases; rather, in the rare case that client certificates are used at
all, the relying party typically also controls client cert issuance and
binding of public keys to names.  In such cases, one can dispense with
reliance on stale certificates and instead look up the public key in
the current name binding database, which should be more up to date.

No client identity other than the public key is necessary in such cases:
the public key is an index into a privately maintained ACL, and
3rd-party CAs are not trusted to assert client entitlement.
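
A minimal sketch of such a key-indexed ACL in Python. The class and method
names are hypothetical, and indexing by the SHA-256 digest of the
DER-encoded SubjectPublicKeyInfo is an assumption of this sketch, not
something any particular server does:

```python
import hashlib


def key_id(spki_der: bytes) -> str:
    """Index value for a raw public key: SHA-256 of its DER encoding."""
    return hashlib.sha256(spki_der).hexdigest()


class ClientAcl:
    """Privately maintained mapping from client public key to entitlements."""

    def __init__(self):
        self._entries = {}

    def enroll(self, spki_der: bytes, entitlements):
        self._entries[key_id(spki_der)] = frozenset(entitlements)

    def revoke(self, spki_der: bytes):
        # Removing the entry takes effect on the next lookup; there is no
        # certificate lifetime to wait out.
        self._entries.pop(key_id(spki_der), None)

    def entitlements(self, spki_der: bytes) -> frozenset:
        # Unknown keys get no entitlements.
        return self._entries.get(key_id(spki_der), frozenset())
```

The lookup happens against the current database at connection time, which
is the "more up to date" property mentioned above.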

Yes, one can imagine scenarios where certificates are some sort of
"government-issued id" and the service provided is to a "legal person",
as identified by said government, rather than to a registered customer.
Such services that delegate user authentication to government-issued
ids in the form of certificates, can of course choose to not use RPKs
(which are typically not enabled by default anyway).

> In fact, the existence of misbinding of TLS RPK can easily be tested in the
> real-world with OpenSSL using the following command (version 3.2.0 and up):
> 
> > openssl s_client -connect msguru.eu:25 -dane_tlsa_domain "msguru.eu"
> > -dane_tlsa_rrdata "3 1 1
> > F4D9CF3B4E251085A4F3193DAAF3A5141CD95C7109D33C971C3F8F7CEC48CD1B"
> > -starttls smtp -enable_server_rpk
> 
> The above command results in a successful TLS handshake as is evident from
> the output:
> [...]

> However, there is no server msguru.eu listening on port 25. Instead you are
> connected to Viktor's mail server at mx1.imrryr.org which supports server
> authentication with RPKs and has a DANE record published:
> https://www.nslookup.io/domains/_25._tcp.mx1.imrryr.org/dns-records/tlsa/.

See also second block of comments below.  Note that most SMTP deliveries
with STARTTLS are unauthenticated opportunistic TLS, so no RPK is
required to perform "misbinding", just point your MX record hostname, or
IP address of your MX host somewhere else, and you're set (to achieve
nothing in particular).

> Thankfully, most ISPs block outbound port 25 and therefore Viktor's mail
> server is not suddenly going to see a massive spurt in traffic.

There are plenty of connections trying in vain to brute force SASL
logins on ports 587 and 465.  And nothing would be gained by making
"cross origin" requests to my MX hosts that could be made directly
instead.

> The fact that someone can publish a different MX record as their own
> and that the SNI can be used to detect such situation was already
> pointed out by Viktor in his email:
> https://mailarchive.ietf.org/arch/msg/tls/ey_rNTC8Um1OMD5cxjkpZ1OyInQ/.

It seems you've not entirely understood that post. Detecting unexpected
SNI is perhaps appropriate in HTTPS (though the "Host:" header would
perhaps be a more easily inspected signal).  In the case of SMTP there
is little reason to bother, because there are no cross-origin issues to
guard against, and MX records already support redirection, no
"rebinding" needed.  Quoting from that post:

Note, that, for example, with SMTP the simplest way to direct traffic to
someone else's MX host is to publish MX records for one's own domain
that specify that MX host.  So "misbinding" attacks are not
"interesting" in this context.  Furthermore, because there are no
"cross-origin" issues in SMTP, there is nothing to be gained by
misleading a client that it is connected to a service endpoint for which
one can control the expected public key binding, when in fact it is
connecting to a "victim" service endpoint.

And of course how clients learn the association between an endpoint
and the expected raw public key is a rather separate matter from whether
public keys or certificates happen to be used.  The public key might
be pre-shared out of band over a pre-existing bilateral trusted channel
between client and server, and proof of possession could be part of
that exchange if desired and useful.

> The lesson here is the same countermeasure for all misbinding attacks - be
> explicit about the identities and check them. We have created a pull request
> for 8446bis adding a reference to misbinding attacks and countermeasures
> when using RPK. The goal was to keep the text to a minimum:
> 
> https://github.com/tlswg/tls13-spec/pull/1366

The "lesson" has a specific scope.  There is no problem with RPKs in
SMTP, and TLS is not synonymous with web browsing over HTTPS.  Not even all
HTTPS traffi

[TLS] Re: [EXTERNAL] Re: DTLS 1.3 bis

2024-11-17 Thread Ilari Liusvaara
On Sun, Nov 17, 2024 at 07:54:17AM -0500, David Benjamin wrote:
> On Sat, Nov 16, 2024 at 10:40 AM Ilari Liusvaara wrote:
> 
> > On Wed, Nov 13, 2024 at 01:39:43PM -0500, David Benjamin wrote:
> 
> A thought: This is now a protocol change, but what if we defined a "oops"
> extension that simply adds a dummy post-Finished handshake message that
> protrudes into epoch 3? I.e., if negotiated, the client and server flights
> actually look like this:
> 
> CH -->
> <-- SH {EE..Finished} [Oops]
> {Finished} [Oops] -->
> <-- [ACK]
> 
> I think if you combine that with the "ACKing epoch 3 implicitly ACKs all of
> epoch 2" rule, this problem might be resolved? All retransmits by the
> client are now guaranteed to contain at least one byte of Oops, because a
> fully-acked Oops implies an acked Finished. That means the server need only
> retain epoch 3, because as long as it can ACK the Oops, the client will get
> the message.

I don't think retaining epoch 3 is an improvement over retaining epoch 2.

However, I think that the requirement that all prior flights must be
complete before stepping the epoch helps here: It allows the server to drop
epoch 2 upon decrypting an epoch 4 record. Even without the extra message.


> > - Single epoch, multiple prepare for next epoch messages.
> >
> > What does it mean for a single epoch to contain multiple messages that
> > prepare for the next epoch? Does that prepare one epoch or multiple
> > epochs? Doing multiple might cause issues with epoch reconstruction.
> >
> > AFAICT, sending multiple KeyUpdates in one epoch is not forbidden (the
> > spec requires ACK, not actual epoch bump in between).
> >
> 
> I believe it's forbidden by this text. But I suspect this was on accident
> because it's not just to facilitate epoch reconstruction:
> 
> > In order to facilitate epoch reconstruction (Section 4.2.2),
> > implementations MUST NOT send records with the new keys or send a new
> > KeyUpdate until the previous KeyUpdate has been acknowledged (this avoids
> > having too many epochs in active use).
> https://www.rfc-editor.org/rfc/rfc9147.html#section-8-1

I found that text, but I think it still allows a peer to send KeyUpdate,
get an ACK, and then send another KeyUpdate in the same epoch...

 
> In my attempt to fix the other KeyUpdate brokenness, I said that nothing
> may follow a KeyUpdate in that epoch, which I think captures this a bit
> more directly. I think that works? Except...
> https://www.rfc-editor.org/errata/eid8047

Yeah, that should do it (modulo other KeyUpdate-like messages).


> > And in future extensions, there might be more message types that prepare
> > epoch bumps (e.g., some Extended Key Update messages). The interactions
> > between those (and regular KeyUpdate) might not be simple.
> >
> > I think there should be requirement that in each epoch, there is at
> > most one message that prepares for the next epoch, and that all
> > application data epoch except the last have exactly one.
> >
> > And with restriction that retransmissions must occur on the same epoch
> > (why is that there for post-handshake messages?), the message that
> > prepares for the next epoch must always be the last in its flight.
> >
> 
> Extended Key Update is potentially extra fun because it's a multi-flight
> transaction. What happens if you start an EKU flow but then, partway
> through it, the peer sends a plain KeyUpdate? What if one side starts an
> EKU flow and, at the same time, the other side sends KeyUpdate with
> key_update_requested? Will the EKU-sending peer know not to confuse itself?
> Or maybe we can design EKU such that it still works out, because the next
> epoch hasn't been prepared yet? EKU doesn't exist yet, but something we'll
> have to reason through when we get there.

The EKU flow is defined to block ordinary KeyUpdate. So an ordinary
KeyUpdate partway through is not allowed. The crossed case will not
trigger a reciprocal KeyUpdate (the EKU transaction will update keys).

However, that restriction might not be necessary. I can come up with a
design that should work as long as there cannot be multiple messages
that prepare for the next epoch in a single epoch, nor multi-flight
deadlocks.

The basic idea is to have the 2nd and 3rd flights prepare for the epoch
change (update send keys in TLS), and the sender of the 2nd flight save
the KEM shared secret for processing the received 3rd flight. Then there
is a 4th message for the case where the peer lost the initiator election.




-Ilari



[TLS] Re: [EXTERNAL] Re: DTLS 1.3 bis

2024-11-17 Thread David Benjamin
On Sun, Nov 17, 2024 at 12:05 PM Ilari Liusvaara wrote:

> On Sun, Nov 17, 2024 at 07:54:17AM -0500, David Benjamin wrote:
> > On Sat, Nov 16, 2024 at 10:40 AM Ilari Liusvaara <ilariliusva...@welho.com>
> > wrote:
> >
> > > On Wed, Nov 13, 2024 at 01:39:43PM -0500, David Benjamin wrote:
> >
> > A thought: This is now a protocol change, but what if we defined a "oops"
> > extension that simply adds a dummy post-Finished handshake message that
> > protrudes into epoch 3? I.e., if negotiated, the client and server flights
> > actually look like this:
> >
> > CH -->
> > <-- SH {EE..Finished} [Oops]
> > {Finished} [Oops] -->
> > <-- [ACK]
> >
> > I think if you combine that with the "ACKing epoch 3 implicitly ACKs all of
> > epoch 2" rule, this problem might be resolved? All retransmits by the
> > client are now guaranteed to contain at least one byte of Oops, because a
> > fully-acked Oops implies an acked Finished. That means the server need only
> > retain epoch 3, because as long as it can ACK the Oops, the client will get
> > the message.
>
> I don't think retaining epoch 3 is improvement over retaining epoch 2.
>

I was thinking that, until the server decrypts epoch 4, the server will
already naturally be retaining epoch 3 for application data anyway. And
once the server decrypts epoch 4, it knows the client has received the ACK
and it is safe to stop responding to those retransmits. I.e. we're not
going out of our way to retain epoch 3, just following the natural
progression of epochs. The KeyUpdate rule then generalizes: You retain
epoch N-1 until you receive epoch N. Once you receive epoch N, you can
freely drop N-1.
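
A minimal sketch of that generalized rule in Python, assuming the
hypothetical Oops extension is negotiated (without it, dropping epoch 2 on
the first epoch 3 record would lose the ability to ACK retransmits of the
client's final flight). `ReadEpochs` is an illustrative name, keys are
opaque placeholders, and the epoch 1 special case is ignored:

```python
class ReadEpochs:
    """Tracks which read epochs still have keys installed."""

    def __init__(self):
        self._keys = {}

    def install(self, epoch, key):
        self._keys[epoch] = key

    def on_record_decrypted(self, epoch):
        """Call after a record in `epoch` was successfully decrypted."""
        if epoch not in self._keys:
            raise KeyError(f"no keys for epoch {epoch}")
        # Receiving epoch N means the peer has moved on, so every epoch
        # older than N can be dropped.
        for old in [e for e in self._keys if e < epoch]:
            del self._keys[old]

    def retained(self):
        return sorted(self._keys)


epochs = ReadEpochs()
epochs.install(2, "handshake keys")
epochs.install(3, "application keys")
epochs.on_record_decrypted(3)  # e.g. the record carrying Oops
print(epochs.retained())       # [3]
```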


> However, I think that the requirement that all prior flights must be
> complete before stepping epoch helps here: It allows the server to drop
> epoch 2 upon decrypting epoch 4 record. Even without the extra message.
>

Having the protocol observe the KeyUpdate rule definitely helps, but a
connection may last quite a while before a KeyUpdate. (If there is one at
all; as you note, KeyUpdates aren't particularly well-exercised[*].) The
server needs to retain epoch 2 until it guesses that the ACK probably got
through. Or maybe it just gives up and special-cases and retains epoch 2
indefinitely until a KeyUpdate, I dunno. Seems kind of silly.

[*] Early in the days of TLS 1.3, we tried to make Chrome trigger a
KeyUpdate shortly after the handshake. We immediately hit compatibility
issues because some servers could not handle it. In hindsight, doing that
at the very start would have been prudent, before TLS 1.3 was deployed at
all, but sadly I don't have a time machine.


> > > - Single epoch, multiple prepare for next epoch messages.
> > >
> > > What does it mean for a single epoch to contain multiple messages that
> > > prepare for the next epoch? Does that prepare one epoch or multiple
> > > epochs? Doing multiple might cause issues with epoch reconstruction.
> > >
> > > AFAICT, sending multiple KeyUpdates in one epoch is not forbidden (the
> > > spec requires ACK, not actual epoch bump in between).
> > >
> >
> > I believe it's forbidden by this text. But I suspect this was on accident
> > because it's not just to facilitate epoch reconstruction:
> >
> > > In order to facilitate epoch reconstruction (Section 4.2.2),
> > > implementations MUST NOT send records with the new keys or send a new
> > > KeyUpdate until the previous KeyUpdate has been acknowledged (this avoids
> > > having too many epochs in active use).
> > https://www.rfc-editor.org/rfc/rfc9147.html#section-8-1
>
> I found that text, but I think it still allows peer to send KeyUpdate,
> get ACK, and then send another KeyUpdate in the same epoch...
>
>
> > In my attempt to fix the other KeyUpdate brokenness, I said that nothing
> > may follow a KeyUpdate in that epoch, which I think captures this a bit
> > more directly. I think that works? Except...
> > https://www.rfc-editor.org/errata/eid8047
>
> Yeah, that should do it (modulo other KeyUpdate-like messages).
>
>
> > > And in future extensions, there might be more message types that prepare
> > > epoch bumps (e.g., some Extended Key Update messages). The interactions
> > > between those (and regular KeyUpdate) might not be simple.
> > >
> > > I think there should be requirement that in each epoch, there is at
> > > most one message that prepares for the next epoch, and that all
> > > application data epoch except the last have exactly one.
> > >
> > > And with restriction that retransmissions must occur on the same epoch
> > > (why is that there for post-handshake messages?), the message that
> > > prepares for the next epoch must always be the last in its flight.
> > >
> >
> > Extended Key Update is potentially extra fun because it's a multi-flight
> > transaction. What happens if you start an EKU flow but then, partway
> > through it, the peer sends a plain KeyUpdate? What if one side starts an
> > EKU flow and, at the same time, the other side sends KeyUpdate with

[TLS] Re: [EXTERNAL] Re: DTLS 1.3 bis

2024-11-17 Thread David Benjamin
On Sat, Nov 16, 2024 at 10:40 AM Ilari Liusvaara wrote:

> On Wed, Nov 13, 2024 at 01:39:43PM -0500, David Benjamin wrote:
> >
> > Not to say that every implementor would have noticed every issue (I'm sure
> > I overlooked some issues too), but I think DTLS's biggest challenge has
> > always been the relatively little attention it receives compared to TLS.
>
> - When can the server drop epoch 2 (handshake) receive keys?
>
> Suppose client 2nd flight makes it through, but the ACK is lost. This
> causes the client to re-transmit the flight. The re-transmission happens
> with epoch 2. So the server needs epoch 2 receive keys in order to ACK
> the re-transmit. And this ACK could get lost as well.
>
> So the server needs to keep epoch 2 receive keys until client considers
> its 2nd flight complete. However, the server does not seem to have means
> to determine when this has happened.
>
> If the server did not send CertificateRequest, then NewSessionTicket is
> unordered w.r.t. client 2nd flight. And even if the server did send CR,
> then NST is not considered implicit ACK for client 2nd flight.
>
> Is there some prohibition on client sending post-handshake messages
> before considering handshake complete? If no, one can't use PS messages
> as an indicator, and the client might not send PS messages anyway.
>

Aww, yuck! Well, that proves my parenthetical. I'd missed that one.

I mean, the spec does have an answer, but it's incredibly unsatisfying,
because it's based on time rather than packet loss.

> In addition, for at least twice the default MSL defined for [RFC0793],
> when in the FINISHED state, the server MUST respond to retransmission of
> the client's final flight with a retransmit of its ACK.
https://www.rfc-editor.org/rfc/rfc9147.html#section-5.8.1-9

In particular, this means my "best guess" in slide 8 here is not sufficient
and you actually are *required* to carry a past read epoch, in just one
case:
https://datatracker.ietf.org/meeting/121/materials/slides-121-tls-13-dtls-13-details-00

That means this text here is wrong, because it suggests this is optional:
> Implementations SHOULD discard records from earlier epochs but MAY choose
> to retain keying material from previous epochs for up to the default MSL
> specified for TCP [RFC0793] to allow for packet reordering.
https://www.rfc-editor.org/rfc/rfc9147.html#section-4.2.1-1

Another problem with it being based on time is that we might have moved
arbitrarily far in the connection by then. Maybe the RTT is suuuper fast
and actually we've done KeyUpdate 100x by then. By a strict reading of that
text, you're still obligated to retain epoch 2, even though you're on epoch
100. Now you need to retain epochs arbitrarily far apart!

Fortunately, this is actually impossible because if the server ACKs
KeyUpdate, the client should know the server has received its final flight.
But there is nothing in the spec that says "if you receive an ACK for a
message in epoch N, everything in epochs < N is ACKed". (Note that < here is
evaluated according to our partially-ordered set because epoch 1 is weird.
Though epoch 1 should contain no handshake messages, so it's kinda moot.)
Moreover, if we apply the fix to KeyUpdate, the client will not start
sending at the new epoch until the final flight is ACKed too and everything
is caught up. But, impossible as all this is, the spec text does not
account for it being impossible.
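
A minimal sketch of what such an implicit-ACK rule could look like on the
sender side, in Python. This is not spec text: `apply_ack` is a
hypothetical helper, records are identified by (epoch, sequence number)
pairs, and epochs are compared as plain integers, which ignores the
epoch 1 wrinkle noted above.

```python
def apply_ack(outstanding, acked):
    """Return the records still awaiting acknowledgment.

    outstanding: set of (epoch, sequence_number) for sent records that
                 carried handshake messages and are not yet ACKed.
    acked: iterable of (epoch, sequence_number) listed in a received ACK.
    """
    acked = set(acked)
    if not acked:
        return set(outstanding)
    highest_epoch = max(epoch for epoch, _ in acked)
    # Explicitly listed records are ACKed; under the implicit rule, so is
    # everything in any epoch below the highest ACKed epoch.
    return {
        (epoch, seq)
        for (epoch, seq) in outstanding
        if (epoch, seq) not in acked and epoch >= highest_epoch
    }
```

With this rule, an ACK for a KeyUpdate record in epoch 3 also clears any
final-flight records still outstanding in epoch 2.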

We also have a near miss for even more complexity. Suppose the handshake's
final flight actually spanned both epochs 0 and 2 instead of just 2. The
server receives both but, for whatever reason, the ACK for epoch 0 didn't
get through. (Re-ACKing past records is optional in the spec. Also things
might fall out of bounded ACK buffers eventually.) The client will not
consider the final flight to be ACKed until *all* records are through,
which means the server would need to retain epoch 0. Fortunately, the final
flight doesn't look like this and we don't need to worry about it. Though
it's further evidence that we should add the implicit ACK condition above.

There's a related, but less crucial, problem with the server's final
flight. At what point can the client discard read epoch 2? Consider:

CH -->
<-- SH {EE..Finished}
<-- [0.5-RTT App Data]
{Finished} -/-> (lost)

Now, the client will retransmit Finished and eventually repair this, but
the server has a retransmit timer too. Since the server can't tell which
side was lost, it will retransmit SH {EE..Finished}. The client is expected
to use that to drive retransmitting Finished:

> 3. The implementation reads a retransmitted flight from the peer when
> none of the messages that it sent in response to that flight have been
> acknowledged: the implementation transitions to the SENDING state, where it
> retransmits the flight, adjusts and re-arms the retransmit timer, and
> returns to the WAITING state. The rationale here is that the receipt of a
> duplicate message is the likely result of timer expiry on the peer and
> there