[TLS] Re: DTLS 1.3 ACKs near the version transition

2024-09-19 Thread David Benjamin
Ah fun, another issue in this document. So not only are write epoch
lifetimes unspecified and complex with 0-RTT, but read epoch lifetimes *are*
specified but *wrong*.

Section 4.2.1 says:

> Because DTLS records could be reordered, a record from epoch M may be
received after epoch N (where N > M) has begun. Implementations SHOULD
discard records from earlier epochs but MAY choose to retain keying
material from previous epochs for up to the default MSL specified for TCP
[RFC0793] to allow for packet reordering. (Note that the intention here is
that implementers use the current guidance from the IETF for MSL, as
specified in [RFC0793] or successors, not that they attempt to interrogate
the MSL that the system TCP stack is using.)

https://www.rfc-editor.org/rfc/rfc9147.html#section-4.2.1

First, it's a bit weird to say you SHOULD discard *records* but MAY
retain *keying material*. I assume that meant SHOULD discard records but MAY
process records anyway up to MSL. Anyway, this model implies that only one read
epoch is active at once, but this isn't true. You basically have to treat read
epoch 1 (early data) as unordered relative to epochs 0 and 2. Consider a
DTLS 1.3 server:

1. The server reads ClientHello with early_data extension at epoch 0 and
accepts early data.
2. The server sends ServerHello (epoch 0), EE..Finished (epoch 2), and
activates write epoch 3 for half-RTT application data.
3. The server reads early data (epoch 1) from the client. The RFC would
lead you to think the server can close read epoch 0 now, but...
4. ServerHello gets lost and, if we are to believe
https://www.rfc-editor.org/rfc/rfc9147.html#section-7.1-8, the client might
send an empty plaintext ACK to trigger a retransmit. This ACK will be at
epoch 0. This only works if the server keeps read epoch 0 open!
5. Client eventually gets the ServerHello but now it only gets half of the
epoch 2 data. It sends an ACK to trigger another retransmit. This ACK will
come at epoch 2.
6. Server receives that ACK at epoch 2 and retransmits. The RFC would lead
you to think the server can close read epoch 1 now, but...
7. Let's say that retransmit is lost again, or hasn't arrived yet. From the
client's perspective, it has a connection that has yet to reach the 1-RTT
point, so any data from the calling application will still be sent as early
data. That means the client will continue to send early data at epoch 1.
This only works if the server keeps read epoch 1 open!
8. The handshake progresses and the server finally gets 1-RTT data at epoch
3 from the client. *Now* the spirit of the rule in the text applies to
epoch 1 and the server can close the epoch (after optionally waiting a
spell for reordering).
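To make the failure concrete, here is a small Python sketch that replays the records the server *receives* in the steps above against a naive reading of Section 4.2.1: a record at epoch N immediately retires every lower read epoch (the optional MSL grace period is ignored). The record list and the name `naive_accepts` are illustrative assumptions, not taken from RFC 9147 or any implementation.

```python
# Records the server receives, in arrival order, in the scenario above.
# (epoch, description) pairs; illustrative only.
SCENARIO = [
    (0, "ClientHello with early_data"),      # step 1
    (1, "early data"),                       # step 3
    (0, "empty plaintext ACK"),              # step 4
    (2, "ACK of partial epoch 2 flight"),    # steps 5-6
    (1, "more early data"),                  # step 7
    (3, "1-RTT application data"),           # step 8
]


def naive_accepts(records):
    """Hypothetical naive rule: a record at epoch N closes every read epoch below N.

    Returns a list of (epoch, description, accepted) tuples.
    """
    highest = -1
    results = []
    for epoch, desc in records:
        accepted = epoch >= highest
        if accepted:
            highest = epoch  # every epoch below this one is now treated as closed
        results.append((epoch, desc, accepted))
    return results


for epoch, desc, ok in naive_accepts(SCENARIO):
    print(f"epoch {epoch}: {desc:32} {'accepted' if ok else 'DROPPED'}")
```

Under this rule the epoch 0 ACK from step 4 and the early data from step 7 are both reported as DROPPED, which is exactly the breakage described above.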

So the rule is actually that we close according to a partially ordered set:
- 0 (unencrypted) < 2 (handshake) < 3 (first app data) < 4 < 5 < ...
- 1 (early data) < 3 (first app data) < 4 < 5 < ...
- 1 is not ordered relative to 0 and 2.
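A minimal Python sketch of that partial order, purely as a record-layer view: `precedes(a, b)` says whether receiving a record at epoch `b` lets the server retire read epoch `a` (in practice only after a reordering delay). The function names are hypothetical and nothing here comes from RFC 9147 text.

```python
def precedes(a: int, b: int) -> bool:
    """True if read epoch `a` comes strictly before read epoch `b`.

    Order:  0 < 2 < 3 < 4 < ...   and   1 < 3 < 4 < ...
    Epoch 1 (early data) is unordered relative to epochs 0 and 2.
    """
    if a == b:
        return False
    if a == 1:
        return b >= 3   # early data precedes only the application-data epochs
    if b == 1:
        return False    # nothing precedes epoch 1
    return a < b        # the main chain 0 < 2 < 3 < ...


def retirable(open_epochs: set[int], received: int) -> set[int]:
    """Open read epochs that a record at `received` allows the server to retire."""
    return {e for e in open_epochs if precedes(e, received)}
```

For example, with `{0, 1, 2}` open, an epoch 1 record retires nothing, an epoch 2 record retires only `{0}`, and an epoch 3 record retires all three.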


On Wed, Sep 18, 2024 at 3:47 PM David Benjamin wrote:

> One more wrinkle if we wish to allow unencrypted ACKs, though it is
> fixable. Section 7 says:
>
> > During the handshake, ACK records MUST be sent with an epoch which is
> equal to or higher than the record which is being acknowledged. [...]
> Implementations SHOULD simply use the highest current sending epoch, which
> will generally be the highest available. After the handshake,
> implementations MUST use the highest available sending epoch.
>
> Taken at face value, that text implies that a client sending 0-RTT data
> should send its ACKs at the highest current sending epoch, epoch 1 (0-RTT).
> But if the server has rejected 0-RTT data, it will not (and cannot)
> instantiate epoch 1 at all, so it won't get the ACKs! That guidance needs a
> special case: if you would have ACKed at epoch 1, you should ACK at epoch 0
> instead.
>
> Alternatively, one might interpret that situation as 0 being the sending
> epoch and 1 being some magical epoch on the side. This isn't supported by
> the document, but honestly no interpretation is supported by the document
> because the document never tells you what a "current sending epoch" even
> is. While 4.2.1 gives some rough guidance on when to close out receiving
> epochs, I could not find any text on send epoch management at all.
> Reasoning through the protocol, you might arrive at this *almost* correct
> rule:
>
> A write epoch may be discarded IF:
> 1. It is not the highest available epoch. AND
> 2. There are no unacked, outgoing messages at that epoch
>
> That rule, however, does not work in 0-RTT. If the highest epoch is 1, you
> cannot discard 0. The server might reject 0-RTT and then send
> HelloRetryRequest, at which point you will need to discard epoch 1 and
> reactivate epoch 0, maintaining continuity of sequence numbers. The
> 0-RTT/1-RTT transition is also interesting on the write side, though I'll
> start a separate thread for that.
>
> All this is subtle enough that it should not be left as an exercise to the
> reader [...]
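A minimal Python sketch of the two client-side rules in the quoted message: choosing the epoch for a handshake ACK with the proposed epoch 1 special case, and the "almost correct" write-epoch discard rule with the 0-RTT exception added. The names `ack_epoch`, `may_discard_write_epoch`, and `EARLY_DATA_EPOCH` are hypothetical; the rules are the message's proposals, not text from RFC 9147.

```python
EARLY_DATA_EPOCH = 1


def ack_epoch(current_send_epoch: int) -> int:
    """Epoch at which to send a handshake ACK.

    Proposed special case: never ACK at epoch 1 (0-RTT), because a server that
    rejected early data never instantiates epoch 1; use epoch 0 instead.
    """
    if current_send_epoch == EARLY_DATA_EPOCH:
        return 0
    return current_send_epoch


def may_discard_write_epoch(epoch: int, highest: int, unacked: dict[int, int]) -> bool:
    """Write-epoch discard rule, plus the 0-RTT exception.

    `unacked` maps epoch -> number of unacknowledged outgoing messages.
    """
    if epoch >= highest:                 # rule 1: must not be the highest epoch
        return False
    if unacked.get(epoch, 0) > 0:        # rule 2: nothing left to retransmit
        return False
    if highest == EARLY_DATA_EPOCH and epoch == 0:
        # 0-RTT exception: the server may reject early data and send
        # HelloRetryRequest, so epoch 0 (and its sequence numbers) must stay.
        return False
    return True
```

With this sketch, `may_discard_write_epoch(0, 1, {})` is `False` while 0-RTT is in flight, but `may_discard_write_epoch(0, 2, {})` is `True` once the handshake epoch is active and nothing at epoch 0 is unacknowledged.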

[TLS] Re: DTLS 1.3 ACKs near the version transition

2024-09-19 Thread David Benjamin
On Thu, Sep 19, 2024 at 1:31 PM David Benjamin wrote:

> [...]
> 8. The handshake progresses and the server finally gets 1-RTT data at
> epoch 3 from the client. *Now* the spirit of the rule in the text applies
> to epoch 1 and the server can close the epoch (after optionally waiting a
> spell for reordering)
>

Ah right, Nick Harper points out that servers really should close read
epoch 1 [up to a delay to accommodate reordering] as soon as they receive
the Finished message (epoch 2) and complete the handshake, not wait for an
epoch 3 record. (But it must specifically be on handshake completion, not
*any* epoch 2 record. Record-layer-only logic cannot assume 1 < 2 because
epoch 2 might contain pre-Finished ACKs.)
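One way to express this refinement for epoch 1 is sketched below in Python. It is an illustration under stated assumptions: the class name, the `reorder_delay` parameter (standing in for the MSL-style grace period from Section 4.2.1), and the explicit `now` timestamps are all hypothetical. The point it captures is that only the handshake-complete event schedules epoch 1 for retirement; merely seeing an epoch 2 record does not.

```python
class EarlyDataReadEpoch:
    """Tracks when a server may stop accepting epoch 1 (early data) records."""

    def __init__(self, reorder_delay: float):
        self.reorder_delay = reorder_delay  # hypothetical grace period, in seconds
        self.close_at = None                # None means epoch 1 is still open

    def on_handshake_complete(self, now: float) -> None:
        # Call only after the client's Finished has been processed, never on an
        # arbitrary epoch 2 record: epoch 2 may carry ACKs sent before Finished.
        if self.close_at is None:
            self.close_at = now + self.reorder_delay

    def accepts(self, now: float) -> bool:
        """Whether an epoch 1 record arriving at time `now` is still processed."""
        return self.close_at is None or now < self.close_at
```

For instance, with `reorder_delay=2.0` and the handshake completing at `now=10.0`, an early data record at 11.0 is still processed, while one at 12.0 is not.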

All this is missing from the specification. :-) I think we need to rewrite
the spec text on epochs to more explicitly discuss their lifetimes.
