Hi David,

This is great feedback. Give me a few days to respond to this issue with my 
suggestion for moving forward.

Ciao
Hannes

From: TLS <tls-boun...@ietf.org> On Behalf Of David Benjamin
Sent: Saturday, April 13, 2024 7:59 PM
To: <tls@ietf.org>
Cc: Nick Harper <nhar...@chromium.org>
Subject: Re: [TLS] Issues with buffered, ACKed KeyUpdates in DTLS 1.3

Another issue with DTLS 1.3's state machine duplication scheme:

Section 8 says implementations must not send a new KeyUpdate until the previous 
KeyUpdate is ACKed, but it says nothing about other post-handshake messages. 
Suppose KeyUpdate(5) is in flight and the implementation decides to send a 
NewSessionTicket. 
(E.g. the application called some "send NewSessionTicket" API.) The new epoch 
doesn't exist yet, so naively one would start sending NewSessionTicket(6) in 
the current epoch. Now the peer ACKs KeyUpdate(5), so we transition to the new 
epoch. But retransmissions must retain their original epoch:

> Implementations MUST send retransmissions of lost messages using the same 
> epoch and keying material as the original transmission.
https://www.rfc-editor.org/rfc/rfc9147.html#section-4.2.1-3

This means we must keep sending the NST at the old epoch. But the peer may have 
no idea there's a message at that epoch due to packet loss! Section 8 does ask 
the peer to keep the old epoch around for a spell, but eventually the peer will 
discard the old epoch. If NST(6) didn't get through before then, the entire 
post-handshake stream is now wedged!

I think this means we need to amend Section 8 to forbid sending *any* 
post-handshake message after KeyUpdate. That is, rather than saying you cannot 
send a new KeyUpdate, a KeyUpdate terminates the post-handshake stream at that 
epoch and all new post-handshake messages, be they KeyUpdate or anything else, 
must be enqueued for the new epoch. This is a little unfortunate because a TLS 
library which transparently KeyUpdates will then inadvertently introduce 
hiccups where post-handshake messages triggered by the application, like 
post-handshake auth, are blocked.
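
To make that concrete, here's a rough Go sketch of the sender side I have in 
mind (all names hypothetical, nothing from a real stack): post-handshake 
messages are queued per epoch, and once a KeyUpdate is in flight, new messages 
land in the next epoch's queue.

    // Sketch only: hypothetical sender-side queueing for "KeyUpdate
    // terminates the post-handshake stream in an epoch".
    package sketch

    type Message struct {
        Type string // "KeyUpdate", "NewSessionTicket", ...
    }

    type Sender struct {
        currentEpoch  uint64
        keyUpdatePend bool                 // KeyUpdate sent in currentEpoch, ACK pending
        queueByEpoch  map[uint64][]Message // outgoing post-handshake messages per epoch
    }

    func NewSender() *Sender {
        return &Sender{queueByEpoch: map[uint64][]Message{}}
    }

    // QueuePostHandshake enqueues a new post-handshake message. If a
    // KeyUpdate is pending, the message is held for the next epoch, which
    // is not active yet, so nothing can be sent until the ACK arrives.
    func (s *Sender) QueuePostHandshake(m Message) {
        epoch := s.currentEpoch
        if s.keyUpdatePend {
            epoch++ // the "hiccup": held until the new epoch activates
        }
        s.queueByEpoch[epoch] = append(s.queueByEpoch[epoch], m)
    }

    // OnKeyUpdateACKed activates the next epoch; its queue may now drain
    // (the actual flushing is omitted here).
    func (s *Sender) OnKeyUpdateACKed() {
        s.currentEpoch++
        s.keyUpdatePend = false
    }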

That then suggests some more options for fixing the original problem.

7. Fix the sender's KeyUpdate criteria

We tell the sender to wait for all previous messages to be ACKed too. Fix the 
first paragraph of section 8 to say:

> As with other handshake messages with no built-in response, KeyUpdates MUST 
> be acknowledged. Acknowledgements are used to both control retransmission and 
> transition to the next epoch. Implementations MUST NOT send records with the 
> new keys until the KeyUpdate and all preceding messages have been 
> acknowledged. This facilitates epoch reconstruction (Section 4.2.2) and 
> avoids too many epochs in active use, by ensuring the peer has processed the 
> KeyUpdate and started receiving at the new epoch.
>
> A KeyUpdate message terminates the post-handshake stream in an epoch. After 
> sending KeyUpdate in an epoch, implementations MUST NOT send any new 
> post-handshake messages in that epoch. Note that, if the implementation has 
> sent KeyUpdate but is waiting for an ACK, the next epoch is not yet active. 
> In this case, subsequent post-handshake messages may not be sent until 
> receiving the ACK.

And then on the receiver side, we leave things as-is. If the sender implemented 
the old semantics AND had multiple post-handshake transactions in parallel, it 
might update keys too early, and then we get into the situation described in 
(1). We then declare that, if this happens and the sender gets confused as a 
result, that's the sender's fault. Hopefully this is rare enough (did anyone 
even implement 5.8.4, or does everyone just serialize their post-handshake 
transactions?) not to be a serious protocol break? That risk aside, this option 
seems the most in spirit with the current design to me.
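
In code, the amended criteria would amount to something like this (a sketch 
with made-up bookkeeping, assuming the sender tracks ACKs per message_seq):

    // Sketch only: option 7's sender gate. The new sending epoch activates
    // only once the KeyUpdate AND everything sent before it are ACKed.
    package sketch

    type Sender struct {
        keyUpdateSeq uint64          // message_seq of the in-flight KeyUpdate
        acked        map[uint64]bool // message_seq values the peer has ACKed
    }

    // maySwitchEpoch reports whether records may be sent under the new
    // keys, given the first message_seq of this epoch's post-handshake
    // stream.
    func (s *Sender) maySwitchEpoch(firstSeq uint64) bool {
        for seq := firstSeq; seq <= s.keyUpdateSeq; seq++ {
            if !s.acked[seq] {
                return false // something at or before the KeyUpdate is unACKed
            }
        }
        return true
    }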

8. Decouple post-handshake retransmissions from epochs

If we instead say that the same epoch rule only applies for the handshake, and 
not post-handshake messages, I think option 5 (process KeyUpdate out of order) 
might become viable? I'm not sure. Either way, this seems like a significant 
protocol break, so I don't think this is an option until some hypothetical DTLS 
1.4.


On Fri, Apr 12, 2024 at 6:59 PM David Benjamin <david...@chromium.org> wrote:
Hi all,

This is going to be a bit long. In short, DTLS 1.3 KeyUpdates seem to conflate 
the peer receiving the KeyUpdate with the peer processing the KeyUpdate, in 
ways that appear to break some assumptions made by the protocol design.

When to switch keys in KeyUpdate

So, first, DTLS 1.3, unlike TLS 1.3, applies the KeyUpdate on the ACK, not when 
the KeyUpdate is sent. This makes sense because KeyUpdate records are not 
intrinsically ordered with app data records sent after them:

> As with other handshake messages with no built-in response, KeyUpdates MUST 
> be acknowledged. In order to facilitate epoch reconstruction (Section 4.2.2), 
> implementations MUST NOT send records with the new keys or send a new 
> KeyUpdate until the previous KeyUpdate has been acknowledged (this avoids 
> having too many epochs in active use).
https://www.rfc-editor.org/rfc/rfc9147.html#section-8-1

Now, the parenthetical says this is to avoid having too many epochs in active 
use, but it appears that there are stronger assumptions on this:

> After the handshake is complete, if the epoch bits do not match those from 
> the current epoch, implementations SHOULD use the most recent *past* epoch 
> which has matching bits, and then reconstruct the sequence number for that 
> epoch as described above.
https://www.rfc-editor.org/rfc/rfc9147.html#section-4.2.2-3
(emphasis mine)

> After the handshake, implementations MUST use the highest available sending 
> epoch [to send ACKs]
https://www.rfc-editor.org/rfc/rfc9147.html#section-7-7

These two snippets imply the protocol wants the peer to definitely have 
installed the new keys before you start using them. This makes sense because 
sending stuff the peer can't decrypt is pretty silly. As an aside, DTLS 1.3 
retains this text from DTLS 1.2:

> Conversely, it is possible for records that are protected with the new epoch 
> to be received prior to the completion of a handshake. For instance, the 
> server may send its Finished message and then start transmitting data. 
> Implementations MAY either buffer or discard such records, though when DTLS 
> is used over reliable transports (e.g., SCTP [RFC4960]), they SHOULD be 
> buffered and processed once the handshake completes.
https://www.rfc-editor.org/rfc/rfc9147.html#section-4.2.1-2

The text from DTLS 1.2 talks about *a* handshake, which presumably refers to 
rekeying via renegotiation. But in DTLS 1.3, the epoch reconstruction rule and 
the KeyUpdate rule mean this is only possible during the handshake, when you 
see epoch 4 and expect epoch 0-3. The steady state rekeying mechanism never 
hits this case. (This is a reasonable change because there's no sense in 
unnecessarily introducing blips where the connection is less tolerant of 
reordering.)
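
For anyone who hasn't internalized Section 4.2.2, here's a small runnable 
sketch of the reconstruction rule (my own illustration, not from any 
implementation): the record carries only the low two bits of the epoch, and 
the receiver walks backwards from the current epoch to the most recent match. 
Note what happens when the peer is actually one epoch ahead.

    // Sketch of epoch reconstruction per RFC 9147, Section 4.2.2.
    package main

    import "fmt"

    // reconstructEpoch picks the most recent epoch <= current whose low
    // two bits match the bits in the record header. Assumes a matching
    // past epoch exists (post-handshake, current >= 3).
    func reconstructEpoch(bits, current uint64) uint64 {
        epoch := current
        for epoch&0x3 != bits {
            epoch--
        }
        return epoch
    }

    func main() {
        // Receiver is on epoch N=5; peer already moved to N+1=6. The
        // bits 6&0x3 == 2 get misread as epoch 2, i.e. N-3, not N+1.
        fmt.Println(reconstructEpoch(6&0x3, 5)) // prints 2
    }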

Buffered handshake messages

Okay, so KeyUpdates want to wait for the recipient to install keys, except we 
don't seem to actually achieve this! Section 5.2 says:

> DTLS implementations maintain (at least notionally) a next_receive_seq 
> counter. This counter is initially set to zero. When a handshake message is 
> received, if its message_seq value matches next_receive_seq, next_receive_seq 
> is incremented and the message is processed. If the sequence number is less 
> than next_receive_seq, the message MUST be discarded. If the sequence number 
> is greater than next_receive_seq, the implementation SHOULD queue the message 
> but MAY discard it. (This is a simple space/bandwidth trade-off).
https://www.rfc-editor.org/rfc/rfc9147.html#section-5.2-7

I assume this is intended to apply to post-handshake messages too. (See below 
for a discussion of the alternative.) But that means that, when you receive a 
KeyUpdate, you might not immediately process it. Suppose next_receive_seq is 5, 
and the peer sends NewSessionTicket(5), NewSessionTicket(6), and KeyUpdate(7). 
5 is lost, but 6 and 7 come in, perhaps even in the same record, which means 
you're forced to ACK both or neither. Suppose the implementation is willing to 
buffer 3 messages ahead; then it ACKs the 6+7 record, per the rules in 
section 7, which permit ACKing fragments that were buffered but not yet 
processed.
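
Here's a sketch of the receive path that produces this (hypothetical 
structure; the essential point is just that the ACK decision is per record 
while processing is per message):

    // Sketch only: Section 5.2 buffering plus the Section 7 ACK rule.
    package sketch

    type Handshake struct {
        Type string
        Seq  uint64
    }

    type Receiver struct {
        nextReceiveSeq uint64
        buffered       map[uint64]Handshake
    }

    func NewReceiver(start uint64) *Receiver {
        return &Receiver{nextReceiveSeq: start, buffered: map[uint64]Handshake{}}
    }

    // onRecord consumes one record's messages and reports whether the
    // record may be ACKed. With nextReceiveSeq == 5 and a record carrying
    // messages 6 and 7, both are buffered, so the record is ACKed even
    // though 7 is a KeyUpdate we cannot act on until 5 arrives. (Draining
    // the buffer once 5 shows up is omitted.)
    func (r *Receiver) onRecord(msgs []Handshake) bool {
        for _, m := range msgs {
            switch {
            case m.Seq == r.nextReceiveSeq:
                r.process(m)
                r.nextReceiveSeq++
            case m.Seq > r.nextReceiveSeq:
                r.buffered[m.Seq] = m // buffered counts as ACKable per Section 7
            default:
                // m.Seq < nextReceiveSeq: stale retransmission; discard
            }
        }
        return true
    }

    func (r *Receiver) process(m Handshake) { /* install keys, store ticket, ... */ }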

That means the peer will switch keys, and all subsequent records from them 
will come from epoch N+1. But the receiver hasn't processed the KeyUpdate and 
isn't ready for N+1 yet, so we contradict everything above. We also contradict 
this parenthetical in section 8:

> Due to loss and/or reordering, DTLS 1.3 implementations may receive a record 
> with an older epoch than the current one (the requirements above preclude 
> receiving a newer record).
https://www.rfc-editor.org/rfc/rfc9147.html#section-8-2

I assume then that this was not actually what was intended.

Options (and non-options)

Assuming I'm reading this right, we seem to have made a mess of things. The 
sender could avoid this by only allowing one active post-handshake transaction 
at a time and serializing them, at the cost of taking a round-trip for each. 
But the receiver needs to account for all possible senders, so that doesn't 
help. Some options that come to mind:

1. Accept that the sender updates its keys too early

The protocol doesn't break per se if you just allow the peer to switch keys 
early in this buffered-KeyUpdate case. We merely contradict all of the 
explanatory text and introduce a bunch of cases that the specification 
suggests are impossible. :-) Also, the connection quality is poor in the 
meantime.

The sender will use epoch N+1 at a point when the peer is on N. But epoch 
reconstruction will misread it as N-3 instead of N+1, and either way you won't 
have the keys to decrypt it yet! The connection is interrupted (with all 
packets discarded, because epoch reconstruction fails!) until the peer 
retransmits 5 and you catch up. Until then, not only will you not receive 
application data, but you also won't receive ACKs. This also adds a subtle 
corner case on the sender side: the sender cannot discard the old sending keys 
because it still has unACKed messages from the previous epoch to retransmit, 
but this is not called out in section 8. Section 8 only discusses the receiver 
needing to retain the old epoch.

This seems not great. Also it contradicts much of the text in the spec, 
including section 8 explicitly saying this case cannot happen.

2. Never ACK buffered KeyUpdates

We can say that KeyUpdates are special and, unless you're willing to process 
them immediately, you must not ACK the records containing them. This means you 
might under-ACK and the peer might over-retransmit, but that seems not fatal. 
This also seems a little hairy to implement if you want to avoid under-ACKing 
unnecessarily. You might have NewSessionTicket(6) buffered and then receive a 
record with NewSessionTicket(5) and KeyUpdate(7). That record may appear 
unACKable, but it's fine, because you'll immediately process 5, then 6, then 
7... unless your NewSessionTicket processing is asynchronous, in which case it 
might not be?

Despite all that mess, this seems the most viable option?
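
A sketch of what the receiver's check might look like (made-up helper; the 
asynchronous-NewSessionTicket wrinkle is exactly what an "is everything before 
it on hand" test fails to capture):

    // Sketch only: option 2. ACK a record containing a KeyUpdate only if
    // the KeyUpdate could be processed immediately, i.e. every message_seq
    // below it is already processed, buffered, or in this same record.
    package sketch

    type Handshake struct {
        Type string
        Seq  uint64
    }

    func recordACKable(msgs []Handshake, nextReceiveSeq uint64, buffered map[uint64]bool) bool {
        have := map[uint64]bool{} // what we'd hold after buffering this record
        for seq := range buffered {
            have[seq] = true
        }
        for _, m := range msgs {
            have[m.Seq] = true
        }
        for _, m := range msgs {
            if m.Type != "KeyUpdate" {
                continue
            }
            for seq := nextReceiveSeq; seq < m.Seq; seq++ {
                if !have[seq] {
                    return false // KeyUpdate not immediately processable
                }
            }
        }
        return true
    }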

3. Declare this situation a sender error

We could say this is not allowed and senders MUST NOT send KeyUpdate if there 
are any outstanding post-handshake messages. And then the receiver should fail 
with unexpected_message if it ever receives KeyUpdate at a future message_seq. 
But as the RFC is already published, I don't know if this is compatible with 
existing implementations.

4. Explicit KeyUpdateAck message

We could have made a KeyUpdateAck message to signal that you've processed a 
KeyUpdate, not just sent it. But that's a protocol change and the RFC is 
stamped, so it's too late now.

5. Process KeyUpdate out of order

We could say that the receiver doesn't buffer KeyUpdate. It just goes ahead and 
processes it immediately to install epoch N+1. This seems like it would 
address the issue but opens more cans of worms. Now the receiver needs to keep 
the old epoch around not just for packet reordering, but also to pick up 
retransmissions of the missing handshake messages. Also, by activating the new 
epoch, the receiver now allows the sender to KeyUpdate again, and again, and 
again. But, several epochs later, the holes in the message stream may remain 
unfilled, so we still need the old keys. Without further protocol rules, a 
sender could force the receiver to keep keys arbitrarily far back. All this 
is, at best, a difficult case that is unlikely to be well-tested, and at worst 
gets the implementation into some broken state where it misbehaves badly.
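
To see why, sketch the state option 5 would force the receiver to carry 
(hypothetical, and deliberately unbounded):

    // Sketch only: per-epoch receive state under option 5. Keys for an
    // epoch can be dropped only when no holes from its lifetime remain,
    // since retransmissions arrive under the original epoch.
    package sketch

    type epochState struct {
        keys  []byte   // receive keys (placeholder)
        holes []uint64 // message_seq values still missing from this epoch
    }

    type Receiver struct {
        epochs map[uint64]*epochState // unbounded without further rules
    }

    // gcEpochs drops only hole-free epochs. A sender that KeyUpdates
    // repeatedly while old holes stay unfilled forces us to retain
    // arbitrarily many old epochs' keys.
    func (r *Receiver) gcEpochs() {
        for e, st := range r.epochs {
            if len(st.holes) == 0 {
                delete(r.epochs, e)
            }
        }
    }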

6. Post-handshake transactions aren't ordered at all

It could be that my assumption above was wrong and the next_receive_seq 
discussion in 5.2 only applies to the handshake. After all, section 5.8.4 
discusses how every post-handshake transaction duplicates the "state machine". 
Except it only says to duplicate the 5.8.1 state machine, and it's ambiguous 
whether that includes the message_seq logic.

However, going this direction seems to very quickly make a mess. If each 
post-handshake transaction handles message_seq independently, you cannot 
distinguish a retransmission from a new transaction. That seems quite bad, so 
presumably the intent was to use message_seq to distinguish those. (I.e. the 
intent can't have been to duplicate the message_seq state.) Indeed, we have:

> However, in DTLS 1.3 the message_seq is not reset, to allow distinguishing a 
> retransmission from a previously sent post-handshake message from a newly 
> sent post-handshake message.
https://www.rfc-editor.org/rfc/rfc9147.html#section-5.2-6

But if we distinguish with message_seq AND process transactions out of order, 
now receivers need to keep track of fairly complex state in case they process 
messages 5, 7, 9, 11, 13, 15, 17, ... but then only get the even ones later. 
And we'd need to define some kind of sliding window for what happens if you 
receive message_seq 9000 all of a sudden. And we import all the cross-epoch 
problems in option 5 above. None of that is in the text, so I assume this was 
not the intended reading, and I don't think we want to go that direction. :-)

Digression: ACK fate-sharing and flow control

All this alludes to another quirk that isn't a problem, but is a little 
non-obvious and warrants some discussion in the spec. Multiple handshake 
fragments may be packed into the same record, but ACKs apply to the whole 
record. If you receive a fragment for a message sequence too far into the 
future, you are permitted to discard the fragment. But if you discard any 
fragment, you cannot ACK the record, even if there were fragments which you did 
process. During the handshake, an implementation could avoid needing to make 
this decision by knowing the maximum size of a handshake flight. After the 
handshake, there is no inherent limit on how many NewSessionTickets the peer 
may choose to send in a row, and no flow control.
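
Concretely, the per-record decision might look like this (the cap is an 
arbitrary local policy of my invention; the RFC sets none):

    // Sketch only: fate-sharing with an ad-hoc buffering limit. If any
    // fragment in a record is discarded for being too far ahead, the whole
    // record goes unACKed, even if other fragments in it were processed.
    package sketch

    const maxBufferedAhead = 32 // arbitrary local policy, not from RFC 9147

    type Fragment struct{ Seq uint64 }

    func recordACKable(frags []Fragment, nextReceiveSeq uint64) bool {
        for _, f := range frags {
            if f.Seq >= nextReceiveSeq+maxBufferedAhead {
                return false // this fragment would be discarded
            }
        }
        return true
    }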

QUIC ran into a similar issue here and said an implementation can impose an 
ad-hoc limit, after which it can either wedge the post-handshake stream or 
return an error.
https://github.com/quicwg/base-drafts/issues/1834
https://github.com/quicwg/base-drafts/pull/2524

I suspect the most practical outcome for DTLS (and one arguably already 
supported by the existing text, though not very obviously) is to instead say 
the receiver just refuses to ACK stuff and, okay, maybe in some weird edge 
cases the receiver under-ACKs and then the sender over-retransmits, until 
things settle down. ACKs are more tightly integrated into QUIC, so refusing to 
ACK a packet due to one bad frame is less of an option there. Still, I think 
this would have been worth calling out in the text.


So... did I read all this right? Did we indeed make a mess of this, or did I 
miss something?

David


