[TLS] DTLS 1.3 ACKs near the version transition

David Benjamin Thu, 12 Sep 2024 13:32:44 -0700

Hi all,

I noticed another issue with the DTLS 1.3 ACK design. :-)


So, DTLS 1.3 uses ACKs. DTLS 1.2 does not use ACKs. But you only learn what
version you're speaking partway through the lifetime of the connection, so
there are some interesting corner cases to answer. As an illustrative
example, I believe the diagram in section 6 is [probably] incorrect:
https://www.rfc-editor.org/rfc/rfc9147.html#section-6

If the client loses the first packet, it never sees the ServerHello and
thus never learns it's speaking DTLS 1.3. While it does see the second packet,
that packet only contains ciphertext that it cannot decrypt. Unless it
decides to say "this looks like a 1.3 record header, therefore I will turn
on the 1.3 state machine", which isn't supported by the RFC (maybe TLS 1.4
will use the same record header but redo ACKs once again), it shouldn't
activate the 1.3 state machine yet. I expect what will *actually* happen is
that the client will wait for the retransmission timeout a la DTLS 1.2.
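
To make that failure mode concrete, here is a minimal Python sketch of the client's view after it has sent its ClientHello. The function name `client_after_first_flight` and the record labels are hypothetical, made up for illustration. The sketch assumes the client learns the version only from a plaintext ServerHello and does not guess it from the record header.

```python
def client_after_first_flight(received_records):
    """Describe what the client does with the records that actually arrived.

    Each record is a (kind, payload) pair. The kinds "plaintext" and
    "ciphertext" are illustrative labels, not wire values.
    """
    version_known = False
    for kind, payload in received_records:
        # Assumption: only a readable ServerHello tells the client the version.
        if kind == "plaintext" and payload == "ServerHello":
            version_known = True
        # Ciphertext cannot be decrypted without the keys derived after
        # ServerHello, so it tells the client nothing about the version.
    if version_known:
        return "run DTLS 1.3 state machine"
    return "wait for retransmission timeout"


# The section 6 scenario: ServerHello is lost, only the encrypted record arrives.
print(client_after_first_flight([("ciphertext", "rest of server flight")]))
```

Under these assumptions the client never reaches the point where sending an ACK would make sense, which is why it ends up on the DTLS 1.2-style timer.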

More generally, I believe these are the situations to worry about:

1. If a DTLS 1.2 (i.e. does not implement RFC 9147 at all) implementation
receives an ACK record for whatever reason, what happens? This decision we
don't get to change. Rather, it is a design constraint. Both OpenSSL and
BoringSSL treat unexpected record types as a fatal error. I haven't checked
other implementations. So I think we must take as a constraint that you
cannot send an ACK unless you know the peer is 1.3-capable.

2. Do plaintext ACKs exist? Or is the plaintext epoch permanently at the
old state machine? Honestly, I wish the answer here was "no". That would
have avoided so many problems, because then epochs never change state
machines. Unfortunately, the RFC does not support this interpretation.
Section 4.1 talks about how to demux a plaintext ACK, and section 6, though
wrong, clearly depicts a plaintext ACK. So instead we get to worry about
the transition within an epoch. Keep in mind that transitions happen at
different times on both sides, and that a portion of the plaintext epoch
lasts after version negotiation in HelloRetryRequest handshakes.

3. If a 1.3-capable server receives half of a ClientHello, does it send an
ACK? I believe (1) means the answer must be "no". If you haven't read the
ClientHello, you haven't selected the version, so you don't know if the
client is 1.3-capable or not. If the client is not 1.3-capable, sending an
ACK may be incompatible.

4. Is it possible for a 1.3-capable client to receive an ACK *before* it
receives a ServerHello? If so, how does the client respond? I believe the
answer to this question, if plaintext ACKs exist, is unavoidably "yes".
Suppose the server receives a 1.3 ClientHello and then negotiates DTLS 1.3.
That is a complete flight, so Section 7.1 discourages ACKing explicitly
(you can ACK implicitly), but it *does not forbid* an explicit ACK. An
explicit ACK may be sent if the server cannot generate its responding
flight immediately. That means a server could well send ACK followed by
ServerHello. Now suppose ServerHello is lost but the ACK gets through. Now
the client must decide what it's doing. Rejecting the ACK would result in
connection failure, so we must either drop the ACK on the floor, or process
it. While processing it would be more efficient (you don't need to
retransmit the whole ClientHello), it means the plaintext epoch must
support this hybrid state where 1.3 ACKs are processed but never sent! Or
perhaps receiving that ACK transitions you to the 1.3 state machine even
though you don't know the version yet. That all sounds like a mess, so I
would advocate you simply drop it on the floor.

5. If a 1.3-capable client receives half of the server's first message (HRR
or ServerHello), does it send an ACK? Again, because of (1), I believe the
answer must be "no". If you don't know the server's selected version, the
server may not be 1.3-capable and may not be compatible with the ACK.

6. What does a 1.3-capable server do if it receives an ACK prior to picking
the TLS version? Unlike (4), I believe this is impossible. If the client
has something to ACK, the server must have sent something, which the server
will only do once it's received the full ClientHello and thus picked the
version. However, given (4), I suspect an implementation will naturally
just drop that ACK. In this state error vs drop is kinda academic.
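
The answers to situations (1) through (6) can be summarized in a short Python sketch. All names here (`Version`, `AckAction`, `may_send_ack`, `on_ack_received`) are hypothetical and not taken from any implementation. The handling of an ACK after DTLS 1.2 has been negotiated is an assumption modeled on the fatal-error behavior described in (1); the situations above do not cover that case for a 1.3-capable endpoint.

```python
from enum import Enum, auto


class Version(Enum):
    UNKNOWN = auto()  # version not yet learned on this side
    DTLS12 = auto()
    DTLS13 = auto()


class AckAction(Enum):
    DROP = auto()     # drop the ACK on the floor
    PROCESS = auto()  # feed it to the 1.3 retransmission logic
    FATAL = auto()    # treat it as an unexpected record type


def may_send_ack(negotiated: Version) -> bool:
    """Situations (1), (3), (5): only ACK once the peer is known to speak 1.3."""
    return negotiated is Version.DTLS13


def on_ack_received(negotiated: Version) -> AckAction:
    """Situations (4) and (6): drop ACKs that arrive before the version is known."""
    if negotiated is Version.DTLS13:
        return AckAction.PROCESS
    if negotiated is Version.UNKNOWN:
        return AckAction.DROP
    # Assumption: once DTLS 1.2 is negotiated, an ACK is an unexpected record.
    return AckAction.FATAL


# A server that has read only half of a ClientHello has not picked a version.
print(may_send_ack(Version.UNKNOWN), on_ack_received(Version.UNKNOWN))
```

The point of keeping the two functions separate is that the send side and the receive side are constrained by different things: sending is limited by what a 1.2-only peer tolerates, while receiving is only a question of what the local state machine can make sense of.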

From what I can tell, RFC 9147 is silent on all of this. I think it should
say something. I believe these are the plausible options:

OPTION A -- There are no ACKs in epoch 0.

We avoid this ridiculous transition point and say that ACKs only exist
starting epoch 1. Epoch 0 uses the old DTLS 1.2 state machine. This is very
attractive from a simplicity perspective, but since RFC 9147 was already
published with this ambiguity, I think we need to, at minimum, say that
DTLS 1.3 implementations drop epoch 0 ACKs on the floor. It also means that
packet loss in HelloRetryRequest flows may be less efficient. That said, if
your HelloRetryRequest is stateless (not applicable to all DTLS uses),
you're probably not doing anything with ACKs anyway. Saying those ACKs
don't exist avoids having to think about that case, at the cost of a worse
transport for stateful HelloRetryRequest.

OPTION B -- Epoch 0 enables ACKing once the version is learned.

Once you know the version, you start sending and processing ACKs. Before
you know the version, you drop ACKs on the floor and never send them. This
requires convincing ourselves that the transition point works out, notably
when one side is still ACK-less and the other side is still ACK-ful, but I
believe it works out.

OPTION C -- Epoch 0 always receives and acts on ACKs, but it doesn't send
ACKs until the version is learned.

This is the same as above, but instead of dropping ACKs, you go ahead and
let that drive your state machine. But you don't send them. This makes
reasoning about the protocol even more complicated because there are even
more states you can be in w.r.t. your known version vs the state of your
transport. It does improve behavior around packet loss, but I think it only
helps this edge case in question (4) above, which is already a case where
servers aren't expected to send ACKs anyway.
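
For comparison, the three options can be written down as epoch 0 policies. This is a rough Python sketch, not proposed text: `Epoch0Policy`, `epoch0_policy`, and the `knows_dtls13` flag are hypothetical names, and the flag is assumed to mean "this side has learned that the negotiated version is DTLS 1.3".

```python
from typing import NamedTuple


class Epoch0Policy(NamedTuple):
    send_acks: bool     # may this side send ACKs in epoch 0?
    process_acks: bool  # False means received epoch 0 ACKs are dropped


def epoch0_policy(option: str, knows_dtls13: bool) -> Epoch0Policy:
    """Return the epoch 0 ACK behavior under Option A, B, or C."""
    if option == "A":
        # No ACKs in epoch 0: never send, always drop.
        return Epoch0Policy(send_acks=False, process_acks=False)
    if option == "B":
        # ACKing switches on in both directions once the version is learned.
        return Epoch0Policy(send_acks=knows_dtls13, process_acks=knows_dtls13)
    if option == "C":
        # Always act on received ACKs, but send only once the version is learned.
        return Epoch0Policy(send_acks=knows_dtls13, process_acks=True)
    raise ValueError(f"unknown option: {option!r}")


for option in "ABC":
    for known in (False, True):
        print(option, known, epoch0_policy(option, known))
```

Written this way, Options B and C differ only in the row where the version is not yet known, which is the case from question (4).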

I think I lean towards Option A for simplicity, even though it decidedly
contradicts a lot of text in the RFC right now. That will be hard to encode
in an erratum as a few things need to change. But I also have 7 other
errata open against this document, so maybe it's time for rfc9147bis.

David

_______________________________________________
TLS mailing list -- tls@ietf.org
To unsubscribe send an email to tls-le...@ietf.org
