[TLS] Re: DTLS 1.3 ACKs near the version transition

Eric Rescorla Mon, 23 Sep 2024 17:08:16 -0700

Hi David,

Thanks for digging in here. I haven't fully processed your comments, but it
does seem like we probably do need a -bis. Now that we've gotten 8446-bis
and ECH out the door, I don't think this is implausible. Do you feel like
you are getting close to a complete list of issues to be addressed there?


-Ekr



On Mon, Sep 23, 2024 at 3:44 PM David Benjamin <david...@chromium.org>
wrote:

> For my neck of the woods, DTLS matters for WebRTC. It really should be
> QUIC, but alas it isn't and I suspect redesigning all of WebRTC now atop
> QUIC and then fully completing the transition would take much longer than
> getting to DTLS 1.3, much as the DTLS 1.3 specification needs a -bis
> document. :-)
>
> On Mon, Sep 23, 2024 at 6:10 PM Watson Ladd <watsonbl...@gmail.com> wrote:
>
>> Backing up a bit, at what point do we say QUIC Datagram is the right
>> way to do this?
>>
>> This whole adventure sounds like a mess.
>>
>> On Fri, Sep 20, 2024 at 8:20 AM David Benjamin <david...@chromium.org>
>> wrote:
>> >
>> > (Resending since I don't see these two mails in the list archives, so
>> I'm not sure if the list software broke again. Apologies if this is a
>> duplicate mail!)
>> >
>> > On Thu, Sep 19, 2024 at 1:49 PM David Benjamin <david...@google.com>
>> wrote:
>> >>
>> >> On Thu, Sep 19, 2024 at 1:31 PM David Benjamin <david...@google.com>
>> wrote:
>> >>>
>> >>> Ah fun, another issue in this document. So not only are write epoch
>> lifetimes unspecified and complex with 0-RTT, but read epoch lifetimes are
>> specified but wrong.
>> >>>
>> >>> Section 4.2.1 says:
>> >>>
>> >>> > Because DTLS records could be reordered, a record from epoch M may
>> be received after epoch N (where N > M) has begun. Implementations SHOULD
>> discard records from earlier epochs but MAY choose to retain keying
>> material from previous epochs for up to the default MSL specified for TCP
>> [RFC0793] to allow for packet reordering. (Note that the intention here is
>> that implementers use the current guidance from the IETF for MSL, as
>> specified in [RFC0793] or successors, not that they attempt to interrogate
>> the MSL that the system TCP stack is using.)
>> >>>
>> >>> https://www.rfc-editor.org/rfc/rfc9147.html#section-4.2.1
>> >>>
>> >>> First, it's a bit weird to say you SHOULD discard records but MAY
>> retain keying material. I assume that meant SHOULD discard records but MAY
>> process records anyway up to MSL. Anyway, this model implies that only one
>> read epoch is active at once, but this isn't true. You basically have to
>> read epoch 1 (early data) as unordered relative to epochs 0 and 2.
>> Consider a DTLS 1.3 server:
>> >>>
>> >>> 1. The server reads ClientHello with early_data extension at epoch 0
>> and accepts early data.
>> >>> 2. The server sends ServerHello (epoch 0), EE..Finished (epoch 2),
>> and activates write epoch 3 for half-RTT application data.
>> >>> 3. The server reads early data (epoch 1) from the client. The RFC
>> would lead you to think the server can close read epoch 0 now, but...
>> >>> 4. ServerHello gets lost and, if we are to believe
>> https://www.rfc-editor.org/rfc/rfc9147.html#section-7.1-8, the client
>> might send an empty plaintext ACK to trigger a retransmit. This ACK will be
>> at epoch 0. This only works if the server keeps read epoch 0 open!
>> >>> 5. Client eventually gets the ServerHello but now it only gets half
>> of the epoch 2 data. It sends an ACK to trigger another retransmit. This
>> ACK will come at epoch 2.
>> >>> 6. Server receives that ACK at epoch 2 and retransmits. The RFC would
>> lead you to think the server can close read epoch 1 now, but...
>> >>> 7. Let's say that retransmit is lost again, or hasn't arrived yet.
>> From the client's perspective, it has a connection that has yet to reach
>> the 1-RTT point, so any data from the calling application will still be
>> sent as early data. That means the client will continue to send early data
>> at epoch 1. This only works if the server keeps read epoch 1 open!
>> >>> 8. The handshake progresses and the server finally gets 1-RTT data at
>> epoch 3 from the client. Now the spirit of the rule in the text applies to
>> epoch 1 and the server can close the epoch (after optionally waiting a
>> spell for reordering).
>> >>
>> >>
>> >> Ah right, Nick Harper points out that servers really should close read
>> epoch 1 [up to a delay to accommodate reordering] as soon as they receive
>> the Finished message (epoch 2) and complete the handshake, not wait for an
>> epoch 3 record. (But it must specifically be on handshake completion, not
>> any epoch 2 record. Record-layer only logic cannot assume 1 < 2 because 2
>> might contain pre-Finished ACKs.)
>> >>
>> >> All this is missing from the specification. :-) I think we need to
>> rewrite the spec text on epochs to more explicitly discuss their lifetimes.
>> >>
>> >>>
>> >>> So the rule is actually that we close according to a partially
>> ordered set:
>> >>> - 0 (unencrypted) < 2 (handshake) < 3 (first app data) < 4 < 5 < ...
>> >>> - 1 (early data) < 3 (first app data) < 4 < 5 < ...
>> >>> - 1 is not ordered relative to 0 and 2.
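
The partial order above can be sketched in code. This is a minimal illustration only, not text from RFC 9147: the function names `precedes` and `closable_read_epochs` are hypothetical, and the sketch ignores both the optional reordering delay and the refinement that read epoch 1 should close on handshake completion rather than on the first epoch 3 record.

```python
def precedes(a: int, b: int) -> bool:
    """Return True if read epoch a is strictly before b in the partial order.

    Chains described in the thread:
      0 (unencrypted) < 2 (handshake) < 3 (first app data) < 4 < ...
      1 (early data)  < 3 (first app data) < 4 < ...
    Epoch 1 is unordered relative to epochs 0 and 2.
    """
    if a == b:
        return False
    if a == 1:
        # Early data is only ordered against application-data epochs.
        return b >= 3
    if b == 1:
        # Nothing precedes epoch 1: epochs 0 and 2 are unordered with it.
        return False
    # Remaining epochs (0, 2, 3, 4, ...) are totally ordered by number.
    return a < b


def closable_read_epochs(open_epochs: set[int], activated: int) -> set[int]:
    """Epochs that become candidates for closing once `activated` is in use."""
    return {e for e in open_epochs if precedes(e, activated)}
```

For example, `closable_read_epochs({0, 1}, 2)` yields only epoch 0, which matches step 7 of the walkthrough: seeing epoch 2 does not license closing read epoch 1.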
>> >>>
>> >>>
>> >>> On Wed, Sep 18, 2024 at 3:47 PM David Benjamin <david...@google.com>
>> wrote:
>> >>>>
>> >>>> One more wriggle if we wish to allow unencrypted ACKs, though it is
>> fixable. Section 7, says:
>> >>>>
>> >>>> > During the handshake, ACK records MUST be sent with an epoch which
>> is equal to or higher than the record which is being acknowledged. [...]
>> Implementations SHOULD simply use the highest current sending epoch, which
>> will generally be the highest available. After the handshake,
>> implementations MUST use the highest available sending epoch.
>> >>>>
>> >>>> Taken at face value, that text implies that a client sending 0-RTT
>> data should send its ACKs at the highest current sending epoch, epoch 1
>> (0-RTT). But if the server has rejected 0-RTT data, it will not (and
>> cannot) instantiate epoch 1 at all, so it won't get the ACKs! That guidance
>> needs a special case: if you would have ACKed at epoch 1, you should ACK at
>> epoch 0 instead.
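
A minimal sketch of that special case, assuming the "highest current sending epoch" is tracked as a plain integer. The name `ack_epoch` is hypothetical and does not come from RFC 9147 or any implementation.

```python
EARLY_DATA_EPOCH = 1  # epoch used for 0-RTT data in DTLS 1.3


def ack_epoch(highest_sending_epoch: int) -> int:
    """Pick the epoch at which to send an ACK.

    The RFC guidance is to use the highest current sending epoch. The
    special case from the thread: a server that rejected 0-RTT never
    instantiates epoch 1, so an ACK that would go out at epoch 1 is sent
    at epoch 0 instead.
    """
    if highest_sending_epoch == EARLY_DATA_EPOCH:
        return 0
    return highest_sending_epoch
```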
>> >>>>
>> >>>> Alternatively, one might interpret that situation as 0 being the
>> sending epoch and 1 being some magical epoch on the side. This isn't
>> supported by the document, but honestly no interpretation is supported by
>> the document because the document never tells you what a "current sending
>> epoch" even is. While 4.2.1 gives some rough guidance on when to close out
>> receiving epochs, I could not find any text on send epoch management at
>> all. Reasoning through the protocol, you might arrive at this almost
>> correct rule:
>> >>>>
>> >>>> A write epoch may be discarded IF:
>> >>>> 1. It is not the highest available epoch. AND
>> >>>> 2. There are no unacked, outgoing messages at that epoch
>> >>>>
>> >>>> That rule, however, does not work in 0-RTT. If the highest epoch is
>> 1, you cannot discard 0. The server might reject 0-RTT and then send
>> HelloRetryRequest, at which point you will need to discard epoch 1 and
>> reactivate epoch 0, maintaining continuity of sequence numbers. The
>> 0-RTT/1-RTT transition is also interesting on the write side, though I'll
>> start a separate thread for that.
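
The "almost correct" rule plus the 0-RTT exception might look like the sketch below. It is an illustration under stated assumptions, not a complete treatment: the function and parameter names are hypothetical, and the write-side 0-RTT/1-RTT transition mentioned above is not modeled.

```python
def may_discard_write_epoch(
    epoch: int,
    highest_epoch: int,
    epochs_with_unacked: set[int],
) -> bool:
    """Decide whether a write epoch can be discarded.

    Base rule from the thread: discard only if the epoch is not the
    highest available AND has no unacked outgoing messages.
    Exception: while the highest epoch is 1 (0-RTT), epoch 0 must be kept,
    because the server may reject 0-RTT and send HelloRetryRequest, which
    requires reactivating epoch 0 with continuous sequence numbers.
    """
    if epoch == highest_epoch:
        return False
    if epoch in epochs_with_unacked:
        return False
    if epoch == 0 and highest_epoch == 1:
        return False
    return True
```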
>> >>>>
>> >>>> All this is subtle enough that it should not be left as an exercise
>> to the reader.
>> >>>>
>> >>>> David
>> >>>>
>> >>>> On Wed, Sep 18, 2024 at 12:39 AM Bob Beck <b...@obtuse.com> wrote:
>> >>>>>
>> >>>>>
>> >>>>>
>> >>>>> > On Sep 17, 2024, at 5:28 PM, David Benjamin <davidben=
>> 40google....@dmarc.ietf.org> wrote:
>> >>>>> >
>> >>>>> > Ah, I just noticed this text at the end of Section 7.1:
>> >>>>> >
>> >>>>> > > Note that in some cases it may be necessary to send an ACK
>> which does not contain any record numbers. For instance, a client might
>> receive an EncryptedExtensions message prior to receiving a ServerHello.
>> Because it cannot decrypt the EncryptedExtensions, it cannot safely
>> acknowledge it (as it might be damaged). If the client does not send an
>> ACK, the server will eventually retransmit its first flight, but this might
>> take far longer than the actual round trip time between client and server.
>> Having the client send an empty ACK shortcuts this process.
>> >>>>> >
>> >>>>> > https://www.rfc-editor.org/rfc/rfc9147.html#section-7.1-8
>> >>>>> >
>> >>>>> > I guess then the intent is indeed that if you receive some random
>> encrypted DTLS 1.3 header, even though you don't know it's DTLS 1.3 yet,
>> you interpret as activating the ACKing mechanism? But that seems to prompt
>> more questions than it answers. For instance, what happens if you do that,
>> but then finally receive the ServerHello and it turns out this was just
>> some junk packet and we're really negotiating DTLS 1.2? Do you check that
>> the ACK mechanism has been activated and return an error? Do you just pause
>> the ACK mechanism and hope you're in an OK state? This seems quite prone to
>> send the implementation into unexpected and untested states.
>> >>>>> >
>> >>>>> >
>> >>>>>
>> >>>>>
>> >>>>> Yeah, I think this has missed a nasty corner case here for
>> implementations that support both.
>> >>>>>
>> >>>>> I think I also lean towards option A) (from below) here. Anyone
>> else who has gotten at least their hands mildly dirty in a DTLS
>> implementation that supports both 1.2 and 1.3 care to chime in as well?
>> >>>>>
>> >>>>>
>> >>>>> > On Thu, Sep 12, 2024 at 4:31 PM David Benjamin <
>> david...@google.com> wrote:
>> >>>>> > Hi all,
>> >>>>> >
>> >>>>> > I noticed another issue with the DTLS 1.3 ACK design. :-)
>> >>>>> >
>> >>>>> > So, DTLS 1.3 uses ACKs. DTLS 1.2 does not use ACKs. But you only
>> learn what version you're speaking partway through the lifetime of the
>> connection, so there are some interesting corner cases to answer. As an
>> illustrative example, I believe the diagram in section 6 is [probably]
>> incorrect:
>> >>>>> > https://www.rfc-editor.org/rfc/rfc9147.html#section-6
>> >>>>> >
>> >>>>> > If the client loses the first packet, it never sees the
>> ServerHello and thus never learns it's speaking DTLS 1.3. While it does see the
>> second packet, that packet only contains ciphertext that it cannot decrypt.
>> Unless it decides to say "this looks like a 1.3 record header, therefore I
>> will turn on the 1.3 state machine", which isn't supported by the RFC
>> (maybe TLS 1.4 will use the same record header but redo ACKs once again),
>> it shouldn't activate the 1.3 state machine yet. I expect what will
>> actually happen is that the client will wait for the retransmission timeout
>> a la DTLS 1.2.
>> >>>>> >
>> >>>>> > More generally, I believe these are the situations to worry about:
>> >>>>> >
>> >>>>> > 1. If a DTLS 1.2 (i.e. does not implement RFC 9147 at all)
>> implementation receives an ACK record for whatever reason, what happens?
>> This decision we don't get to change. Rather, it is a design constraint.
>> Both OpenSSL and BoringSSL treat unexpected record types as a fatal error.
>> I haven't checked other implementations. So I think we must take as a
>> constraint that you cannot send an ACK unless you know the peer is
>> 1.3-capable.
>> >>>>> >
>> >>>>> > 2. Do plaintext ACKs exist? Or is the plaintext epoch permanently
>> at the old state machine? Honestly, I wish the answer here was "no". That
>> would have avoided so many problems, because then epochs never change state
>> machines. Unfortunately, the RFC does not support this interpretation.
>> Section 4.1 talks about how to demux a plaintext ACK, and section 6, though
>> wrong, clearly depicts a plaintext ACK. So instead we get to worry about
>> the transition within an epoch. Keep in mind that transitions happen at
>> different times on both sides. Keep in mind that there is a portion of the
>> plaintext epoch that lasts after version negotiation in HelloRetryRequest
>> handshakes.
>> >>>>> >
>> >>>>> > 3. If a 1.3-capable server receives half of a ClientHello, does
>> it send an ACK? I believe (1) means the answer must be "no". If you haven't
>> read the ClientHello, you haven't selected the version, so you don't know
>> if the client is 1.3-capable or not. If the client is not 1.3-capable,
>> sending an ACK may be incompatible.
>> >>>>> >
>> >>>>> > 4. Is it possible for a 1.3-capable client to receive an ACK
>> before it receives a ServerHello? If so, how does the client respond? I
>> believe the answer to this question, if plaintext ACKs exist, is
>> unavoidably "yes". Suppose the server receives a 1.3 ClientHello and then
>> negotiates DTLS 1.3. That is a complete flight, so Section 7.1 discourages
>> ACKing explicitly (you can ACK implicitly), but it does not forbid an
>> explicit ACK. An explicit ACK may be sent if the server cannot generate its
>> responding flight immediately. That means a server could well send ACK
>> followed by ServerHello. Now suppose ServerHello is lost but the ACK gets
>> through. Now the client must decide what it's doing. Rejecting the ACK
>> would result in connection failure, so we must either drop the ACK on the
>> floor, or process it. While processing it would be more efficient (you
>> don't need to retransmit the whole ClientHello), it means the plaintext
>> epoch must support this hybrid state where 1.3 ACKs are processed but never
>> sent! Or perhaps receiving that ACK transitions you to the 1.3 state
>> machine even though you don't know the version yet. That all sounds like a
>> mess, so I would advocate you simply drop it on the floor.
>> >>>>> >
>> >>>>> > 5. If a 1.3-capable client receives half of the server's first
>> message (HRR or ServerHello), does it send an ACK? Again, because of (1), I
>> believe the answer must be "no". If you don't know the server's selected
>> version, the server may not be 1.3-capable and may not be compatible with
>> the ACK.
>> >>>>> >
>> >>>>> > 6. What does a 1.3-capable server do if it receives an ACK prior
>> to picking the TLS version? Unlike (4), I believe this is impossible. If
>> the client has something to ACK, the server must have sent something, which
>> the server will only do once it's received the full ClientHello and thus
>> picked the version. However, given (4), I suspect an implementation will
>> naturally just drop that ACK. In this state error vs drop is kinda academic.
>> >>>>> >
>> >>>>> > From what I can tell, RFC 9147 is silent on all of this. I think
>> it should say something. I believe these are the plausible options:
>> >>>>> >
>> >>>>> > OPTION A -- There are no ACKs in epoch 0.
>> >>>>> >
>> >>>>> > We avoid this ridiculous transition point and say that ACKs only
>> exist starting epoch 1. Epoch 0 uses the old DTLS 1.2 state machine. This
>> is very attractive from a simplicity perspective, but since RFC 9147 was
>> already published with this ambiguity, I think we need to, at minimum, say
>> that DTLS 1.3 implementations drop epoch 0 ACKs on the floor. It also means
>> that packet loss in HelloRetryRequest flows may be less efficient. That
>> said, if your HelloRetryRequest is stateless (not applicable to all DTLS
>> uses), you're probably not doing anything with ACKs anyway. Saying those
>> ACKs do not exist avoids having to think about that case, at the cost of a worse
>> transport for stateful HelloRetryRequest.
>> >>>>> >
>> >>>>> > OPTION B -- Epoch 0 enables ACKing once the version is learned.
>> >>>>> >
>> >>>>> > Once you know the version, you start sending and processing ACKs.
>> Before you know the version, you drop ACKs on the floor and never send
>> them. This requires convincing ourselves that the transition point works
>> out, notably when one side is still ACK-less and the other side is still
>> ACK-ful, but I believe it works out.
>> >>>>> >
>> >>>>> > OPTION C -- Epoch 0 always receives and acts on ACKs, but it
>> doesn't send ACKs until the version is learned.
>> >>>>> >
>> >>>>> > This is the same as above, but instead of dropping ACKs, you go
>> ahead and let that drive your state machine. But you don't send them. This
>> makes reasoning about the protocol even more complicated because there are
>> even more states you can be in w.r.t. your known version vs the state of
>> your transport. It does improve behavior around packet loss, but I think it
>> only helps this edge case in question (4) above, which is already a case
>> where servers aren't expected to send ACKs anyway.
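
The three options differ only in how epoch 0 treats ACKs, which can be summarized as two small predicates. This is a hedged sketch: the `Option` enum, the function names, and the `negotiated_1_3` flag (True once this endpoint knows DTLS 1.3 was selected) are all hypothetical and not taken from RFC 9147.

```python
from enum import Enum


class Option(Enum):
    A = "no ACKs in epoch 0"
    B = "epoch 0 enables ACKing once the version is learned"
    C = "epoch 0 always processes ACKs, sends only once the version is learned"


def may_send_epoch0_ack(option: Option, negotiated_1_3: bool) -> bool:
    """Whether an endpoint may send an ACK at epoch 0."""
    if option is Option.A:
        return False
    # Options B and C: never send before the peer is known to be 1.3-capable,
    # since DTLS 1.2 stacks may treat an unexpected record type as fatal.
    return negotiated_1_3


def should_process_epoch0_ack(option: Option, negotiated_1_3: bool) -> bool:
    """Whether a received epoch 0 ACK drives the state machine.

    False means the ACK is dropped on the floor, not treated as an error.
    """
    if option is Option.A:
        return False
    if option is Option.B:
        return negotiated_1_3
    return True  # Option C acts on ACKs even before the version is known
```

Under all three options an ACK received too early is dropped rather than rejected, so the connection survives the case in question (4).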
>> >>>>> >
>> >>>>> > I think I lean towards Option A for simplicity, even though it
>> decidedly contradicts a lot of text in the RFC right now. That will be hard
>> to encode in an erratum as a few things need to change. But I also have 7
>> other errata open against this document, so maybe it's time for rfc9147bis.
>> >>>>> >
>> >>>>> > David
>> >>>>> > _______________________________________________
>> >>>>> > TLS mailing list -- tls@ietf.org
>> >>>>> > To unsubscribe send an email to tls-le...@ietf.org
>> >>>>>
>>
>>
>>
>> --
>> Astra mortemque praestare gradatim
>>
>

