[TLS] Re: DTLS 1.3 ACKs near the version transition

David Benjamin Tue, 24 Sep 2024 07:40:11 -0700

On Tue, Sep 24, 2024 at 8:33 AM Sean Turner <s...@sn3rd.com> wrote:


> I hate to add to the pile of issues, but we also have to figure out what
> to do with the outstanding DTLS 1.2 errata as some might still impact DTLS
> 1.2; see
>
> https://www.rfc-editor.org/errata_search.php?rfc=6347&rec_status=15&presentation=table


Ah yeah. Looking through them, I think none of them apply anymore:
eid3917: Fixed in RFC 9147
eid4103: n/a to RFC 9147
eid5186: n/a to RFC 9147
eid4104: n/a to RFC 9147 (although we did mess something up around epoch
overflow:
https://mailarchive.ietf.org/arch/msg/tls/6y8wTv8Q_IPM-PCcbCAmDOYg6bM/)
eid4105: n/a to RFC 9147
eid4642: n/a to RFC 9147 (can't find discussion of 1's complement anymore)
eid5026: n/a to RFC 9147 (looks like we deleted that sentence)
eid5903: n/a to RFC 9147 (looks like we deleted that sentence)
eid8089: n/a to RFC 9147 (no HelloVerifyRequest in the first place)

Looks like that matches what you concluded here:
https://mailarchive.ietf.org/arch/msg/tls/_oYWTElq14ad83RygED2AiUI4ek/


> > On Sep 23, 2024, at 20:06, Eric Rescorla <e...@rtfm.com> wrote:
> >
> > Hi David,
> >
> > Thanks for digging in here. I haven't fully processed your comments, but
> it does seem like we probably do need a -bis. Now that we've gotten
> 8446-bis and ECH out the door, I don't think this is implausible. Do you
> feel like you are getting close to a complete list of issues to be
> addressed there?
>

I mean, I don't think anyone can ever say they're done realizing things
they hadn't thought of before. :-) But this was in the service of (my
second round of) trying to absorb the handshake reassembly changes in the
RFC in preparation for implementing this part of DTLS 1.3. I've now
switched from staring at the spec to writing code, so that round is done.
Of course, we'll realize something else in the course of writing code, I
dunno.

But I think there'll also be plenty of time in between now and the WGLC for
the as-yet-nonexistent -bis document for that.


> >
> > -Ekr
> >
> >
> >
> > On Mon, Sep 23, 2024 at 3:44 PM David Benjamin <david...@chromium.org>
> wrote:
> > For my neck of the woods, DTLS matters for WebRTC. It really should be
> QUIC, but alas it isn't and I suspect redesigning all of WebRTC now atop
> QUIC and then fully completing the transition would take much longer than
> getting to DTLS 1.3, much as the DTLS 1.3 specification needs a -bis
> document. :-)
> >
> > On Mon, Sep 23, 2024 at 6:10 PM Watson Ladd <watsonbl...@gmail.com>
> wrote:
> > Backing up a bit, at what point do we say QUIC Datagram is the right
> > way to do this?
> >
> > This whole adventure sounds like a mess.
> >
> > On Fri, Sep 20, 2024 at 8:20 AM David Benjamin <david...@chromium.org>
> wrote:
> > >
> > > (Resending since I don't see these two mails in the list archives, so
> I'm not sure if the list software broke again. Apologies if this is a
> duplicate mail!)
> > >
> > > On Thu, Sep 19, 2024 at 1:49 PM David Benjamin <david...@google.com>
> wrote:
> > >>
> > >> On Thu, Sep 19, 2024 at 1:31 PM David Benjamin <david...@google.com>
> wrote:
> > >>>
> > >>> Ah fun, another issue in this document. So not only are write epoch
> lifetimes unspecified and complex with 0-RTT, but read epoch lifetimes are
> specified but wrong.
> > >>>
> > >>> Section 4.2.1 says:
> > >>>
> > >>> > Because DTLS records could be reordered, a record from epoch M may
> be received after epoch N (where N > M) has begun. Implementations SHOULD
> discard records from earlier epochs but MAY choose to retain keying
> material from previous epochs for up to the default MSL specified for TCP
> [RFC0793] to allow for packet reordering. (Note that the intention here is
> that implementers use the current guidance from the IETF for MSL, as
> specified in [RFC0793] or successors, not that they attempt to interrogate
> the MSL that the system TCP stack is using.)
> > >>>
> > >>> https://www.rfc-editor.org/rfc/rfc9147.html#section-4.2.1
> > >>>
> > >>> First, it's a bit weird to say you SHOULD discard records but MAY
> retain keying material. I assume that meant SHOULD discard records but MAY
> process records anyway up to MSL. Anyway, this model implies that only one
> read epoch is active at once, but this isn't true. You basically have to
> read epoch 1 (early data) as unordered relative to epochs 0 and 2.
> Consider a DTLS 1.3 server:
> > >>>
> > >>> 1. The server reads ClientHello with early_data extension at epoch 0
> and accepts early data.
> > >>> 2. The server sends ServerHello (epoch 0), EE..Finished (epoch 2),
> and activates write epoch 3 for half-RTT application data.
> > >>> 3. The server reads early data (epoch 1) from the client. The RFC
> would lead you to think the server can close read epoch 0 now, but...
> > >>> 4. ServerHello gets lost and, if we are to believe
> https://www.rfc-editor.org/rfc/rfc9147.html#section-7.1-8, the client
> might send an empty plaintext ACK to trigger a retransmit. This ACK will be
> at epoch 0. This only works if the server keeps read epoch 0 open!
> > >>> 5. Client eventually gets the ServerHello but now it only gets half
> of the epoch 2 data. It sends an ACK to trigger another retransmit. This
> ACK will come at epoch 2.
> > >>> 6. Server receives that ACK at epoch 2 and retransmits. The RFC
> would lead you to think the server can close read epoch 1 now, but...
> > >>> 7. Let's say that retransmit is lost again, or hasn't arrived yet.
> From the client's perspective, it has a connection that has yet to reach
> the 1-RTT point, so any data from the calling application will still be
> sent as early data. That means the client will continue to send early data
> at epoch 1. This only works if the server keeps read epoch 1 open!
> > >>> 8. The handshake progresses and the server finally gets 1-RTT data
> at epoch 3 from the client. Now the spirit of the rule in the text applies
> to epoch 1 and the server can close the epoch (after optionally waiting a
> spell for reordering).
> > >>
> > >>
> > >> Ah right, Nick Harper points out that servers really should close
> read epoch 1 [up to a delay to accommodate reordering] as soon as they
> receive the Finished message (epoch 2) and complete the handshake, not wait
> for an epoch 3 record. (But it must specifically be on handshake
> completion, not any epoch 2 record. Record-layer only logic cannot assume 1
> < 2 because 2 might contain pre-Finished ACKs.)
> > >>
> > >> All this is missing from the specification. :-) I think we need to
> rewrite the spec text on epochs to more explicitly discuss their lifetimes.
> > >>
> > >>>
> > >>> So the rule is actually that we close according to a partially
> ordered set:
> > >>> - 0 (unencrypted) < 2 (handshake) < 3 (first app data) < 4 < 5 < ...
> > >>> - 1 (early data) < 3 (first app data) < 4 < 5 < ...
> > >>> - 1 is not ordered relative to 0 and 2.
> > >>>
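To make the partial order above concrete, here is a rough Python sketch. The function names (`epoch_precedes`, `closable`) are made up for illustration and come from neither RFC 9147 nor any implementation. Per the note about handshake completion, "established" for epoch 3 should be taken to mean the handshake has completed, not merely that some epoch 2 record arrived.

```python
from typing import Set


def epoch_precedes(a: int, b: int) -> bool:
    """Return True if read epoch `a` is strictly before `b` in the partial order.

    Chains assumed from the thread:
      0 < 2 < 3 < 4 < ...
      1 < 3 < 4 < ...
      1 is unordered relative to 0 and 2.
    """
    if a == b:
        return False
    if a == 1:
        # Early data only precedes application-data epochs (3 and up).
        return b >= 3
    if b == 1:
        # Nothing precedes epoch 1; it is unordered relative to 0 and 2.
        return False
    return a < b


def closable(open_epochs: Set[int], established: int) -> Set[int]:
    """Open read epochs that may be closed once `established` is in use.

    Hypothetical helper: a real stack might also keep old keys around for
    a while to tolerate reordering before actually dropping them.
    """
    return {e for e in open_epochs if epoch_precedes(e, established)}
```

With this, `closable({0, 1, 2}, 2)` leaves epoch 1 open, while `closable({0, 1, 2, 3}, 3)` allows closing 0, 1 and 2.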
> > >>>
> > >>> On Wed, Sep 18, 2024 at 3:47 PM David Benjamin <david...@google.com>
> wrote:
> > >>>>
> > >>>> One more wriggle if we wish to allow unencrypted ACKs, though it is
> fixable. Section 7 says:
> > >>>>
> > >>>> > During the handshake, ACK records MUST be sent with an epoch
> which is equal to or higher than the record which is being acknowledged.
> [...] Implementations SHOULD simply use the highest current sending epoch,
> which will generally be the highest available. After the handshake,
> implementations MUST use the highest available sending epoch.
> > >>>>
> > >>>> Taken at face value, that text implies that a client sending 0-RTT
> data should send its ACKs at the highest current sending epoch, epoch 1
> (0-RTT). But if the server has rejected 0-RTT data, it will not (and
> cannot) instantiate epoch 1 at all, so it won't get the ACKs! That guidance
> needs a special case: if you would have ACKed at epoch 1, you should ACK at
> epoch 0 instead.
> > >>>>
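A minimal sketch of that special case, in Python. The function `ack_epoch` and its parameters are hypothetical names for illustration; the only rules encoded are the quoted Section 7 requirement (ACK epoch equal to or higher than the acknowledged record) and the proposed "epoch 1 becomes epoch 0" exception.

```python
def ack_epoch(highest_send_epoch: int, acked_record_epoch: int) -> int:
    """Choose the epoch at which to send a handshake-time ACK.

    Assumption: the sender normally uses its highest current sending
    epoch, except that epoch 1 (0-RTT) is never used for ACKs, since a
    server that rejected 0-RTT cannot read it. Epoch 0 is used instead.
    """
    chosen = 0 if highest_send_epoch == 1 else highest_send_epoch
    if chosen < acked_record_epoch:
        # Section 7: the ACK epoch must not be lower than the epoch of
        # the record being acknowledged.
        raise ValueError("ACK epoch would be lower than the acked record's epoch")
    return chosen
```

A client that has only sent ClientHello plus early data would call `ack_epoch(1, 0)` and get epoch 0.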
> > >>>> Alternatively, one might interpret that situation as 0 being the
> sending epoch and 1 being some magical epoch on the side. This isn't
> supported by the document, but honestly no interpretation is supported by
> the document because the document never tells you what a "current sending
> epoch" even is. While 4.2.1 gives some rough guidance on when to close out
> receiving epochs, I could not find any text on send epoch management at
> all. Reasoning through the protocol, you might arrive at this almost
> correct rule:
> > >>>>
> > >>>> A write epoch may be discarded IF:
> > >>>> 1. It is not the highest available epoch. AND
> > >>>> 2. There are no unacked, outgoing messages at that epoch
> > >>>>
> > >>>> That rule, however, does not work in 0-RTT. If the highest epoch is
> 1, you cannot discard 0. The server might reject 0-RTT and then send
> HelloRetryRequest, at which point you will need to discard epoch 1 and
> reactivate epoch 0, maintaining continuity of sequence numbers. The
> 0-RTT/1-RTT transition is also interesting on the write side, though I'll
> start a separate thread for that.
> > >>>>
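As a sketch only, the "almost correct" rule plus the 0-RTT exception could look like this in Python. `may_discard_write_epoch` and its arguments are invented for illustration, and the only exception modeled is the one named above (epoch 0 must survive while epoch 1 is the highest); other cases may well exist.

```python
from typing import Set


def may_discard_write_epoch(
    epoch: int, highest_epoch: int, epochs_with_unacked: Set[int]
) -> bool:
    """Decide whether the keys and state for a write epoch can be dropped.

    `epochs_with_unacked` is the (hypothetical) set of epochs that still
    have outgoing handshake messages awaiting acknowledgment.
    """
    if epoch == highest_epoch:
        # Rule 1: never discard the highest available epoch.
        return False
    if epoch in epochs_with_unacked:
        # Rule 2: keep epochs that may still need retransmission.
        return False
    if epoch == 0 and highest_epoch == 1:
        # 0-RTT exception: the server may reject early data and send
        # HelloRetryRequest, so epoch 0 has to be reactivated with its
        # sequence numbers intact.
        return False
    return True
```

So `may_discard_write_epoch(0, 1, set())` stays false even though the plain two-condition rule would allow the discard.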
> > >>>> All this is subtle enough that it should not be left as an exercise
> to the reader.
> > >>>>
> > >>>> David
> > >>>>
> > >>>> On Wed, Sep 18, 2024 at 12:39 AM Bob Beck <b...@obtuse.com> wrote:
> > >>>>>
> > >>>>>
> > >>>>>
> > >>>>> > On Sep 17, 2024, at 5:28 PM, David Benjamin <davidben=
> 40google....@dmarc.ietf.org> wrote:
> > >>>>> >
> > >>>>> > Ah, I just noticed this text at the end of Section 7.1:
> > >>>>> >
> > >>>>> > > Note that in some cases it may be necessary to send an ACK
> which does not contain any record numbers. For instance, a client might
> receive an EncryptedExtensions message prior to receiving a ServerHello.
> Because it cannot decrypt the EncryptedExtensions, it cannot safely
> acknowledge it (as it might be damaged). If the client does not send an
> ACK, the server will eventually retransmit its first flight, but this might
> take far longer than the actual round trip time between client and server.
> Having the client send an empty ACK shortcuts this process.
> > >>>>> >
> > >>>>> > https://www.rfc-editor.org/rfc/rfc9147.html#section-7.1-8
> > >>>>> >
> > >>>>> > I guess then the intent is indeed that if you receive some
> random encrypted DTLS 1.3 header, even though you don't know it's DTLS 1.3
> yet, you interpret as activating the ACKing mechanism? But that seems to
> prompt more questions than it answers. For instance, what happens if you do
> that, but then finally receive the ServerHello and it turns out this was
> just some junk packet and we're really negotiating DTLS 1.2? Do you check
> that the ACK mechanism has been activated and return an error? Do you just
> pause the ACK mechanism and hope you're in an OK state? This seems quite
> prone to send the implementation into unexpected and untested states.
> > >>>>> >
> > >>>>> >
> > >>>>>
> > >>>>>
> > >>>>> Yeah, I think this has missed a nasty corner case here for
> implementations that support both.
> > >>>>>
> > >>>>> I think I also lean towards option A) (from below) here. Anyone
> else who has gotten at least their hands mildly dirty in a DTLS
> implementation that supports both 1.2 and 1.3 care to chime in as well?
> > >>>>>
> > >>>>>
> > >>>>> > On Thu, Sep 12, 2024 at 4:31 PM David Benjamin <
> david...@google.com> wrote:
> > >>>>> > Hi all,
> > >>>>> >
> > >>>>> > I noticed another issue with the DTLS 1.3 ACK design. :-)
> > >>>>> >
> > >>>>> > So, DTLS 1.3 uses ACKs. DTLS 1.2 does not use ACKs. But you only
> learn what version you're speaking partway through the lifetime of the
> connection, so there are some interesting corner cases to answer. As an
> illustrative example, I believe the diagram in section 6 is [probably]
> incorrect:
> > >>>>> > https://www.rfc-editor.org/rfc/rfc9147.html#section-6
> > >>>>> >
> > >>>>> > If the client loses the first packet, it never sees the
> ServerHello and thus never learns it's speaking DTLS 1.3. While it does see the
> second packet, that packet only contains ciphertext that it cannot decrypt.
> Unless it decides to say "this looks like a 1.3 record header, therefore I
> will turn on the 1.3 state machine", which isn't supported by the RFC
> (maybe TLS 1.4 will use the same record header but redo ACKs once again),
> it shouldn't activate the 1.3 state machine yet. I expect what will
> actually happen is that the client will wait for the retransmission timeout
> a la DTLS 1.2.
> > >>>>> >
> > >>>>> > More generally, I believe these are the situations to worry
> about:
> > >>>>> >
> > >>>>> > 1. If a DTLS 1.2 (i.e. does not implement RFC 9147 at all)
> implementation receives an ACK record for whatever reason, what happens?
> This decision we don't get to change. Rather, it is a design constraint.
> Both OpenSSL and BoringSSL treat unexpected record types as a fatal error.
> I haven't checked other implementations. So I think we must take as a
> constraint that you cannot send an ACK unless you know the peer is
> 1.3-capable.
> > >>>>> >
> > >>>>> > 2. Do plaintext ACKs exist? Or is the plaintext epoch
> permanently at the old state machine? Honestly, I wish the answer here was
> "no". That would have avoided so many problems, because then epochs never
> change state machines. Unfortunately, the RFC does not support this
> interpretation. Section 4.1 talks about how to demux a plaintext ACK, and
> section 6, though wrong, clearly depicts a plaintext ACK. So instead we get
> to worry about the transition within an epoch. Keep in mind that
> transitions happen at different times on both sides. Keep in mind that
> there is a portion of the plaintext epoch that lasts after version
> negotiation in HelloRetryRequest handshakes.
> > >>>>> >
> > >>>>> > 3. If a 1.3-capable server receives half of a ClientHello, does
> it send an ACK? I believe (1) means the answer must be "no". If you haven't
> read the ClientHello, you haven't selected the version, so you don't know
> if the client is 1.3-capable or not. If the client is not 1.3-capable,
> sending an ACK may be incompatible.
> > >>>>> >
> > >>>>> > 4. Is it possible for a 1.3-capable client to receive an ACK
> before it receives a ServerHello? If so, how does the client respond? I
> believe the answer to this question, if plaintext ACKs exist, is
> unavoidably "yes". Suppose the server receives a 1.3 ClientHello and then
> negotiates DTLS 1.3. That is a complete flight, so Section 7.1 discourages
> ACKing explicitly (you can ACK implicitly), but it does not forbid an
> explicit ACK. An explicit ACK may be sent if the server cannot generate its
> responding flight immediately. That means a server could well send ACK
> followed by ServerHello. Now suppose ServerHello is lost but the ACK gets
> through. Now the client must decide what it's doing. Rejecting the ACK
> would result in connection failure, so we must either drop the ACK on the
> floor, or process it. While processing it would be more efficient (you
> don't need to retransmit the whole ClientHello), it means the plaintext
> epoch must support this hybrid state where 1.3 ACKs are processed but never
> sent! Or perhaps receiving that ACK transitions you to the 1.3 state
> machine even though you don't know the version yet. That all sounds like a
> mess, so I would advocate you simply drop it on the floor.
> > >>>>> >
> > >>>>> > 5. If a 1.3-capable client receives half of the server's first
> message (HRR or ServerHello), does it send an ACK? Again, because of (1), I
> believe the answer must be "no". If you don't know the server's selected
> version, the server may not be 1.3-capable and may not be compatible with
> the ACK.
> > >>>>> >
> > >>>>> > 6. What does a 1.3-capable server do if it receives an ACK prior
> to picking the TLS version? Unlike (4), I believe this is impossible. If
> the client has something to ACK, the server must have sent something, which
> the server will only do once it's received the full ClientHello and thus
> picked the version. However, given (4), I suspect an implementation will
> naturally just drop that ACK. In this state error vs drop is kinda academic.
> > >>>>> >
> > >>>>> > From what I can tell, RFC 9147 is silent on all of this. I think
> it should say something. I believe these are the plausible options:
> > >>>>> >
> > >>>>> > OPTION A -- There are no ACKs in epoch 0.
> > >>>>> >
> > >>>>> > We avoid this ridiculous transition point and say that ACKs only
> exist starting epoch 1. Epoch 0 uses the old DTLS 1.2 state machine. This
> is very attractive from a simplicity perspective, but since RFC 9147 was
> already published with this ambiguity, I think we need to, at minimum, say
> that DTLS 1.3 implementations drop epoch 0 ACKs on the floor. It also means
> that packet loss in HelloRetryRequest flows may be less efficient. That
> said, if your HelloRetryRequest is stateless (not applicable to all DTLS
> uses), you're probably not doing anything with ACKs anyway. Saying those
> ACKs don't exist avoids having to think about that case, at the cost of a worse
> transport for stateful HelloRetryRequest.
> > >>>>> >
> > >>>>> > OPTION B -- Epoch 0 enables ACKing once the version is learned.
> > >>>>> >
> > >>>>> > Once you know the version, you start sending and processing
> ACKs. Before you know the version, you drop ACKs on the floor and never
> send them. This requires convincing ourselves that the transition point
> works out, notably when one side is still ACK-less and the other side is
> still ACK-ful, but I believe it works out.
> > >>>>> >
> > >>>>> > OPTION C -- Epoch 0 always receives and acts on ACKs, but it
> doesn't send ACKs until the version is learned.
> > >>>>> >
> > >>>>> > This is the same as above, but instead of dropping ACKs, you go
> ahead and let that drive your state machine. But you don't send them. This
> makes reasoning about the protocol even more complicated because there are
> even more states you can be in w.r.t. your known version vs the state of
> your transport. It does improve behavior around packet loss, but I think it
> only helps this edge case in question (4) above, which is already a case
> where servers aren't expected to send ACKs anyway.
> > >>>>> >
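For comparison, the three options can be written down as epoch 0 policy functions. This is a rough Python sketch under stated assumptions: the names `Option`, `may_send_ack_epoch0` and `on_ack_received_epoch0` are made up, and `dtls13_confirmed` is assumed to be true only once the endpoint knows DTLS 1.3 was negotiated. What happens after DTLS 1.2 is negotiated is not modeled.

```python
from enum import Enum


class Option(Enum):
    A = "no ACKs in epoch 0"
    B = "epoch 0 ACKs enabled once the version is learned"
    C = "always process epoch 0 ACKs, send only once the version is learned"


def may_send_ack_epoch0(option: Option, dtls13_confirmed: bool) -> bool:
    """Whether an endpoint may send a plaintext (epoch 0) ACK."""
    if option is Option.A:
        return False
    # B and C: never send an ACK before the peer is known to speak 1.3.
    return dtls13_confirmed


def on_ack_received_epoch0(option: Option, dtls13_confirmed: bool) -> str:
    """Return "drop" or "process" for an ACK received at epoch 0."""
    if option is Option.A:
        return "drop"
    if option is Option.B:
        return "process" if dtls13_confirmed else "drop"
    # Option C: act on the ACK even before the version is known.
    return "process"
```

Under all three options nothing is ever sent before the version is known, which matches the constraint in (1); the options only differ in what is done with a received ACK and in whether epoch 0 ACKs exist at all.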
> > >>>>> > I think I lean towards Option A for simplicity, even though it
> decidedly contradicts a lot of text in the RFC right now. That will be hard
> to encode in an erratum as a few things need to change. But I also have 7
> other errata open against this document, so maybe it's time for rfc9147bis.
> > >>>>> >
> > >>>>> > David
> > >>>>> > _______________________________________________
> > >>>>> > TLS mailing list -- tls@ietf.org
> > >>>>> > To unsubscribe send an email to tls-le...@ietf.org
> > >>>>>
> >
> >
> >
> > --
> > Astra mortemque praestare gradatim
>
>
