(Resending since I don't see these two mails in the list archives, so I'm not sure if the list software broke again. Apologies if this is a duplicate mail!)
On Thu, Sep 19, 2024 at 1:49 PM David Benjamin <david...@google.com> wrote:

> On Thu, Sep 19, 2024 at 1:31 PM David Benjamin <david...@google.com>
> wrote:
>
>> Ah fun, another issue in this document. So not only are write epoch
>> lifetimes unspecified and complex with 0-RTT, but read epoch lifetimes
>> *are* specified but *wrong*.
>>
>> Section 4.2.1 says:
>>
>> > Because DTLS records could be reordered, a record from epoch M may be
>> > received after epoch N (where N > M) has begun. Implementations SHOULD
>> > discard records from earlier epochs but MAY choose to retain keying
>> > material from previous epochs for up to the default MSL specified for TCP
>> > [RFC0793] to allow for packet reordering. (Note that the intention here is
>> > that implementers use the current guidance from the IETF for MSL, as
>> > specified in [RFC0793] or successors, not that they attempt to interrogate
>> > the MSL that the system TCP stack is using.)
>>
>> https://www.rfc-editor.org/rfc/rfc9147.html#section-4.2.1
>>
>> First, it's a bit weird to say you SHOULD discard *records* but MAY
>> retain *keying material*. I assume that meant SHOULD discard records but
>> MAY process records anyway up to MSL. Anyway, this model implies that only
>> one read epoch is active at once, but this isn't true. You basically have
>> to read epoch 1 (early data) as unordered relative to epochs 0 and 2.
>> Consider a DTLS 1.3 server:
>>
>> 1. The server reads ClientHello with early_data extension at epoch 0 and
>>    accepts early data.
>> 2. The server sends ServerHello (epoch 0), EE..Finished (epoch 2), and
>>    activates write epoch 3 for half-RTT application data.
>> 3. The server reads early data (epoch 1) from the client. The RFC would
>>    lead you to think the server can close read epoch 0 now, but...
>> 4. ServerHello gets lost and, if we are to believe
>>    https://www.rfc-editor.org/rfc/rfc9147.html#section-7.1-8, the client
>>    might send an empty plaintext ACK to trigger a retransmit.
>>    This ACK will be at epoch 0. This only works if the server keeps read
>>    epoch 0 open!
>> 5. Client eventually gets the ServerHello but now it only gets half of
>>    the epoch 2 data. It sends an ACK to trigger another retransmit. This ACK
>>    will come at epoch 2.
>> 6. Server receives that ACK at epoch 2 and retransmits. The RFC would
>>    lead you to think the server can close read epoch 1 now, but...
>> 7. Let's say that retransmit is lost again, or hasn't arrived yet. From
>>    the client's perspective, it has a connection that has yet to reach the
>>    1-RTT point, so any data from the calling application will still be sent
>>    as early data. That means the client will continue to send early data at
>>    epoch 1. This only works if the server keeps read epoch 1 open!
>> 8. The handshake progresses and the server finally gets 1-RTT data at
>>    epoch 3 from the client. *Now* the spirit of the rule in the text
>>    applies to epoch 1 and the server can close the epoch (after optionally
>>    waiting a spell for reordering)
>>
>
> Ah right, Nick Harper points out that servers really should close read
> epoch 1 [up to a delay to accommodate reordering] as soon as they receive
> the Finished message (epoch 2) and complete the handshake, not wait for an
> epoch 3 record. (But it must specifically be on handshake completion, not
> *any* epoch 2 record. Record-layer only logic cannot assume 1 < 2 because
> 2 might contain pre-Finished ACKs.)
>
> All this is missing from the specification. :-) I think we need to rewrite
> the spec text on epochs to more explicitly discuss their lifetimes.
>
>
>> So the rule is actually that we close according to a partially ordered
>> set:
>> - 0 (unencrypted) < 2 (handshake) < 3 (first app data) < 4 < 5 < ...
>> - 1 (early data) < 3 (first app data) < 4 < 5 < ...
>> - 1 is not ordered relative to 0 and 2.
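For illustration, the partial order just described could be sketched in Python roughly as follows. This is only a sketch of the ordering as stated in this thread, not text or an algorithm from RFC 9147. The names EARLY_DATA_EPOCH, precedes, and closable_read_epochs are hypothetical, and the sketch assumes that a read epoch becomes eligible for closing (after any reordering delay) once a later epoch in the partial order has been activated.

```python
from typing import Set

# Assumption for this sketch: epoch 1 is the early data (0-RTT) epoch.
EARLY_DATA_EPOCH = 1


def precedes(a: int, b: int) -> bool:
    """Return True if read epoch a is strictly before b in the partial order.

    0 < 2 < 3 < 4 < ... and 1 < 3 < 4 < ..., while 1 is not ordered
    relative to 0 and 2.
    """
    if a == b:
        return False
    if EARLY_DATA_EPOCH in (a, b):
        other = b if a == EARLY_DATA_EPOCH else a
        # Epoch 1 is only comparable with epochs 3 and later.
        if other < 3:
            return False
    return a < b


def closable_read_epochs(open_epochs: Set[int], activated: int) -> Set[int]:
    """Return the open read epochs that precede the newly activated one."""
    return {e for e in open_epochs if precedes(e, activated)}
```

With this model, activating read epoch 2 while epochs 0 and 1 are open makes only epoch 0 eligible, and epoch 1 stays open until epoch 3 is activated. Per the note about Finished above, that activation would have to be driven by handshake completion rather than by seeing an arbitrary epoch 2 record.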
>>
>>
>> On Wed, Sep 18, 2024 at 3:47 PM David Benjamin <david...@google.com>
>> wrote:
>>
>>> One more wriggle if we wish to allow unencrypted ACKs, though it is
>>> fixable. Section 7 says:
>>>
>>> > During the handshake, ACK records MUST be sent with an epoch which is
>>> > equal to or higher than the record which is being acknowledged. [...]
>>> > Implementations SHOULD simply use the highest current sending epoch, which
>>> > will generally be the highest available. After the handshake,
>>> > implementations MUST use the highest available sending epoch.
>>>
>>> Taken at face value, that text implies that a client sending 0-RTT data
>>> should send its ACKs at the highest current sending epoch, epoch 1 (0-RTT).
>>> But if the server has rejected 0-RTT data, it will not (and cannot)
>>> instantiate epoch 1 at all, so it won't get the ACKs! That guidance needs a
>>> special case: if you would have ACKed at epoch 1, you should ACK at epoch 0
>>> instead.
>>>
>>> Alternatively, one might interpret that situation as 0 being the sending
>>> epoch and 1 being some magical epoch on the side. This isn't supported by
>>> the document, but honestly no interpretation is supported by the document
>>> because the document never tells you what a "current sending epoch" even
>>> is. While 4.2.1 gives some rough guidance on when to close out receiving
>>> epochs, I could not find any text on send epoch management at all.
>>> Reasoning through the protocol, you might arrive at this *almost* correct
>>> rule:
>>>
>>> A write epoch may be discarded IF:
>>> 1. It is not the highest available epoch. AND
>>> 2. There are no unacked, outgoing messages at that epoch
>>>
>>> That rule, however, does not work in 0-RTT. If the highest epoch is 1,
>>> you cannot discard 0. The server might reject 0-RTT and then send
>>> HelloRetryRequest, at which point you will need to discard epoch 1 and
>>> reactivate epoch 0, maintaining continuity of sequence numbers.
>>> The 0-RTT/1-RTT transition is also interesting on the write side, though
>>> I'll start a separate thread for that.
>>>
>>> All this is subtle enough that it should not be left as an exercise to
>>> the reader.
>>>
>>> David
>>>
>>> On Wed, Sep 18, 2024 at 12:39 AM Bob Beck <b...@obtuse.com> wrote:
>>>
>>>>
>>>>
>>>> > On Sep 17, 2024, at 5:28 PM, David Benjamin <davidben=
>>>> > 40google....@dmarc.ietf.org> wrote:
>>>> >
>>>> > Ah, I just noticed this text at the end of Section 7.1:
>>>> >
>>>> > > Note that in some cases it may be necessary to send an ACK which
>>>> > > does not contain any record numbers. For instance, a client might
>>>> > > receive an EncryptedExtensions message prior to receiving a
>>>> > > ServerHello. Because it cannot decrypt the EncryptedExtensions, it
>>>> > > cannot safely acknowledge it (as it might be damaged). If the client
>>>> > > does not send an ACK, the server will eventually retransmit its first
>>>> > > flight, but this might take far longer than the actual round trip
>>>> > > time between client and server. Having the client send an empty ACK
>>>> > > shortcuts this process.
>>>> >
>>>> > https://www.rfc-editor.org/rfc/rfc9147.html#section-7.1-8
>>>> >
>>>> > I guess then the intent is indeed that if you receive some random
>>>> > encrypted DTLS 1.3 header, even though you don't know it's DTLS 1.3 yet,
>>>> > you interpret it as activating the ACKing mechanism? But that seems to
>>>> > prompt more questions than it answers. For instance, what happens if you
>>>> > do that, but then finally receive the ServerHello and it turns out this
>>>> > was just some junk packet and we're really negotiating DTLS 1.2? Do you
>>>> > check that the ACK mechanism has been activated and return an error? Do
>>>> > you just pause the ACK mechanism and hope you're in an OK state? This
>>>> > seems quite prone to sending the implementation into unexpected and
>>>> > untested states.
>>>> >
>>>> >
>>>>
>>>>
>>>> Yeah, I think this has missed a nasty corner case here for
>>>> implementations that support both.
>>>>
>>>> I think I also lean towards option A) (from below) here. Anyone else
>>>> who has gotten at least their hands mildly dirty in a DTLS implementation
>>>> that supports both 1.2 and 1.3 care to chime in as well?
>>>>
>>>>
>>>> > On Thu, Sep 12, 2024 at 4:31 PM David Benjamin <david...@google.com>
>>>> > wrote:
>>>> > Hi all,
>>>> >
>>>> > I noticed another issue with the DTLS 1.3 ACK design. :-)
>>>> >
>>>> > So, DTLS 1.3 uses ACKs. DTLS 1.2 does not use ACKs. But you only
>>>> > learn what version you're speaking partway through the lifetime of the
>>>> > connection, so there are some interesting corner cases to answer. As an
>>>> > illustrative example, I believe the diagram in section 6 is [probably]
>>>> > incorrect:
>>>> > https://www.rfc-editor.org/rfc/rfc9147.html#section-6
>>>> >
>>>> > If the client loses the first packet, it never sees the ServerHello
>>>> > and thus never learns it's speaking DTLS 1.3. While it does see the
>>>> > second packet, that packet only contains ciphertext that it cannot
>>>> > decrypt. Unless it decides to say "this looks like a 1.3 record header,
>>>> > therefore I will turn on the 1.3 state machine", which isn't supported
>>>> > by the RFC (maybe TLS 1.4 will use the same record header but redo ACKs
>>>> > once again), it shouldn't activate the 1.3 state machine yet. I expect
>>>> > what will actually happen is that the client will wait for the
>>>> > retransmission timeout a la DTLS 1.2.
>>>> >
>>>> > More generally, I believe these are the situations to worry about:
>>>> >
>>>> > 1. If a DTLS 1.2 (i.e. does not implement RFC 9147 at all)
>>>> >    implementation receives an ACK record for whatever reason, what
>>>> >    happens? This decision we don't get to change. Rather, it is a design
>>>> >    constraint. Both OpenSSL and BoringSSL treat unexpected record types
>>>> >    as a fatal error. I haven't checked other implementations. So I think
>>>> >    we must take as a constraint that you cannot send an ACK unless you
>>>> >    know the peer is 1.3-capable.
>>>> >
>>>> > 2. Do plaintext ACKs exist? Or is the plaintext epoch permanently at
>>>> >    the old state machine? Honestly, I wish the answer here was "no". That
>>>> >    would have avoided so many problems, because then epochs never change
>>>> >    state machines. Unfortunately, the RFC does not support this
>>>> >    interpretation. Section 4.1 talks about how to demux a plaintext ACK,
>>>> >    and section 6, though wrong, clearly depicts a plaintext ACK. So
>>>> >    instead we get to worry about the transition within an epoch. Keep in
>>>> >    mind that transitions happen at different times on both sides. Keep in
>>>> >    mind that there is a portion of the plaintext epoch that lasts after
>>>> >    version negotiation in HelloRetryRequest handshakes.
>>>> >
>>>> > 3. If a 1.3-capable server receives half of a ClientHello, does it
>>>> >    send an ACK? I believe (1) means the answer must be "no". If you
>>>> >    haven't read the ClientHello, you haven't selected the version, so you
>>>> >    don't know if the client is 1.3-capable or not. If the client is not
>>>> >    1.3-capable, sending an ACK may be incompatible.
>>>> >
>>>> > 4. Is it possible for a 1.3-capable client to receive an ACK before
>>>> >    it receives a ServerHello? If so, how does the client respond? I
>>>> >    believe the answer to this question, if plaintext ACKs exist, is
>>>> >    unavoidably "yes". Suppose the server receives a 1.3 ClientHello and
>>>> >    then negotiates DTLS 1.3. That is a complete flight, so Section 7.1
>>>> >    discourages ACKing explicitly (you can ACK implicitly), but it does
>>>> >    not forbid an explicit ACK. An explicit ACK may be sent if the server
>>>> >    cannot generate its responding flight immediately. That means a server
>>>> >    could well send ACK followed by ServerHello. Now suppose ServerHello
>>>> >    is lost but the ACK gets through. Now the client must decide what it's
>>>> >    doing. Rejecting the ACK would result in connection failure, so we
>>>> >    must either drop the ACK on the floor, or process it.
>>>> >    While processing it would be more efficient (you don't need to
>>>> >    retransmit the whole ClientHello), it means the plaintext epoch must
>>>> >    support this hybrid state where 1.3 ACKs are processed but never sent!
>>>> >    Or perhaps receiving that ACK transitions you to the 1.3 state machine
>>>> >    even though you don't know the version yet. That all sounds like a
>>>> >    mess, so I would advocate you simply drop it on the floor.
>>>> >
>>>> > 5. If a 1.3-capable client receives half of the server's first
>>>> >    message (HRR or ServerHello), does it send an ACK? Again, because of
>>>> >    (1), I believe the answer must be "no". If you don't know the server's
>>>> >    selected version, the server may not be 1.3-capable and may not be
>>>> >    compatible with the ACK.
>>>> >
>>>> > 6. What does a 1.3-capable server do if it receives an ACK prior to
>>>> >    picking the TLS version? Unlike (4), I believe this is impossible. If
>>>> >    the client has something to ACK, the server must have sent something,
>>>> >    which the server will only do once it's received the full ClientHello
>>>> >    and thus picked the version. However, given (4), I suspect an
>>>> >    implementation will naturally just drop that ACK. In this state error
>>>> >    vs drop is kinda academic.
>>>> >
>>>> > From what I can tell, RFC 9147 is silent on all of this. I think it
>>>> > should say something. I believe these are the plausible options:
>>>> >
>>>> > OPTION A -- There are no ACKs in epoch 0.
>>>> >
>>>> > We avoid this ridiculous transition point and say that ACKs only
>>>> > exist starting epoch 1. Epoch 0 uses the old DTLS 1.2 state machine. This
>>>> > is very attractive from a simplicity perspective, but since RFC 9147 was
>>>> > already published with this ambiguity, I think we need to, at minimum,
>>>> > say that DTLS 1.3 implementations drop epoch 0 ACKs on the floor. It also
>>>> > means that packet loss in HelloRetryRequest flows may be less efficient.
>>>> > That said, if your HelloRetryRequest is stateless (not applicable to all
>>>> > DTLS uses), you're probably not doing anything with ACKs anyway. Saying
>>>> > those ACKs don't exist avoids having to think about that case, at the
>>>> > cost of a worse transport for stateful HelloRetryRequest.
>>>> >
>>>> > OPTION B -- Epoch 0 enables ACKing once the version is learned.
>>>> >
>>>> > Once you know the version, you start sending and processing ACKs.
>>>> > Before you know the version, you drop ACKs on the floor and never send
>>>> > them. This requires convincing ourselves that the transition point works
>>>> > out, notably when one side is still ACK-less and the other side is still
>>>> > ACK-ful, but I believe it works out.
>>>> >
>>>> > OPTION C -- Epoch 0 always receives and acts on ACKs, but it doesn't
>>>> > send ACKs until the version is learned.
>>>> >
>>>> > This is the same as above, but instead of dropping ACKs, you go ahead
>>>> > and let that drive your state machine. But you don't send them. This
>>>> > makes reasoning about the protocol even more complicated because there
>>>> > are even more states you can be in w.r.t. your known version vs the state
>>>> > of your transport. It does improve behavior around packet loss, but I
>>>> > think it only helps this edge case in question (4) above, which is
>>>> > already a case where servers aren't expected to send ACKs anyway.
>>>> >
>>>> > I think I lean towards Option A for simplicity, even though it
>>>> > decidedly contradicts a lot of text in the RFC right now. That will be
>>>> > hard to encode in an erratum as a few things need to change. But I also
>>>> > have 7 other errata open against this document, so maybe it's time for
>>>> > rfc9147bis.
>>>> >
>>>> > David
>>>> > _______________________________________________
>>>> > TLS mailing list -- tls@ietf.org
>>>> > To unsubscribe send an email to tls-le...@ietf.org
>>>>
>>>>
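To compare Options A, B, and C from the original message side by side, here is a minimal Python sketch of how epoch 0 ACK handling would differ. It is an illustration only, not an implementation of any real DTLS stack. The names Option, epoch0_ack_policy, and knows_dtls13 are hypothetical, and the sketch models only two states: the version is not yet known, or DTLS 1.3 has been learned.

```python
from enum import Enum
from typing import Tuple


class Option(Enum):
    A = "no ACKs in epoch 0"
    B = "epoch 0 enables ACKing once the version is learned"
    C = "epoch 0 always acts on ACKs, sends only once the version is learned"


def epoch0_ack_policy(option: Option, knows_dtls13: bool) -> Tuple[bool, bool]:
    """Return (may_send_ack, process_received_ack) for epoch 0.

    knows_dtls13 is True once this endpoint has learned that DTLS 1.3 was
    negotiated, and False while the version is still unknown.
    """
    if option is Option.A:
        # Epoch 0 stays on the DTLS 1.2 state machine; received ACKs are dropped.
        return (False, False)
    if option is Option.B:
        # Drop and never send before the version is known; full ACKing after.
        return (knows_dtls13, knows_dtls13)
    # Option C: always act on received ACKs, but only send once 1.3 is known.
    return (knows_dtls13, True)
```

Under this model, only Option C would let a client act on the ACK in situation (4), where the server's ACK arrives but the ServerHello is lost. Options A and B both drop it on the floor.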