[TLS] Re: [EXTERNAL] Re: DTLS 1.3 bis

David Benjamin Sun, 17 Nov 2024 05:01:34 -0800

On Sat, Nov 16, 2024 at 10:40 AM Ilari Liusvaara <ilariliusva...@welho.com>
wrote:

> On Wed, Nov 13, 2024 at 01:39:43PM -0500, David Benjamin wrote:
> >
> > Not to say that every implementor would have noticed every issue (I'm
> > sure I overlooked some issues too), but I think DTLS's biggest challenge
> > has always been the relatively little attention it receives compared to
> > TLS.
>
> - When can the server drop epoch 2 (handshake) receive keys?
>
> Suppose client 2nd flight makes it through, but the ACK is lost. This
> causes the client to re-transmit the flight. The re-transmission happens
> with epoch 2. So the server needs epoch 2 receive keys in order to ACK
> the re-transmit. And this ACK could get lost as well.
>
> So the server needs to keep epoch 2 receive keys until client considers
> its 2nd flight complete. However, the server does not seem to have means
> to determine when this has happened.
>
> If the server did not send CertificateRequest, then NewSessionTicket is
> unordered w.r.t. client 2nd flight. And even if the server did send CR,
> then NST is not considered implicit ACK for client 2nd flight.
>
> Is there some prohibition on client sending post-handshake messages
> before considering handshake complete? If no, one can't use PS messages
> as an indicator, and the client might not send PS messages anyway.
>

Aww, yuck! Well, that proves my parenthetical. I'd missed that one.

I mean, the spec does have an answer, but it's incredibly unsatisfying,
because it's based on time rather than packet loss.

> In addition, for at least twice the default MSL defined for [RFC0793],
when in the FINISHED state, the server MUST respond to retransmission of
the client's final flight with a retransmit of its ACK.
https://www.rfc-editor.org/rfc/rfc9147.html#section-5.8.1-9

In particular, this means my "best guess" in slide 8 here is not sufficient
and you actually are *required* to carry a past read epoch, in just one
case:
https://datatracker.ietf.org/meeting/121/materials/slides-121-tls-13-dtls-13-details-00

That means this text here is wrong, because it suggests this is optional:
> Implementations SHOULD discard records from earlier epochs but MAY choose
to retain keying material from previous epochs for up to the default MSL
specified for TCP [RFC0793] to allow for packet reordering.
https://www.rfc-editor.org/rfc/rfc9147.html#section-4.2.1-1

Another problem with it being based on time is that we might have moved
arbitrarily far in the connection by then. Maybe the RTT is suuuper fast
and actually we've done KeyUpdate 100x by then. By a strict reading of that
text, you're still obligated to retain epoch 2, even though you're on epoch
100. Now you need to retain epochs arbitrarily far apart!
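
A rough sketch of what that strict, time-based reading implies for a server's key bookkeeping. The class and field names are made up for illustration, the 120-second MSL is the figure RFC 793 uses, and keeping the previous epoch around for reordering is an assumption about the implementation rather than anything the spec mandates:

```python
from dataclasses import dataclass, field

MSL_SECONDS = 120.0  # assumption: the 2-minute MSL that RFC 793 uses


@dataclass
class ServerReadEpochs:
    """Hypothetical record of which read epochs a server still holds."""
    finished_at: float                 # time the server entered FINISHED
    current_epoch: int = 3
    retained: set = field(default_factory=lambda: {2, 3})

    def on_key_update(self) -> None:
        # Each completed KeyUpdate adds one more read epoch.
        self.current_epoch += 1
        self.retained.add(self.current_epoch)

    def prune(self, now: float) -> None:
        # Assumed policy: keep the current and previous epoch for reordering.
        keep = {self.current_epoch, self.current_epoch - 1}
        if now - self.finished_at < 2 * MSL_SECONDS:
            # Strict reading of the text: the server must still be able to
            # read (and re-ACK) a retransmitted client Finished in epoch 2.
            keep.add(2)
        self.retained &= keep
```

With a fast RTT, 97 KeyUpdates one second after the handshake leave this server holding epochs 2, 99, and 100 at once, which is the "arbitrarily far apart" situation described above.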

Fortunately, this is actually impossible because if the server ACKs
KeyUpdate, the client should know the server has received its final flight.
But there is nothing in the spec that says "if you receive an ACK for a
message in epoch N, everything in epochs < N is ACKed". (Note that < here is
evaluated according to our partially-ordered set because epoch 1 is weird.
Though epoch 1 should contain no handshake messages, so it's kinda moot.)
Moreover, if we apply the fix to KeyUpdate, the client will not start
sending at the new epoch until the final flight is ACKed too and everything
is caught up. But, impossible as all this is, the spec text does not
account for it being impossible.
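
To make the missing rule concrete, here is a minimal sketch of sender-side ACK bookkeeping with the implicit ACK added. None of this is spec text: the class name, the (epoch, sequence number) keys, and the plain integer comparison (which ignores the epoch 1 oddity, since epoch 1 carries no handshake messages) are all assumptions for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class OutgoingFlights:
    """Hypothetical sender-side set of unacknowledged handshake records."""
    unacked: set = field(default_factory=set)  # {(epoch, sequence_number)}

    def sent(self, epoch: int, seq: int) -> None:
        self.unacked.add((epoch, seq))

    def on_ack(self, epoch: int, seq: int) -> None:
        if (epoch, seq) not in self.unacked:
            return  # not an outstanding record of ours; ignore it
        # Explicit ACK for the record itself.
        self.unacked.discard((epoch, seq))
        # Proposed implicit rule: an ACK in epoch N also covers every
        # outstanding record from an earlier epoch.
        self.unacked = {rec for rec in self.unacked if rec[0] >= epoch}
```

In this model, an ACK for a KeyUpdate sent in epoch 3 also clears a still-outstanding Finished from epoch 2, so the client stops retransmitting in epoch 2.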

We also have a near miss for even more complexity. Suppose the handshake's
final flight actually spanned both epochs 0 and 2 instead of just 2. The
server receives both but, for whatever reason, the ACK for epoch 0 didn't
get through. (Re-ACKing past records is optional in the spec. Also things
might fall out of bounded ACK buffers eventually.) The client will not
consider the final flight to be ACKed until *all* records are through,
which means the server would need to retain epoch 0. Fortunately, the final
flight doesn't look like this and we don't need to worry about it. Though
it's further evidence that we should add the implicit ACK condition above.

There's a related, but less crucial, problem with the server's final
flight. At what point can the client discard read epoch 2? Consider:

CH -->
<-- SH {EE..Finished}
<-- [0.5-RTT App Data]
{Finished} -/-> (lost)

Now, the client will retransmit Finished and eventually repair this, but
the server has a retransmit timer too. Since the server can't tell which
side was lost, it will retransmit SH {EE..Finished}. The client is expected
to use that to drive retransmitting Finished:

> 3. The implementation reads a retransmitted flight from the peer when
none of the messages that it sent in response to that flight have been
acknowledged: the implementation transitions to the SENDING state, where it
retransmits the flight, adjusts and re-arms the retransmit timer, and
returns to the WAITING state. The rationale here is that the receipt of a
duplicate message is the likely result of timer expiry on the peer and
therefore suggests that part of one's previous flight was lost.
https://www.rfc-editor.org/rfc/rfc9147.html#section-5.8.1-7.3

However, that only works if the client has retained read epoch 2. But the
client has already received something from read epoch 3, and it would be
nice if KeyUpdate's "once you see read epoch N, you know the peer has moved
on from read epoch N-1" rule could generalize. Now, this isn't *that*
important because the protocol still functions if you don't implement this
rule. The client should have retained its own timer, and a priori the
server's not going to have any better of an estimate for that timer than
the client here. (Probably worse because the server won't have had an
opportunity for an RTT estimate yet, if we didn't HRR.)

A thought: This is now a protocol change, but what if we defined an "oops"
extension that simply adds a dummy post-Finished handshake message that
protrudes into epoch 3? I.e., if negotiated, the client and server flights
actually look like this:

CH -->
<-- SH {EE..Finished} [Oops]
{Finished} [Oops] -->
<-- [ACK]

I think if you combine that with the "ACKing epoch 3 implicitly ACKs all of
epoch 2" rule, this problem might be resolved? All retransmits by the
client are now guaranteed to contain at least one byte of Oops, because a
fully-acked Oops implies an acked Finished. That means the server need only
retain epoch 3, because as long as it can ACK the Oops, the client will get
the message.
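
A toy model of that argument, under loud assumptions: the function name and the (epoch, message) record tuples are invented for illustration, the model starts after the server has already processed the client's flight once, and it assumes the implicit "ACKing epoch 3 ACKs all of epoch 2" rule is in effect.

```python
def retransmits_until_done(flight, server_read_epochs, lost_acks=0):
    """Count client retransmissions until its final flight is fully ACKed.

    flight: list of (epoch, message_name) records the client still has unacked.
    server_read_epochs: read epochs the server has retained.
    lost_acks: number of server ACKs dropped before one gets through.
    Returns None if the client can never finish.
    """
    unacked = set(flight)
    attempts = 0
    while unacked:
        attempts += 1
        # The client resends everything unacked; the server can only ACK
        # records from epochs whose read keys it still holds.
        readable = {rec for rec in unacked if rec[0] in server_read_epochs}
        if not readable:
            return None  # nothing is decryptable, so no ACK is ever sent
        if attempts <= lost_acks:
            continue  # this ACK was lost; the client's timer fires again
        highest = max(epoch for epoch, _ in readable)
        # Explicit ACKs, plus the implicit ACK of all earlier epochs.
        unacked = {rec for rec in unacked
                   if rec not in readable and rec[0] >= highest}
    return attempts
```

With only epoch 3 retained, `retransmits_until_done([(2, "Finished")], {3})` gets stuck, while `[(2, "Finished"), (3, "Oops")]` completes as soon as a single ACK survives.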

Note that the implicit ACKing rule only works if we interpret this text
somewhat strictly:
> Note that because of packet loss, it is possible for one side to be
sending application data even though the other side has not received the
first side's Finished message. Implementations MUST either discard or
buffer all application data records for epoch 3 and above until they have
received the Finished message from the peer.
https://www.rfc-editor.org/rfc/rfc9147.html#section-5.8.1-10

This buffering must happen *before* the acking logic. In the server->client
direction, this is natural because the client cannot even compute epoch 3
keys before it's gotten to the end of EE..Finished. In the client->server
direction, a server could conceivably prepare epoch 3 early and then queue
up Oops in the message buffer and ACK it. That would break this rule. But I
think saying that the handshake should not release epoch 3 keys until Finished is
nice and symmetric, so I don't think that's a huge concern.
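
A minimal sketch of that ordering, assuming a receiver that handles handshake records only. The class, its fields, and the string payloads are hypothetical; the point is just that epoch 3 records are held back before the ACK logic ever sees them.

```python
from dataclasses import dataclass, field


@dataclass
class Receiver:
    """Hypothetical receive path: buffer epoch 3 until the peer's Finished."""
    peer_finished: bool = False
    buffered: list = field(default_factory=list)
    to_ack: list = field(default_factory=list)  # (epoch, seq) pairs to ACK

    def on_record(self, epoch: int, seq: int, payload: str) -> None:
        if epoch >= 3 and not self.peer_finished:
            # Held back *before* the ACK logic, so it cannot be ACKed early.
            self.buffered.append((epoch, seq, payload))
            return
        self._process(epoch, seq, payload)

    def _process(self, epoch: int, seq: int, payload: str) -> None:
        self.to_ack.append((epoch, seq))
        if epoch == 2 and payload == "Finished":
            self.peer_finished = True
            # Release whatever was waiting on Finished, in arrival order.
            pending, self.buffered = self.buffered, []
            for rec in pending:
                self._process(*rec)
```

If Oops arrives first, nothing is queued for ACK until Finished shows up; after that both records are ACKed, Finished first.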

(It is a little interesting that KeyUpdate does not benefit from this Oops
pattern. I think this is because KeyUpdate key changes work completely
differently from handshake key changes. KeyUpdate drives key change on the
ACK instead of send. But if we did that in the handshake we'd add a bunch
of round-trips. Whereas we don't care as much if KeyUpdates are delayed.
Although this does make the KeyUpdate API subtly different between TLS and
DTLS. If you KeyUpdate in TLS, you know that you are *immediately* done
writing with that key. We could have defined KeyUpdate differently, but
then we get head-of-line blocking problems.)

> - Single epoch, multiple prepare for next epoch messages.
>
> What does it mean for a single epoch to contain multiple messages that
> prepare for the next epoch? Does that prepare one epoch or multiple
> epochs? Doing multiple might cause issues with epoch reconstruction.
>
> AFAICT, sending multiple KeyUpdates in one epoch is not forbidden (the
> spec requires ACK, not actual epoch bump in between).
>

I believe it's forbidden by this text. But I suspect this was on accident
because it's not just to facilitate epoch reconstruction:

> In order to facilitate epoch reconstruction (Section 4.2.2),
implementations MUST NOT send records with the new keys or send a new
KeyUpdate until the previous KeyUpdate has been acknowledged (this avoids
having too many epochs in active use).
https://www.rfc-editor.org/rfc/rfc9147.html#section-8-1
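
Read literally, the quoted requirement amounts to a small piece of write-side state. This is only a sketch; the class and method names are invented, and a real stack would track the ACK per record rather than with a single flag.

```python
class KeyUpdateSender:
    """Hypothetical write-side state for DTLS 1.3 KeyUpdate."""

    def __init__(self, write_epoch: int = 3):
        self.write_epoch = write_epoch
        self.awaiting_ack = False

    def send_key_update(self):
        # MUST NOT send a new KeyUpdate until the previous one is ACKed.
        if self.awaiting_ack:
            raise RuntimeError("previous KeyUpdate not yet acknowledged")
        self.awaiting_ack = True
        return (self.write_epoch, "KeyUpdate")

    def send_application_data(self, data: bytes):
        # Until the ACK arrives, records keep using the old write epoch.
        return (self.write_epoch, data)

    def on_key_update_acked(self) -> None:
        # Only the ACK moves the sender to the new keys.
        if self.awaiting_ack:
            self.awaiting_ack = False
            self.write_epoch += 1
```

So at most one KeyUpdate is ever outstanding per epoch, and the epoch bump is driven by the ACK rather than by the send.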

In my attempt to fix the other KeyUpdate brokenness, I said that nothing
may follow a KeyUpdate in that epoch, which I think captures this a bit
more directly. I think that works? Except...
https://www.rfc-editor.org/errata/eid8047

> And in future extensions, there might be more message types that prepare
> epoch bumps (e.g., some Extended Key Update messages). The interactions
> between those (and regular KeyUpdate) might not be simple.
>
> I think there should be requirement that in each epoch, there is at
> most one message that prepares for the next epoch, and that all
> application data epoch except the last have exactly one.
>
> And with restriction that retransmissions must occur on the same epoch
> (why is that there for post-handshake messages?), the message that
> prepares for the next epoch must always be the last in its flight.
>

Extended Key Update is potentially extra fun because it's a multi-flight
transaction. What happens if you start an EKU flow but then, partway
through it, the peer sends a plain KeyUpdate? What if one side starts an
EKU flow and, at the same time, the other side sends KeyUpdate with
key_update_requested? Will the EKU-sending peer know not to confuse itself?
Or maybe we can design EKU such that it still works out, because the next
epoch hasn't been prepared yet? EKU doesn't exist yet, but something we'll
have to reason through when we get there.

> > This is exacerbated by the kinds of things we need attention on. While
> > the security, cryptography, and handshake bits (this WG's forte),
> > more-or-less carry over as-is, it picks up a whole mess of
> > transport-related concerns that just don't apply to TLS.
>
> I remember that when developing HTTP/2, the HTTP WG people had a joint
> session with Transport Area folks about transport aspects of the
> protocol. HTTP/2 does not need to deal with loss or reordering.
>
>
> > And then there's also a wide range of possible implementations,
> > depending on the simplifying assumptions you make (e.g. refusing to have
> > multiple outgoing post-handshake flights active at once). That, in turn,
> > means that a reader might not have bothered thinking about the more
> > complex case, if they didn't mean to implement it. (On my end, I don't
> > expect we'll implement everything in here either!)
>
> And even if some case is implemented, it might still be subtly broken
> (or completely broken if it is not actually used).
>
> E.g., a long time ago, I saw a TLS client with totally broken KeyUpdate
> handling. And this was an MTI feature.
>
> Or subtle issues in ACK implementation exposing endpoint to DoS or
> being abused as an amplifier.
>
>
> > While I think this WG's analyses (formal and otherwise) are mostly on
> > security properties, the issues I found are mostly making sure the
> > protocol can make forward progress under packet loss/reordering. But
> > also whether the text sufficiently defines the protocol at all. For
> > example, it's quite common for DTLS implementations to take these
> > simplifying assumptions, but all that actually needs to be written down
> > as allowed behavior, because it means that, when we analyze the
> > protocol, receivers must accommodate a sender that, say, artificially
> > blocks sending one flight on the ACK of another one.
>
> And then there is stuff that works only because senders are being
> conservative, not because protocol requires it.
>
> E.g., server that fragments from multiple messages at once (I don't see
> anything prohibiting that), and client that does not implement full out-
> of-order receive buffering (I don't see anything requiring that).
> Guaranteed deadlock in handshake, even with zero packet loss or
> reordering.
>
>
> > The remedy for all that is, well, more eyes on it, which we get by
> > having the WG take on a bis document. :-) Beyond that, whether we need
> > implementers, formal analysis, or just people reading and reasoning
> > through the draft, I think we just welcome anyone who is interested in
> > doing that work and go forth. All three sources of feedback ultimately
> > involve a human reading the document and trying to understand what it's
> > trying to say anyway, which I think is the biggest gap here. Once we
> > even know what our protocol is, if there are folks available to model
> > that and formally check a forward-progress property (rather than a
> > security property), awesome! If not, we can resort to traditional
> > "think very hard about it" techniques. I don't think we need to
> > preemptively worry about which options we'll have available when we get
> > there.
>
> There is also semi-formal reasoning. While not even close to an actual
> formal check, it is a big step up from just thinking very hard about it.
>
> And then there are possible future extensions to TLS that interact
> with this somehow. For example, a new message that updates keys.
>
>
>
>
> -Ilari
>
> _______________________________________________
> TLS mailing list -- tls@ietf.org
> To unsubscribe send an email to tls-le...@ietf.org
>

