On Sun, Nov 3, 2024 at 3:50 PM Ilari Liusvaara <ilariliusva...@welho.com>
wrote:

> On Sun, Nov 03, 2024 at 12:49:59PM +0000, David Benjamin wrote:
> > Hi all,
> >
> > So, Section 7 says the ACK contains:
> > > A list of the records containing handshake messages in the current
> > > flight which the endpoint has received and either processed or
> > > buffered, in numerically increasing order.
> > https://www.rfc-editor.org/rfc/rfc9147.html#name-ack-message
> >
> > First, it is ambiguous what "numerically increasing order" means when
> > there are two integers in a packet number, not one.
>
> I would interpret "numerically increasing order" to mean primarily
> sorted by epoch, secondarily by record sequence number.
>
> However, I do not think there are any other flights that should span
> epochs besides the ServerHello-ServerFinished one. But I think this is
> one of those reasonable-but-not-required-by-spec things.
>
> And that flight seems pretty special in terms of ACKing. For example,
> any epoch 2+ ACK implicitly ACKs the complete ServerHello message (it is
> impossible to enter epoch 2+ without it). And one should be very
> careful about epoch 0 (unencrypted) ACKs.
>
> (The DTLS 1.3 spec allows all sorts of stuff that seems pretty
> unreasonable, like fragment sequence in a record jumping backwards.)
>

The spec also recommends you keep your older epochs around for a spell in
case of packet reordering. That can also cause you to see the older epoch
even after the handshake has progressed past it. I'm not sure if there's
any benefit to doing this specifically during the handshake, although it
might let you see an older ACK. Seeing that older ACK may be unnecessary if
you do the epoch-aware implicit ACK you describe, but neither that nor
epoch management in general is described in the document. (I think the
very, very badly needed rfc9147bis should fix the latter at least. Adding
your extra implicit ACK case seems reasonable too.)


> > In particular, it seems a natural implementation will result in receive
> > order, not numerical order. Implementations should bound their ACK
> > buffers to avoid DoS, and are expected to preferentially ACK more
> > recent records:
> >
> > > Implementations MAY acknowledge the records corresponding to each
> > > transmission of each flight or simply acknowledge the most recent one.
> > > In general, implementations SHOULD ACK as many received packets as can
> > > fit into the ACK record, as this provides the most complete information
> > > and thus reduces the chance of spurious retransmission; if space is
> > > limited, implementations SHOULD favor including records which have not
> > > yet been acknowledged.
>
> I think it is important to prioritize ACKing the highest record numbers,
> because senders should bound outstanding record buffers to avoid DoS,
> those entries are required to handle ACKs, and there is no backup
> whole-message ACK (with the exception of ServerHello).
>
> There is a subtle edge case where this can cause outright failures:
>
> - Sender that implements only linear tracking.
> - Very large flight that gets split into lots of records.
> - Some of those records get evicted from ACK buffer before being ACKed.
> - Flight has no response, or response is blocked.
>
> In this scenario, no data will get through, and the sender will just
> re-transmit the flight forever.
>
> To counter this, if the ACK buffer fills with unacknowledged records, one
> should immediately send an ACK. If the first record in the transmission was
> received and that ACK makes it through, it will cause forward progress.
>

I'm not sure that will actually prevent forward progress, though I may be
misunderstanding your example. In the worst case, you will manage to ACK,
say, the last 32K of that flight. The peer will then retransmit all but the
last 32K, you'll ACK the last 32K of that, and so on until the whole flight
gets ACKed. This is not amazing, but it's still forward progress. And given
each record number covers about an MTU's worth of handshake data, you don't
need much ACK buffer to avoid this or make its effects minimal. Though I
agree flushing the ACK buffer when full is a sensible implementation
strategy (though also not mentioned by the specification).
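
To put rough numbers on that worst case, here is a small sketch. It
assumes a simplified model that is not in the spec: nothing is lost,
each round the sender retransmits every still-unACKed record, and the
receiver can only ACK the last `ack_window` of them. The function and
parameter names are made up for illustration.

```python
def rounds_to_ack_flight(flight_records: int, ack_window: int) -> int:
    """Count transmission rounds until every record of a flight is ACKed.

    Hypothetical model: each round the sender transmits all still-unACKed
    records and the receiver ACKs only the last `ack_window` of them.
    """
    if ack_window <= 0:
        raise ValueError("ack_window must be positive")
    remaining = flight_records
    rounds = 0
    while remaining > 0:
        # The tail of this transmission gets ACKed; the rest is resent.
        remaining -= min(ack_window, remaining)
        rounds += 1
    return rounds
```

Under this model the number of rounds grows linearly with the flight
size divided by the ACK buffer size, so it always terminates, and even
a modest buffer keeps the count small.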

And when the response is merely blocked, at some point one hopes you will
manage to generate that response, otherwise forward progress was impossible
anyway for other reasons! :-)


> > Given that, the natural implementation is some kind of bounded MRU queue
> > of records, where old ones fall off the end. (I'm planning to use a ring
> > buffer for our implementation.) To get numerical order, you'd need to
> > re-sort when sending an ACK. That is not hard, but it's unclear to me
> > what's the point.
>
> Above is one case where one wants last records sent (highest RSN), not
> last records received.
>

I'm not sure I follow. In that example, there are more unACKed records than
fit in the buffer at all, so neither eviction algorithm will ACK
everything. I'm not seeing how prioritizing the highest RSN improves
things. More generally, the last records received are the ones that you
haven't ACKed yet, so when there aren't eviction problems, those are the
ones to prioritize.
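
For concreteness, a minimal sketch of the recency-based buffer I have in
mind, with the re-sort done only when an ACK is built. This is not our
actual implementation: the class name, method names, and default
capacity are invented for illustration, and sorting by epoch first and
sequence number second is just one reading of "numerically increasing
order".

```python
from collections import OrderedDict


class AckBuffer:
    """Bounded set of (epoch, seq) record numbers, evicting by receive order."""

    def __init__(self, capacity: int = 32):
        self.capacity = capacity
        self._records = OrderedDict()  # insertion order == receive order

    def add(self, epoch: int, seq: int) -> None:
        key = (epoch, seq)
        # Seeing a record number again refreshes its recency. This is
        # harmless if the record layer already drops replayed records.
        self._records.pop(key, None)
        self._records[key] = None
        if len(self._records) > self.capacity:
            self._records.popitem(last=False)  # drop the oldest-received entry

    def clear(self) -> None:
        self._records.clear()

    def to_ack_list(self) -> list:
        # Tuples compare element-wise, so this orders by epoch, then seq.
        return sorted(self._records)
```

Usage: call `add` for each record worth ACKing and `to_ack_list` when
the ACK is actually sent; the buffer itself never needs to be kept
sorted.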


> > Next, the spec's guidance on when to clear the ACK buffer seems odd to
> > me. Section 7 also says:
> >
> > > During the handshake, ACKs only cover the current outstanding flight
> > > (this is possible because DTLS is generally a lock-step protocol). In
> > > particular, receiving a message from a handshake flight implicitly
> > > acknowledges all messages from the previous flight(s). Accordingly, an
> > > ACK from the server would not cover both the ClientHello and the
> > > client's Certificate message, because the ClientHello and client
> > > Certificate are in different flights. Implementations can accomplish
> > > this by clearing their ACK list upon receiving the start of the next
> > > flight.
>
> One thing to note: While DTLS is usually a lock-step protocol, there are
> post-handshake messages that are not lock-step.
>
> If a flight has a reply, then the start of that reply will implicitly ACK
> the flight. However, a crossing flight may block that response. In that
> case, the flight must be ACKed to avoid a deadlock.
>
>
> > The claim that clearing this ACK list accomplishes this is not true, for
> > several reasons:
> >
> > First, there's nothing stopping you from receiving a (redundant) portion
> > of the previous flight while you're receiving the new one. You'll notice
> > all the sequence numbers are old and ignore them when processing, but
> > that still keeps the record eligible for an ACK. Moreover, it's still
> > important to ACK old fragments. When the old fragment is in the
> > *current* flight, the peer may have lost an earlier ACK and not realize
> > they can stop retransmitting. It's only old fragments in *previous*
> > flights that are unnecessary to ACK, but the specification does not
> > suggest to distinguish them. (Distinguishing them would require extra
> > state in the record layer to store a low watermark for the flight, and
> > that seems a waste. There's no real harm in adding that record to the
> > ACK buffer.)
>
> There is in fact another subtle edge case which requires ACKing stuff
> from the previous flight:
>
> - Sender sends a flight that has no response, or whose response is blocked.
> - The complete flight comes through, but the last ACK is lost.
> - Sender re-transmits the (possibly partial) flight.
>
> The receiver considers the flight complete, but the sender does not.
> Getting things unstuck requires ACKing stuff from the previous flight.
>
> However, this does not require keeping the records from the previous
> flight in the list.
>

Well, there are two ways to get it unstuck. You could either explicitly ACK
the previous flight, or just start sending your reply, which implicitly
ACKs that flight. If you've received the complete flight, you're presumably
ready to send the response. (Except in the case where generating that
flight is slow, in which case the draft recommends you send an ACK, yeah.)

I took another look at the ACK-sending guidance and I'll amend my comment
slightly. I said:

> When the old fragment is in the *current* flight, the
> peer may have lost an earlier ACK and not realize they can stop
> retransmitting. It's only old fragments in *previous* flights that are
> unnecessary to ACK, but the specification does not suggest to distinguish
> them. (Distinguishing them would require extra state in the record layer
> to store a low watermark for the flight, and that seems a waste. There's
> no real harm in adding that record to the ACK buffer.)

The specification *does* suggest distinguishing them, but in a roundabout
way. Section 7.1 says (emphasis mine):

> When an implementation detects a disruption *in the receipt of the
> current incoming flight*, it SHOULD generate an ACK that covers the
> messages from *that flight* which it has received and processed so far.

That text suggests that you should only be ACKing the current incoming
flight, which in turn suggests you shouldn't ACK records from previous
incoming flights, which in turn means tracking the boundary between the
current and previous incoming flights. That has one attractive property.
During the handshake (more on post-handshake later), if you also fix the
buffer-clearing behavior, it means the ACK-send timer and retransmit timer
are never both active. If you've received all of flight N-1, sent N, but
received none of N+1, the ACK buffer has been cleared and there's no ACK
timer from N+1. If you've received part of N+1, that implicitly ACKs N so
you start the ACK timer and shut off the retransmit timer.

But all this breaks down with post-handshake messages, so this is not
actually a useful invariant for your implementation. Moreover, how to
interpret this 7.1 text with post-handshake messages is a little
interesting. Section 5.8.4 just says you "duplicate" the state machine. But
if we have a bunch of multi-flight post-handshake transactions (right now
post-handshake auth is the only one), we shouldn't have a design that asks
the implementation to track, indefinitely, which sequence numbers were part
of a current vs past incoming flight, because that means memory usage grows
indefinitely with each post-handshake transaction.

And so I'm still back around to a similar ACK policy being a more sensible
baseline:
- If you get a record with no unprocessed (i.e. too far in the
future---past is OK) fragments, add it to the ACK buffer and start the
ACK-send timer if not already started
- Don't worry too much about whether the record contained stuff from the
current or previous flight, worst case you ACK something the peer isn't
paying attention to anymore
- Pick your favorite policy for ACK buffer overflow (I still favor
prioritizing by recency)
- When you start sending (i.e. aren't blocked on processing) a handshake
flight, clear the ACK buffer because you're implicitly ACKing it
- When the ACK timer fires, send the ACK buffer and stop the timer
- When you complete a handshake transaction where the peer spoke last, send
an ACK straight away and stop the ACK-send timer
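
The policy above could be sketched roughly as follows. This is only an
illustration under assumptions of mine: the class and method names are
hypothetical, the timer is reduced to a boolean flag, overflow evicts the
oldest-received entry, and stopping the ACK-send timer when the buffer is
cleared is my own addition (there is nothing left to send at that point).

```python
class AckPolicy:
    """Hypothetical ACK bookkeeping for one DTLS 1.3 connection."""

    def __init__(self, capacity: int = 32):
        self.capacity = capacity
        self.buffer = []                # (epoch, seq) pairs in receive order
        self.ack_timer_running = False  # stand-in for a real timer

    def on_record(self, epoch: int, seq: int, has_future_fragments: bool) -> None:
        # Only ACK records whose fragments were all processed or are old.
        if has_future_fragments:
            return
        key = (epoch, seq)
        if key in self.buffer:
            self.buffer.remove(key)
        self.buffer.append(key)
        if len(self.buffer) > self.capacity:
            self.buffer.pop(0)          # overflow policy: favor recency
        self.ack_timer_running = True

    def on_start_sending_flight(self) -> None:
        # The outgoing flight implicitly ACKs everything buffered so far.
        self.buffer.clear()
        self.ack_timer_running = False  # assumption: nothing left to ACK

    def on_ack_timer_fired(self) -> list:
        self.ack_timer_running = False
        return sorted(self.buffer)      # record numbers to put in the ACK

    def on_transaction_complete_peer_spoke_last(self) -> list:
        # No reply flight will ever implicitly ACK this one, so ACK now.
        self.ack_timer_running = False
        return sorted(self.buffer)
```

Note the sketch never asks whether a record belonged to the current or a
previous flight, so it needs no per-flight watermark.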

> > If the peer sent flight N-1, you sent N, and now you're in the middle of
> > receiving flight N+1, you can stop ACKing flight N-1 as soon as you start
> > *sending* N. You don't need to wait to receive N+1. *Every* fragment of N
> > implicitly ACKs all of N-1, so as soon as you're ready to send any part
> > of N, you may as well send that instead of ACKing individual records
> > because then you also make progress in the connection. The spec instead
> > says to wait until receiving part of N+1, which seems later than needed
> > and may not even exist.
>
> Yes, if one starts transmitting the reply flight, one should immediately
> stop sending ACKs for the previous flight. The reply flight will
> implicitly ACK the complete previous flight.
>
> However, if the reply is blocked (which can happen with non-lockstep
> post-handshake messages), one needs to continue transmitting ACKs until
> unblocked in order to avoid possible deadlock (the other side might be
> blocked as well!).
>

Agreed. Yeah, if you haven't gotten to the sending state with the reply
flight, this doesn't apply and you should keep the ACK process going.


> > (Neither version achieves the stated goal in the spec. The stated goal
> > seems to require tracking extra state.)
> >
> > With all that said, it seems odd to be clearing the ACK buffer at all.
> > I've gathered the reason to ACK by record number instead of message
> > ranges (and thus require the implementation keep around some state) was
> > so that RTT measurements could work despite retransmits. Is that right?
> > But if the happy path doesn't ACK most records in the first place, you
> > won't actually get an estimate out of it.
>
> Furthermore, RTT estimation does not seem that useful. There are no
> RTT estimates available when one would want those the most.
>
>
>
>
> -Ilari
>
> _______________________________________________
> TLS mailing list -- tls@ietf.org
> To unsubscribe send an email to tls-le...@ietf.org
>