On Fri, Mar 26, 2021 at 3:08 PM Eric Rescorla <e...@rtfm.com> wrote:

> Hi folks,
>
> This is a combined response to Martin Duke and to Mark Allman.
>
> Before I respond in detail I'd like to level set a bit.
>
> First, DTLS does not provide a generic reliable bulk data transmission
> capability. Rather, it provides an unreliable channel (a la UDP). That
> channel is set up with a handshake protocol and DTLS provides
> reliability for that protocol. However, that protocol is run
> infrequently and generally involves relatively small amounts (typically
> << 10KB) of data being sent. This means that we have rather more
> latitude in terms of how aggressively we retransmit because it only
> applies to a small fraction of the traffic.
>
> Second, DTLS 1.2 is already widely deployed. It uses a simple "wait for
> the timer to expire and retransmit everything" approach, with the timer
> being doubled on each retransmission. This doesn't always provide ideal
> results, but also has not caused the network to collapse. I don't know
> much about how things are deployed in the IoT setting (paging Hannes
> Tschofenig) but at least in the WebRTC context, we have found the 1000ms
> guidance to be unduly long (as a practical matter, video conferencing
> just won't work with delays over 100-200ms). Firefox uses 50ms and AIUI
> Chrome uses a value derived from the ICE handshake (which is probably
> better because there are certainly times where 50ms is too short).
>
> Martin Duke's Comments:
>
> > In Sec 5.8.2, it is a significant change from DTLS 1.2 that the
> > initial timeout is dropping from 1 sec to 100ms, and this is worthy
> > of some discussion. This violation of RFC 8961 ought to be explored
> > further. For a client first flight of one packet, it seems
> > unobjectionable. However, I'm less comfortable with a potentially
> > large server first flight, or a client second flight, likely leading
> > to a large spurious retransmission.
> > With large flights, not only is a short timeout more dangerous, but
> > you are more likely to get an ACK in the event of some loss that
> > allows you to shortcut the timer anyway (i.e. the cost of long
> > timeout is smaller)
>
> You seem to be implicitly assuming that there is individual packet loss
> rather than burst loss. If the entire flight is lost, you want to just
> fall back to retransmitting.
>
> > Relatedly, in section 5.8.3 there is no specific recommendation for a
> > maximum flight size at all. I would think that applications SHOULD
> > have no more than 10 datagrams outstanding unless it has some OOB
> > evidence of available bandwidth on the channel, in keeping with de
> > facto transport best practice.
>
> I agree that this is a reasonable change.
>
> > Finally, I am somewhat concerned that the lack of any window
> > reduction might perform poorly in constrained environments.
>
> I'm skeptical that this is actually the case. As a practical matter,
> TLS flights rarely exceed 5 packets. For instance, Fastly's data on
> QUIC [0] indicates that the server's first flight (the biggest flight
> in the TLS 1.3 handshake) is less than 5 packets for the vast majority
> of handshakes, even without certificate compression. Given that
> constrained environments have more incentive to reduce bandwidth, I
> would expect them to typically be smaller, either via using smaller
> certificates or using some of the existing techniques for reducing
> handshake size such as cert compression or cached info.
>
> > Granted, doubling the timeout will reduce the rate, but when
> > retransmission is ack-driven there is essentially no reduction of
> > sending rate in response to loss.
>
> I don't believe this is correct. Recall that unlike TCP, there's
> generally no buffer of queued packets waiting to be transmitted.
> Rather, there is a fixed flight of data which must be delivered.
> With one exceptional case [1], an ACK will reflect that some but not
> all of the data was delivered and processed; when retransmitting, the
> sender will only retransmit the un-ACKed packets, which naturally
> reduces the sending rate. Given the quite small flights in play here,
> that reduction is likely to be quite substantial. For instance, if
> there are three packets and 1 is ACKed, then there will be a reduction
> of 1/3.
>
> > I want to emphasize that I am not looking to fully recreate TCP here;
> > some bounds on this behavior would likely be satisfactory.
> >
> > Here is an example of something that I think would be workable. It is
> > meant to be a starting point for discussion. I've asked for some
> > input from the experts in this area who may feel differently.
> >
> > - In general, the initial timeout is 100ms.
> > - The timeout backoff is not reset after successful delivery. This
> >   allows the "discovery" in bullet 1 to be safely applied to larger
> >   flights.
>
> Note that the timeout is actually only reset after successful loss-free
> delivery of a flight:
>
>    Implementations SHOULD retain the current timer value until a
>    message is transmitted and acknowledged without having to be
>    retransmitted, at which time the value may be reset to the
>    initial value.
>
> There seems to be some confusion here (perhaps due to bad writing).
> When the text says "resets the retransmission timer" it means "re-arm
> it with the current value" not "re-set it to the initial default". For
> instance, suppose that I send flight 1 with retransmit timer value T.
> After T seconds, I have not received anything and so I retransmit it,
> doubling to 2T. After I get a response, I now send a new flight. The
> timer should be 2T, not T.
>
> With that said, I think it would be reasonable to re-set to whatever
> the measured RTT was, rather than the initial default.
> This would avoid potentially resetting to an overly low default (though
> it's not clear to me how this could happen because if your RTT estimate
> is too low you will never get a delivery without retransmission).
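To make the re-arm vs. reset distinction quoted above concrete, here is a minimal sketch of the timer discipline being described, including the suggested fallback to the measured RTT. All names here are hypothetical illustrations, not code from any DTLS implementation:

```python
# Sketch of the retransmit-timer discipline discussed above (hypothetical
# names, not from any DTLS stack). "Re-arm" keeps the current, possibly
# doubled value; the value is only lowered after a flight is delivered
# without any retransmission, and then only to the measured RTT.

INITIAL_RTO = 0.1  # 100 ms, the initial value under discussion
MAX_RTO = 60.0     # cap from the quoted text (the RFC 6298 maximum)

class RetransmitTimer:
    def __init__(self):
        self.rto = INITIAL_RTO
        self.retransmitted = False  # did the current flight need a retransmit?

    def on_timeout(self):
        """Timer fired: retransmit the flight, double the value (capped),
        and re-arm with the new value."""
        self.retransmitted = True
        self.rto = min(self.rto * 2, MAX_RTO)
        return self.rto

    def on_flight_complete(self, measured_rtt):
        """Flight delivered. Keep the backed-off value unless delivery was
        loss-free, in which case fall back toward the measured RTT."""
        if not self.retransmitted:
            self.rto = max(measured_rtt, INITIAL_RTO)
        self.retransmitted = False
```

So if flight 1 is sent with value T and times out once, the next flight is armed with 2T, matching the example above; only a flight that completes without retransmission brings the value back down.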
NM on this piece. I see how that happens and I think it's fine to reset
to "measured RTT", whatever that is. As I said, in practice this
situation is very rare with DTLS because there are so few handshake
flights.

-Ekr

> > - For a first flight of > 2 packets, the sender MUST either (a) set
> >   the initial timeout to 1 second OR (b) retransmit no more than 2
> >   packets after timeout.
> > - flights SHOULD be limited to 10 packets
> > - on timeout or ack-indicated retransmission, no more than half
> >   (minimum one) of the flight should be retransmitted
> >
> > The theory here is that it's responsive to RTTs > 100ms, but small
> > flights can be more aggressive, and large flows are likely to have
> > ack-driven retransmission.
>
> I think it would be useful to distinguish two sets of concerns here:
>
> 1. That timeout-driven retransmission is too aggressive due to
>    too-short timers.
>
> 2. That ACK-driven retransmission will be too aggressive (presumably
>    due to the ACK indicating congestion-driven loss; if the loss is due
>    to burst errors, then we want to retransmit aggressively).
>
> On point (1), I think that the fact that we have extensive deployment
> of timeout-driven retransmission in the field with short timers is
> fairly strong evidence that it will not destroy the Internet and more
> generally that the "retransmit the whole flight" design is safe in this
> case. I certainly agree that there might be settings in which 100ms is
> too short. Rather than litigate the timer value, which I agree is a
> judgement call, I suggest we increase the default somewhat (250? 500?)
> and then indicate that if the application has information that a
> shorter timer is appropriate, it can use one.
>
> As far as point (2) goes, I don't think that any change is indicated
> here.
> As I indicated above, there is a finite amount of data to transmit and
> the design of the ACKs is such that you will continue to make forward
> progress (and if you're not, you won't be getting ACKs). Given the
> small fraction of the network traffic that will be DTLS handshakes, the
> primary risk here seems to be that on a very constrained network, you
> will get suboptimal performance for your handshake, but even that
> should resolve in a small number of round trips, especially if the
> receiver buffers out-of-order packets (which you obviously want to do
> in a constrained network). And if you do have random loss rather than
> congestion loss, backing off will have a very negative impact on the
> handshake for minimal reduction in packets transmitted [2].
>
> With that said, given that your concern seems to be large flights, I
> could maybe live with halving the *window* rather than the size of the
> flight. In your example, you suggest an initial window of 10, so this
> would give us 10, 5, 3, ... This would have little practical impact on
> the vast majority of handshakes, but I suppose might slightly improve
> things on the edge cases where you have a large flight *and* a high
> congestion network.
>
> Mark Allman's comments:
>
> > A few specific things (in addition to what Gorry said, which I
> > absolutely agree with):
> >
> > - "Though timer values are the choice of the implementation,
> >   mishandling of the timer can lead to serious congestion problems"
> >
> >   + Gorry flagged this and I am flagging it again. If this is
> >     something that can lead to serious problems, let's not just leave
> >     it to "choice of the implementation". Especially if we have some
> >     idea how to make it less problematic.
>
> I'm not sure what you'd like here. I think the guidance in this
> specification is reasonable, so I'd be happy to just remove this text.
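For concreteness, the window-halving idea floated above (halve the number of packets allowed outstanding, minimum one, while keeping the whole un-ACKed flight queued) gives the "10, 5, 3, ..." progression mentioned. A sketch of just that arithmetic, with a hypothetical helper name, nothing normative:

```python
# Sketch of "halve the *window*, not the flight": on each loss event the
# number of packets allowed outstanding is halved (rounding up, floor of
# one), while the un-ACKed flight itself stays queued for delivery.
# Hypothetical illustration only.

def window_schedule(initial_window=10, rounds=4):
    """Return the successive window sizes after repeated loss events."""
    sizes = []
    w = initial_window
    for _ in range(rounds):
        sizes.append(w)
        w = max(1, (w + 1) // 2)  # halve, rounding up, never below 1
    return sizes
```

With an initial window of 10 this yields 10, 5, 3, 2, ..., which leaves the common small-flight handshake essentially untouched.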
> > - "Implementations SHOULD use an initial timer value of 100 msec
> >   (the minimum defined in RFC 6298 [RFC6298])"
> >
> >   + I wrote RFC 6298 and I have no idea where this is coming from!
> >
> >   + Even if this value of 100msec is OK for DTLS it shouldn't lean on
> >     RFC 6298 because RFC 6298 doesn't say that is OK. I.e., the
> >     parenthetical is objectively wrong.
> >
> >   + RFC 6298 says the INITIAL RTO should be 1sec (point (2.1) in
> >     section 2). RFC 8961 affirms this and also says the INITIAL RTO
> >     should be 1sec (requirement (1) in section 4).
>
> Yeah, I'm not sure what happened here. I could go track down the PRs
> but I'll just plead editorial error. I suggest we just remove the
> parenthetical because it's not helping here.
>
> > - "Note that a 100 msec timer is recommended rather than the 3-second
> >   RFC 6298 default in order to improve latency for time-sensitive
> >   applications."
> >
> >   + Again, this mis-states RFC 6298, which says the initial RTO is
> >     1sec (not 3sec). (Previous to RFC 6298 the initial RTO was 3sec,
> >     which is probably where the notion comes from. Most of the
> >     purpose of RFC 6298 was to drop the initial RTO to 1sec.)
>
> My bad. I'll fix this.
>
> >   + This is a statement of desire, not any sort of principled
> >     justification for using 100msec. At the least this should be much
> >     better argued.
>
> See my note to Martin Duke above. What's appropriate in a very low
> volume handshake protocol is different from what's appropriate in a
> bulk transport protocol. With that said, as I said to Martin, I don't
> think litigating the precise value is that helpful, so I propose we
> just increase it to a somewhat larger value and explicitly acknowledge
> that specific settings may want to use a shorter value.
>
> > - "The retransmit timer expires: the implementation transitions to
> >   the SENDING state, where it retransmits the flight, resets the
> >   retransmit timer, and returns to the WAITING state."
> >   + Maybe this is spec sloppiness, but boy does it sound like the
> >     recipe TCP used before VJCC to collapse the network. I.e., expire
> >     and retransmit the window. Rinse and repeat. It may be the
> >     intention is for backoff to be involved. But, that isn't what it
> >     says.
>
> It says it elsewhere, in the section you quoted:
>
>    a congested link. Implementations SHOULD use an initial timer value
>    of 100 msec (the minimum defined in RFC 6298 {{RFC6298}}) and
>    double the value at each retransmission, up to no less than 60
>    seconds (the RFC 6298 maximum).
>
> As I said to Martin, I think some of the confusion is that this
> specification uses "reset" to mean both "re-arm" and "set the value
> back to the initial" and depends on context to clarify that. Obviously
> that's not been entirely successful, so I propose to use "re-arm" where
> I mean "start a timer with the now-current value".
>
> As noted above, this piece of the retransmission algorithm is already
> quite widely deployed (it was in DTLS 1.2), so I think there's a
> reasonably strong presumption that it is not horribly dangerous, though
> concededly suboptimal (hence the addition of ACKs in this
> specification).
>
> > - "When they have received part of a flight and do not immediately
> >   receive the rest of the flight (which may be in the same UDP
> >   datagram). A reasonable approach here is to set a timer for 1/4 the
> >   current retransmit timer value when the first record in the flight
> >   is received and then send an ACK when that timer expires."
> >
> >   + Where does 1/4 come from? Why is it "reasonable"? This just feels
> >     like a complete WAG that was pulled out of the air.
>
> Yes, it was in fact pulled out of the air (though I did discuss it with
> Ian Swett a bit).
> To be honest, any value here is going to be somewhat pulled out of the
> air, especially because during the handshake the retransmit timer
> values are incredibly imprecise, consisting as they do of (at most) one
> set of samples. In general, this value is a compromise between ACKing
> too aggressively (thus causing spurious retransmission of in-flight
> packets) and ACKing too conservatively (thus causing spurious
> retransmission of received packets).
>
> If you have a different proposal, I'm certainly open to it. FWIW,
> QUIC's max_ack_delay is 25ms, and that would certainly be fine with me.
>
> -Ekr
>
> [0] https://www.fastly.com/blog/quic-handshake-tls-compression-certificates-extension-study
> [1] When SH is lost.
> [2] In fact, there will be *more* packets transmitted because you now
> will have ACKs for each chunk of the flight, though of course they will
> be transmitted over a longer time scale.
>
> On Thu, Mar 25, 2021 at 9:51 AM Martin Duke <martin.h.d...@gmail.com>
> wrote:
>
>> Hello all,
>>
>> The outcome of the telechat was that I agreed to start a thread on how
>> to fix the significant transport issues with the DTLS 1.3 draft. If I
>> am correct, there was no early TCPM or TSVWG review. A major protocol
>> with significant transport-layer functionality would benefit from such
>> review in the future.
>>
>> *Who is in this thread*:
>>
>> For easy reference, here is my DISCUSS, which goes so far as to
>> express a straw man design that would come closer to addressing the
>> concerns:
>> https://mailarchive.ietf.org/arch/msg/tls/3g20CQkKWPGX-BAqfuEagR2ppGY/
>>
>> Besides TLSWG, I've added Lars (RFC 8085
>> <https://datatracker.ietf.org/doc/rfc8085/>), Mark Allman (RFC 8961
>> <https://datatracker.ietf.org/doc/rfc8961/>), and Gorry Fairhurst
>> (also RFC 8085). Mark and Gorry have already sent me private comments
>> that I invite them to resend here.
>> To summarize briefly, they amplified my DISCUSS, made the new point
>> that 8085 is directly relevant here, and are concerned there aren't
>> enough MUSTs.
>>
>> If people think there would be value in advertising this thread to the
>> TCPM and TSVWG working groups, I can do so, at the risk of introducing
>> more ancillary document churn.
>>
>> *Suggested plan:*
>>
>> Anyway, as a first step perhaps we can have Mark, Gorry, and Lars add
>> anything they'd like and then invite the draft authors to either make
>> a proposal or push back. If there are non-kosher things that DTLS 1.2
>> has done with no observable problems, that would be an interesting
>> data point: within limits, introducing a latency regression into DTLS
>> 1.3 would be perverse.
>>
>> DTLS is a very important protocol and it is worth the time to get
>> these things right.
>>
>> Thanks,
>> Martin Duke
>> Transport AD
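One closing note on the ACK-delay question from upthread: the draft's "1/4 of the current retransmit timer value" heuristic and QUIC's fixed max_ack_delay of 25 ms can be compared directly. A sketch, with hypothetical function and constant names taken from neither spec:

```python
# Compares the two ACK-delay heuristics discussed upthread: the draft's
# "1/4 of the current retransmit timer value" and a QUIC-style fixed
# max_ack_delay of 25 ms. Hypothetical names, illustration only.

QUIC_MAX_ACK_DELAY = 0.025  # 25 ms, as cited in the thread

def ack_delay(current_rto, cap_like_quic=False):
    """Delay before ACKing a partially received flight."""
    delay = current_rto / 4
    if cap_like_quic:
        delay = min(delay, QUIC_MAX_ACK_DELAY)
    return delay
```

With the 100 ms initial timer the two heuristics coincide (both give 25 ms); they diverge only once the retransmit timer has backed off.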
_______________________________________________
TLS mailing list
TLS@ietf.org
https://www.ietf.org/mailman/listinfo/tls