On Wed, Jun 27, 2018 at 8:24 AM, Neal Cardwell <ncardw...@google.com> wrote:
> On Tue, Jun 26, 2018 at 10:34 PM Lawrence Brakmo <bra...@fb.com> wrote:
>> The only issue is if it is safe to always use 2 or if it is better to
>> use min(2, snd_ssthresh) (which could still trigger the problem).
>
> Always using 2 SGTM. I don't think we need min(2, snd_ssthresh), as
> that should be the same as just 2, since:
>
> (a) RFCs mandate ssthresh should not be below 2, e.g.
> https://tools.ietf.org/html/rfc5681 page 7:
>
>     ssthresh = max (FlightSize / 2, 2*SMSS)            (4)
>
> (b) The main loss-based CCs used in Linux (CUBIC, Reno, DCTCP) respect
> that constraint, and always have an ssthresh of at least 2.
>
> And if some CC misbehaves and uses a lower ssthresh, then taking
> min(2, snd_ssthresh) will trigger problems, as you note.
>
>> +       tp->snd_cwnd = max((int)tcp_packets_in_flight(tp) + sndcnt, 2);
>
> AFAICT this does seem like it will make the sender behavior more
> aggressive in cases with high loss and/or a very low per-flow
> fair-share.
>
> Old:
>
> o send N packets
> o receive SACKs for last 3 packets
> o fast retransmit packet 1
> o using ACKs, slow-start upward
>
> New:
>
> o send N packets
> o receive SACKs for last 3 packets
> o fast retransmit packets 1 and 2
> o using ACKs, slow-start upward
>
> In the extreme case, if the available fair share is less than 2
> packets, whereas inflight would have oscillated between 1 packet and
> 2 packets with the existing code, it now seems like with this commit
> the inflight will now hover at 2. It seems like this would have
> significantly higher losses than we had with the existing code.

I share a similar concern. Note that this function is used by most
existing congestion control modules besides DCTCP, so I am cautious
about changing it just to address a DCTCP issue.
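
To make the arithmetic concrete, here is a toy userspace sketch (not
the kernel code itself; "inflight" and "sndcnt" merely stand in for
tcp_packets_in_flight(tp) and the PRR send count) showing how the
proposed clamp keeps cwnd from collapsing below 2:

#include <stdio.h>

/* Toy model of the PRR cwnd update under discussion; not kernel code. */
static int cwnd_old(int inflight, int sndcnt)
{
        return inflight + sndcnt;               /* can collapse to 1 */
}

static int cwnd_new(int inflight, int sndcnt)
{
        int cwnd = inflight + sndcnt;

        return cwnd > 2 ? cwnd : 2;             /* proposed: never below 2 */
}

int main(void)
{
        /* Extreme case from the discussion: almost everything is
         * SACKed or lost, so inflight is tiny and sndcnt is 0 or 1.
         */
        printf("inflight=1 sndcnt=0: old=%d new=%d\n",
               cwnd_old(1, 0), cwnd_new(1, 0));
        printf("inflight=0 sndcnt=1: old=%d new=%d\n",
               cwnd_old(0, 1), cwnd_new(0, 1));
        return 0;
}

With the existing code the window can sit at 1 in those cases; with
the patch it sits at 2, which is exactly the extra aggressiveness
described above.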
One problem the DCTCP paper notes is that cwnd = 1 is still too big
when the bottleneck is shared by many flows (e.g. incast). It
specifically suggests changing the lower bound of 2 in the spec to 1
(Section 8.2):
https://www.usenix.org/system/files/conference/nsdi15/nsdi15-paper-judd.pdf

I am curious about the differences you observe between 4.11 and 4.16.
I wasn't aware of any (significant) change in the tcp_cwnd_reduction /
PRR algorithm between 4.11 and 4.16.

Also, the receiver should not delay ACKs if it has out-of-order packets
or is receiving CE-marked data packets. This means a delayed ACK can
only be caused by tail losses where the last data received carries no
CE mark, which seems like a less common scenario? If delayed ACKs are
the problem, we should probably fix the receiver to delay ACKs more
intelligently, not the sender (a rough sketch of the kind of check I
mean is appended after the quoted text below). wei...@google.com is
working on it.

> This may or may not be OK in practice, but IMHO it is worth mentioning
> and discussing.
>
> neal
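
For completeness, here is the rough sketch referred to above of the
receiver-side decision I have in mind. It is illustrative only: the
struct and function names are invented for the example and are not
actual kernel fields or functions. ACK immediately when data arrives
out of order or when the CE state of arriving data changes; only then
is delaying the ACK safe.

/* Illustrative only: when should a DCTCP-style receiver bypass
 * delayed ACK?  Names are made up for the example.
 */
struct rx_state {
        int prev_ce;            /* CE mark on the previous data segment */
        int ooo_queued;         /* out-of-order data currently queued */
};

static int must_ack_now(const struct rx_state *rx, int seg_ce, int seg_ooo)
{
        if (seg_ooo || rx->ooo_queued)
                return 1;       /* send SACK information right away */
        if (seg_ce != rx->prev_ce)
                return 1;       /* CE state changed: echo it promptly for DCTCP */
        return 0;               /* in-order data, unchanged CE state: delay is OK */
}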