On Wed, Jun 27, 2018 at 1:00 PM, Lawrence Brakmo <bra...@fb.com> wrote: > > > From: <netdev-ow...@vger.kernel.org> on behalf of Yuchung Cheng > <ych...@google.com> > Date: Wednesday, June 27, 2018 at 9:59 AM > To: Neal Cardwell <ncardw...@google.com> > Cc: Lawrence Brakmo <bra...@fb.com>, Matt Mathis <mattmat...@google.com>, > Netdev <netdev@vger.kernel.org>, Kernel Team <kernel-t...@fb.com>, Blake > Matheny <bmath...@fb.com>, Alexei Starovoitov <a...@fb.com>, Eric Dumazet > <eric.duma...@gmail.com>, Wei Wang <wei...@google.com> > Subject: Re: [PATCH net-next v2] tcp: force cwnd at least 2 in > tcp_cwnd_reduction > > > > On Wed, Jun 27, 2018 at 8:24 AM, Neal Cardwell <ncardw...@google.com> wrote: > > On Tue, Jun 26, 2018 at 10:34 PM Lawrence Brakmo <bra...@fb.com> wrote: > > The only issue is if it is safe to always use 2 or if it is better to > > use min(2, snd_ssthresh) (which could still trigger the problem). > > > > Always using 2 SGTM. I don't think we need min(2, snd_ssthresh), as > > that should be the same as just 2, since: > > > > (a) RFCs mandate ssthresh should not be below 2, e.g. > > https://urldefense.proofpoint.com/v2/url?u=https-3A__tools.ietf.org_html_rfc5681&d=DwIBaQ&c=5VD0RTtNlTh3ycd41b3MUw&r=pq_Mqvzfy-C8ltkgyx1u_g&m=PZQC-6NGqK6QEVrf1WhlAD3mQt7tK8aqrfQGp93dJy4&s=WH9fXpDBHC31NCixiiMvhOb2UXCuJIxUiY4IXyJTgpc&e= > page 7: > > > > ssthresh = max (FlightSize / 2, 2*SMSS) (4) > > > > (b) The main loss-based CCs used in Linux (CUBIC, Reno, DCTCP) respect > > that constraint, and always have an ssthresh of at least 2. > > > > And if some CC misbehaves and uses a lower ssthresh, then taking > > min(2, snd_ssthresh) will trigger problems, as you note. > > > > + tp->snd_cwnd = max((int)tcp_packets_in_flight(tp) + sndcnt, 2); > > > > AFAICT this does seem like it will make the sender behavior more > > aggressive in cases with high loss and/or a very low per-flow > > fair-share. > > > > Old: > > > > o send N packets > > o receive SACKs for last 3 packets > > o fast retransmit packet 1 > > o using ACKs, slow-start upward > > > > New: > > > > o send N packets > > o receive SACKs for last 3 packets > > o fast retransmit packets 1 and 2 > > o using ACKs, slow-start upward > > > > In the extreme case, if the available fair share is less than 2 > > packets, whereas inflight would have oscillated between 1 packet and 2 > > packets with the existing code, it now seems like with this commit the > > inflight will now hover at 2. It seems like this would have > > significantly higher losses than we had with the existing code. > > I share similar concern. Note that this function is used by most > > existing congestion control modules beside DCTCP so I am more cautious > > of changing this to address DCTCP issue. > > > > Theoretically it could happen with any CC, but I have only seen it > consistently with DCTCP. > > One option, as I mentioned to Neal, is to use 2 only when cwnd reduction is > caused by > > ECN signal. > > > > One problem that DCTCP paper notices when cwnd = 1 is still too big > > when the bottleneck > > is shared by many flows (e.g. incast). It specifically suggest > > changing the lower-bound of 2 in the spec to 1. (Section 8.2). > > https://urldefense.proofpoint.com/v2/url?u=https-3A__www.usenix.org_system_files_conference_nsdi15_nsdi15-2Dpaper-2Djudd.pdf&d=DwIBaQ&c=5VD0RTtNlTh3ycd41b3MUw&r=pq_Mqvzfy-C8ltkgyx1u_g&m=PZQC-6NGqK6QEVrf1WhlAD3mQt7tK8aqrfQGp93dJy4&s=fO8p5khks-sGO_lwEvKicWtFz_q9LI-ecwZcS1nyZ7w&e= > > > > One could even envision using even lower values than 1, where we send one > packet > > every 2 RTTs. However, I do not think this is a problem that needs to be > fixed. > > > > I am curious about the differences you observe in 4.11 and 4.16. I > > wasn't aware of any (significant) change in tcp_cwnd_reduction / PRR > > algorithm between 4.11 and 4.16. Also the receiver should not delay > > ACKs if it has out-of-order packet or receiving CE data packets. This > > means the delayed ACK is by tail losses and the last data received > > carries no CE mark: seems a less common scenario? > > > > I have the feeling the differences between 4.11 and 4.16 are due to > accounting > > changes and not change in CC behaviors. If the difference is not due to change in PRR, I don't think we should change PRR to address that until we have some evidence.
Could you share some traces? > > > > The packet and ACK are not lost. I captured pcaps on both hosts and could > see > > cases when the data packet arrived, but the ACK was not sent before the > packet was > > retransmitted. I saw this behavior multiple times. In many cases the ACK did > not show > > up in the pcap within the 40ms one would usually expect with a delayed ACK. > > If delayed-ACK is the problem, we probably should fix the receiver to > > delay ACK more intelligently, not the sender. wei...@google.com is > > working on it. > > > > Not sure if it is delayed-ACK or something caused the ACK to not be sent > since it did > > not show up within 40ms as one would usually expect for a delayed ACK. > > > > > > > > This may or may not be OK in practice, but IMHO it is worth mentioning > > and discussing. > > > > neal > >