On Fri, Jun 29, 2018 at 9:48 PM Lawrence Brakmo <bra...@fb.com> wrote: > > We observed high 99 and 99.9% latencies when doing RPCs with DCTCP. The > problem is triggered when the last packet of a request arrives CE > marked. The reply will carry the ECE mark causing TCP to shrink its cwnd > to 1 (because there are no packets in flight). When the 1st packet of > the next request arrives it was sometimes delayed adding up to 40ms to > the latency. > > This patch insures that CWR makred data packets arriving will be acked > immediately. > > Signed-off-by: Lawrence Brakmo <bra...@fb.com> > ---
Thanks, Larry. Ensuring that CWR-marked data packets arriving will be acked immediately sounds like a good goal to me. I am wondering if we can have a simpler fix. The dctcp_ce_state_0_to_1() code is setting the TCP_ECN_DEMAND_CWR bit in ecn_flags, which disables the code in __tcp_ecn_check_ce() that would have otherwise used the tcp_enter_quickack_mode() mechanism to force an ACK: __tcp_ecn_check_ce(): ... case INET_ECN_CE: if (tcp_ca_needs_ecn(sk)) tcp_ca_event(sk, CA_EVENT_ECN_IS_CE); // -> dctcp_ce_state_0_to_1() // -> tp->ecn_flags |= TCP_ECN_DEMAND_CWR; if (!(tp->ecn_flags & TCP_ECN_DEMAND_CWR)) { /* Better not delay acks, sender can have a very low cwnd */ tcp_enter_quickack_mode(sk, 1); tp->ecn_flags |= TCP_ECN_DEMAND_CWR; } tp->ecn_flags |= TCP_ECN_SEEN; break; So it seems like the bug here may be that dctcp_ce_state_0_to_1() is setting the TCP_ECN_DEMAND_CWR bit in ecn_flags, when really it should let its caller, __tcp_ecn_check_ce() set TCP_ECN_DEMAND_CWR, in which case the code would already properly force an immediate ACK. So, what if we use this fix instead (not compiled, not tested): diff --git a/net/ipv4/tcp_dctcp.c b/net/ipv4/tcp_dctcp.c index 5f5e5936760e..4fecd2824edb 100644 --- a/net/ipv4/tcp_dctcp.c +++ b/net/ipv4/tcp_dctcp.c @@ -152,8 +152,6 @@ static void dctcp_ce_state_0_to_1(struct sock *sk) ca->prior_rcv_nxt = tp->rcv_nxt; ca->ce_state = 1; - - tp->ecn_flags |= TCP_ECN_DEMAND_CWR; } static void dctcp_ce_state_1_to_0(struct sock *sk) What do you think? neal