On Fri, 29 Jun 2018, Neal Cardwell wrote: > On Fri, Jun 29, 2018 at 6:07 AM Ilpo Järvinen <ilpo.jarvi...@helsinki.fi> > wrote: > > > > If SACK is not enabled and the first cumulative ACK after the RTO > > retransmission covers more than the retransmitted skb, a spurious > > FRTO undo will trigger (assuming FRTO is enabled for that RTO). > > The reason is that any non-retransmitted segment acknowledged will > > set FLAG_ORIG_SACK_ACKED in tcp_clean_rtx_queue even if there is > > no indication that it would have been delivered for real (the > > scoreboard is not kept with TCPCB_SACKED_ACKED bits in the non-SACK > > case so the check for that bit won't help like it does with SACK). > > Having FLAG_ORIG_SACK_ACKED set results in the spurious FRTO undo > > in tcp_process_loss. > > > > We need to use more strict condition for non-SACK case and check > > that none of the cumulatively ACKed segments were retransmitted > > to prove that progress is due to original transmissions. Only then > > keep FLAG_ORIG_SACK_ACKED set, allowing FRTO undo to proceed in > > non-SACK case. > > > > (FLAG_ORIG_SACK_ACKED is planned to be renamed to FLAG_ORIG_PROGRESS > > to better indicate its purpose but to keep this change minimal, it > > will be done in another patch). > > > > Besides burstiness and congestion control violations, this problem > > can result in RTO loop: When the loss recovery is prematurely > > undoed, only new data will be transmitted (if available) and > > the next retransmission can occur only after a new RTO which in case > > of multiple losses (that are not for consecutive packets) requires > > one RTO per loss to recover. > > > > Signed-off-by: Ilpo Järvinen <ilpo.jarvi...@helsinki.fi> > > --- > > net/ipv4/tcp_input.c | 9 +++++++++ > > 1 file changed, 9 insertions(+) > > > > diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c > > index 045d930..8e5522c 100644 > > --- a/net/ipv4/tcp_input.c > > +++ b/net/ipv4/tcp_input.c > > @@ -3181,6 +3181,15 @@ static int tcp_clean_rtx_queue(struct sock *sk, u32 > > prior_fack, > > > > if (tcp_is_reno(tp)) { > > tcp_remove_reno_sacks(sk, pkts_acked); > > + > > + /* If any of the cumulatively ACKed segments was > > + * retransmitted, non-SACK case cannot confirm that > > + * progress was due to original transmission due to > > + * lack of TCPCB_SACKED_ACKED bits even if some of > > + * the packets may have been never retransmitted. > > + */ > > + if (flag & FLAG_RETRANS_DATA_ACKED) > > + flag &= ~FLAG_ORIG_SACK_ACKED; > > } else { > > int delta; > > Thanks, Ilpo. I ran this through our TCP packetdrill tests and only > got one failure, which detected the change you made, so that on the > first ACK in a non-SACK F-RTO case we stay in CA_Loss.
Great, thanks for testing. > However, depending on the exact scenario we are concerned about here, > this may not be a complete solution. Consider the packetdrill test I > cooked below, which is for a scenario where we imagine a non-SACK > connection, with data packet #1 and all dupacks lost. With your patch > we don't have an F-RTO undo on the first cumulative ACK, which is a > definite improvement. However, we end up with an F-RTO undo on the > *second* cumulative ACK, because the second ACK didn't cover any > retransmitted data. That means that we end up in the second round trip > back in the initial slow-start mode with a very high cwnd and infinite > ssthresh, even though data was actually lost in the first round trip. > > I'm not sure how much we care about all these cases. Perhaps we should > just disable F-RTO if the connection has no SACK enabled? These > non-SACK connections are just such a rare case at this point that I > would propose it's not worth spending too much time on this. > > The following packetdrill script passes when Ilpo's patch is applied. > This documents the behavior, which I think is questionable: > > 0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3 > +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0 > +0 bind(3, ..., ...) = 0 > +0 listen(3, 1) = 0 > > +0 < S 0:0(0) win 32792 <mss 1000,nop,wscale 7> > +0 > S. 0:0(0) ack 1 <mss 1460,nop,wscale 8> > +.02 < . 1:1(0) ack 1 win 257 > +0 accept(3, ..., ...) = 4 > +0 write(4, ..., 27000) = 27000 > +0 > . 1:10001(10000) ack 1 > > // Suppose 1:1001 is lost and all dupacks are lost. > > // RTO and retransmit head. This fills a real hole. > +.22 > . 1:1001(1000) ack 1 > > +.005 < . 1:1(0) ack 2001 win 257 Why did the receiver send a cumulative ACK only for 2001? > +0 > . 10001:13001(3000) ack 1 > +0 %{ assert tcpi_ca_state == TCP_CA_Loss, tcpi_ca_state }% > +0 %{ assert tcpi_snd_cwnd == 3, tcpi_snd_cwnd }% > +0 %{ assert tcpi_snd_ssthresh == 7, tcpi_snd_ssthresh }% > > +.002 < . 1:1(0) ack 10001 win 257 ...And then magically all packets were received up to 10001 here as we get cumulative ACK that far? None were lost I guess? Were they perhaps reordered past RTO? > +0 %{ assert tcpi_ca_state == TCP_CA_Open, tcpi_ca_state }% > +0 %{ assert tcpi_snd_cwnd == 18, tcpi_snd_cwnd }% > +0 %{ assert tcpi_snd_ssthresh == 2147483647, tcpi_snd_ssthresh }% > +0 > . 13001:15001(2000) ack 1 This looks to me just another "lets defeat the heuristics" pattern rather than what a realistic receiver / properly functioning "middlebox" should do, or do I miss something? I think that the power/ability with packetdrill sets a trap for us we need to be careful not to step into: It is very possible to overdo scenarios with packetdrill to a point where they stop being realistic. When we go to this road, we might also look into FRTO with SACK with >1 hole and all ACKs lost. I think it will trigger exactly the same issue my patch attempts to fix for non-SACK cases because scoreboard won't get S-bits from those non-arriving ACKs (it could also trigger the scenario you describe). ...I guess it could be fixed by limiting the undo check really into step 3 rather than possibly invoking it already in step 2, which is a subtle difference with the current Linux implementation and FRTO RFC. But I want to state (now that I bring this scenario up), that I think this is much much less likely to trigger than the issue my fix addresses (very possible to be true even considering the huge bias in # of SACK enabled vs non-SACK flow). ...Beyond that one, I'm yet to see/come up a convincing scenario due to FRTO heuristics. -- i.