On Fri, Jul 28, 2017 at 6:54 PM, Neal Cardwell <ncardw...@google.com> wrote:
> On Wed, Jul 26, 2017 at 3:02 PM, Neal Cardwell <ncardw...@google.com> wrote:
>> On Wed, Jul 26, 2017 at 2:38 PM, Neal Cardwell <ncardw...@google.com> wrote:
>>> Yeah, it looks like I can reproduce this issue with (1) bad sacks
>>> causing repeated TLPs, and (2) TLPs timers being pushed out to later
>>> times due to incoming data. Scripts are attached.
>>
>> I'm testing a fix of only scheduling a TLP if (flag & FLAG_DATA_ACKED)
>> is true...
>
> An update for the TLP aspect of this thread: our team has a proposed
> fix for this RTO/TLP reschedule issue that we have reviewed internally
> and tested with our packetdrill test suite, including some new tests.
> The basic approach in the fix is as follows:
>
> a) only reschedule the xmit timer once per ACK
>
> b) only reschedule the xmit timer if tcp_clean_rtx_queue() deems this
> is safe (a packet was cumulatively ACKed, or we got a SACK for a
> packet that was sent before the most recent retransmit of the write
> queue head).
>
> After further review and testing we will post it. Hopefully next week.
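
For readers skimming the thread, points (a) and (b) quoted above can be
sketched roughly as follows. This is a simplified userspace illustration,
not the posted patches: only FLAG_DATA_ACKED and tcp_clean_rtx_queue() are
names from the thread, while the flag values, the other two flags, the
structs, and the *_sketch functions are all hypothetical.

```c
#include <stdbool.h>

/* FLAG_DATA_ACKED is named in the thread; its value here and the two
 * other flags are assumptions made purely for this sketch. */
#define FLAG_DATA_ACKED       0x1u /* a packet was cumulatively ACKed */
#define FLAG_SACKED_OLD_PKT   0x2u /* hypothetical: SACK covers a packet sent
                                    * before the last retransmit of the head */
#define FLAG_REARM_XMIT_TIMER 0x4u /* hypothetical: rearming is deemed safe */

/* Hypothetical summary of what one incoming ACK reported. */
struct ack_summary {
    bool cumulatively_acked;
    bool sacked_pkt_sent_before_head_rexmit;
};

/* Hypothetical connection state; it only counts timer rearms. */
struct conn_sketch {
    unsigned int xmit_timer_rearms;
};

/* Stand-in for tcp_clean_rtx_queue(): decides whether rearming is safe (b). */
unsigned int clean_rtx_queue_sketch(const struct ack_summary *ack)
{
    unsigned int flag = 0;

    if (ack->cumulatively_acked)
        flag |= FLAG_DATA_ACKED;
    if (ack->sacked_pkt_sent_before_head_rexmit)
        flag |= FLAG_SACKED_OLD_PKT;

    /* Safe only if the ACK shows forward progress in one of the two ways. */
    if (flag & (FLAG_DATA_ACKED | FLAG_SACKED_OLD_PKT))
        flag |= FLAG_REARM_XMIT_TIMER;
    return flag;
}

/* Per-ACK processing with a single timer decision point (a). */
void process_ack_sketch(struct conn_sketch *c, const struct ack_summary *ack)
{
    unsigned int flag = clean_rtx_queue_sketch(ack);

    /* The only place in this sketch where the xmit timer is rearmed,
     * so it happens at most once per ACK. */
    if (flag & FLAG_REARM_XMIT_TIMER)
        c->xmit_timer_rearms++;
}
```

In this sketch an ACK that neither advances the cumulative ACK point nor
SACKs a sufficiently old packet leaves the timer alone, so incoming data
or bad SACKs alone cannot keep pushing the RTO/TLP timer out.
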

The timer patches are upstream for review for the "net" branch:

  https://patchwork.ozlabs.org/patch/796057/
  https://patchwork.ozlabs.org/patch/796058/
  https://patchwork.ozlabs.org/patch/796059/

Again, thank you for reporting this, and thanks for the packet trace!

neal