On Wed, Jul 26, 2017 at 3:02 PM, Neal Cardwell <ncardw...@google.com> wrote:
> On Wed, Jul 26, 2017 at 2:38 PM, Neal Cardwell <ncardw...@google.com> wrote:
>> Yeah, it looks like I can reproduce this issue with (1) bad sacks
>> causing repeated TLPs, and (2) TLPs timers being pushed out to later
>> times due to incoming data. Scripts are attached.
>
> I'm testing a fix of only scheduling a TLP if (flag & FLAG_DATA_ACKED)
> is true...

An update for the TLP aspect of this thread: our team has a proposed
fix for this RTO/TLP reschedule issue that we have reviewed internally
and tested with our packetdrill test suite, including some new tests.

The basic approach in the fix is as follows:

a) only reschedule the xmit timer once per ACK

b) only reschedule the xmit timer if tcp_clean_rtx_queue() deems this
   is safe (a packet was cumulatively ACKed, or we got a SACK for a
   packet that was sent before the most recent retransmit of the write
   queue head).

After further review and testing we will post it. Hopefully next week.

thanks,
neal