On Wed, Jul 26, 2017 at 1:06 PM, Neal Cardwell <ncardw...@google.com> wrote:
> On Wed, Jul 26, 2017 at 12:43 PM, Neal Cardwell <ncardw...@google.com> wrote:
>> (2) It looks like there is a bug in the sender code where it seems to
>> be repeatedly using a TLP timer firing 211ms after every ACK is
>> received to transmit another TLP probe (a new packet in this case).
>> Somehow these weird invalid SACKs seem to be triggering a code path
>> that makes us think we can send another TLP, when we probably should
>> be firing an RTO. That's my interpretation, anyway. I will try to
>> reproduce this with packetdrill.
>
> Hmm. It looks like this might be a general issue, where any time we
> get an ACK that doesn't ACK/SACK anything new (whether because it's
> incoming data in a bi-directional flow, or a middlebox breaking the
> SACKs), then we schedule a TLP timer further out in time. Probably we
> should only push the TLP timer out if something is cumulatively ACKed.
>
> But that's not a trivial thing to do, because by the time we are
> deciding whether to schedule another TLP, we have already canceled the
> previous TLP and reinstalled an RTO. Hmm.
Yeah, it looks like I can reproduce this issue with (1) bad SACKs causing repeated TLPs, and (2) TLP timers being pushed out to later times due to incoming data. Scripts are attached.

neal
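For readers without the attached scripts, the timer-pushing effect in case (2) can be illustrated with a toy model. This is only a sketch under stated assumptions, not the Linux sender code and not a translation of the packetdrill scripts: the names `Ack`, `first_probe_time`, and `rearm_on_any_ack` are hypothetical, and the probe timeout is fixed at 211 ms purely to echo the trace discussed above (a real sender derives it from RTT measurements). The model compares re-arming the probe timer on every ACK with re-arming only when something is cumulatively ACKed.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical fixed probe timeout; 211 ms only echoes the trace discussed above.
TLP_TIMEOUT_MS = 211


@dataclass
class Ack:
    time_ms: int      # arrival time of the ACK
    newly_acked: int  # bytes cumulatively ACKed by this ACK (0 means nothing new)


def first_probe_time(acks: List[Ack], horizon_ms: int,
                     rearm_on_any_ack: bool) -> Optional[int]:
    """Return when the probe timer first fires, or None if it has not
    fired by horizon_ms. The timer is armed at t=0."""
    deadline = TLP_TIMEOUT_MS
    for ack in sorted(acks, key=lambda a: a.time_ms):
        if ack.time_ms >= deadline:
            break  # the timer already fired before this ACK arrived
        # Policy under test: push the timer out on every ACK, or only
        # when the ACK cumulatively acknowledges new data.
        if rearm_on_any_ack or ack.newly_acked > 0:
            deadline = ack.time_ms + TLP_TIMEOUT_MS
    return deadline if deadline <= horizon_ms else None


if __name__ == "__main__":
    # ACKs every 100 ms that acknowledge nothing new, e.g. incoming data
    # on a bi-directional flow.
    acks = [Ack(t, 0) for t in range(100, 1000, 100)]
    print(first_probe_time(acks, 1000, rearm_on_any_ack=True))   # None
    print(first_probe_time(acks, 1000, rearm_on_any_ack=False))  # 211
```

In this model, ACKs that arrive more often than the timeout and carry nothing new keep the probe from ever firing under the first policy, while the second policy lets it fire at 211 ms. The model deliberately ignores the difficulty noted in the quoted text: by the time the sender decides whether to schedule another TLP, the previous TLP has already been canceled and an RTO reinstalled.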
tlp-bad-sacks.pkt
Description: Binary data
tlp-bidirectional.pkt
Description: Binary data