Hi Yuchung,

This test scenario is only one example that triggers this bug. In general, as long as cwnd < 4, the undo function has this bug.
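To see why: ssthresh is clamped to a minimum of 2, so any pre-loss cwnd of 1, 2, or 3 is "undone" to 2 * 2 = 4. A tiny standalone check of this claim (illustrative plain C, not kernel code):

  #include <stdio.h>

  /* mirrors max(cwnd >> 1, 2U) from the kernel's Reno ssthresh */
  static unsigned int reno_ssthresh(unsigned int cwnd)
  {
          return cwnd / 2 > 2 ? cwnd / 2 : 2;
  }

  int main(void)
  {
          unsigned int cwnd;

          for (cwnd = 1; cwnd < 4; cwnd++) {
                  unsigned int ssthresh = reno_ssthresh(cwnd);
                  unsigned int undone = 2 * ssthresh; /* cwnd after undo */

                  printf("cwnd %u -> ssthresh %u -> undo %u\n",
                         cwnd, ssthresh, undone);
          }
          return 0;
  }

Every line prints an undo value of 4, larger than the pre-loss cwnd in each case.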

This would not be a problem in a normal network, but it might be an issue if the network is highly congested (e.g., many TCP flows, each with cwnd < 4). In that case, the bug may mistakenly double the sending rate of each flow and make a highly congested network even more congested, similar to congestion collapse. This is why we need congestion control algorithms in the first place.

Thanks
Lisong

On 7/21/2017 12:59 PM, Yuchung Cheng wrote:
On Thu, Jul 20, 2017 at 2:28 PM, Wei Sun <unlcsew...@gmail.com> wrote:
Hi Yuchung,

Sorry for the confusion.  The test case was adapted from an old DSACK
test case (i.e., I forgot to remove something).

Attached is a new and simple one. Thanks
Note that the test scenario is fairly rare IMO: the connection first
experiences timeouts, then its retransmission gets acked, then the
original packets get acked (ack w/ val 1400 ecr 130). It could be really
long reordering, or reordering plus packet loss.

The Linux undo state machines may not handle this perfectly, but it's
probably not worth extra state for such rare events.



On Wed, Jul 19, 2017 at 2:31 PM, Yuchung Cheng <ych...@google.com> wrote:
On Tue, Jul 18, 2017 at 2:36 PM, Wei Sun <unlcsew...@gmail.com> wrote:
Hi there,

We found buggy behavior when using Linux TCP Reno and HTCP in
low-bandwidth or highly congested network environments.

In short, their undo functions may mistakenly double the cwnd,
leading to more aggressive behavior in an already highly congested scenario.


The detailed reason:

The current Reno undo function assumes the cwnd was halved on loss
(and thus doubles it on undo), but it doesn't account for the corner
case where ssthresh is clamped to its lower bound of 2, i.e., where
the cwnd at loss time was below 4.
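
For reference, here is a sketch of the relevant Reno helpers, close to
what net/ipv4/tcp_cong.c does today (paraphrased, so treat it as
illustrative rather than the exact source):

  u32 tcp_reno_ssthresh(struct sock *sk)
  {
          const struct tcp_sock *tp = tcp_sk(sk);

          /* halve cwnd on loss, but never go below 2 */
          return max(tp->snd_cwnd >> 1U, 2U);
  }

  u32 tcp_reno_undo_cwnd(struct sock *sk)
  {
          const struct tcp_sock *tp = tcp_sk(sk);

          /* assumes ssthresh == cwnd / 2, so 2 * ssthresh "restores"
           * the pre-loss cwnd; wrong whenever the clamp above fired,
           * i.e. whenever the pre-loss cwnd was below 4
           */
          return max(tp->snd_cwnd, tp->snd_ssthresh << 1);
  }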

e.g.,
                    cwnd    ssthresh
  Initial state:       2           5
  Spurious loss:       1           2
  Undo:                4           5

Here the cwnd after undo is twice what it was before the loss. Attached is
a simple script to reproduce it.
the packetdrill script is a bit confusing: it disables SACK, but then
the client returns ACKs w/ SACKs; also, 3 dupacks happen after the RTO,
so the sender isn't technically going through fast recovery...

could you provide a better test?

The same reasoning applies to HTCP, so we recommend storing the cwnd at
loss time in the .ssthresh implementation and restoring it in .undo_cwnd
for the TCP Reno and HTCP implementations.
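
A minimal sketch of what we have in mind, assuming module-private
per-connection state kept via inet_csk_ca(); the struct and function
names here are illustrative, not an actual patch:

  struct reno_undo {
          u32 loss_cwnd;  /* cwnd right before the window reduction */
  };

  static u32 reno_fixed_ssthresh(struct sock *sk)
  {
          struct tcp_sock *tp = tcp_sk(sk);
          struct reno_undo *ru = inet_csk_ca(sk);

          ru->loss_cwnd = tp->snd_cwnd;  /* remember for a later undo */
          return max(tp->snd_cwnd >> 1U, 2U);
  }

  static u32 reno_fixed_undo_cwnd(struct sock *sk)
  {
          const struct tcp_sock *tp = tcp_sk(sk);
          const struct reno_undo *ru = inet_csk_ca(sk);

          /* restore exactly the remembered cwnd instead of 2 * ssthresh */
          return max(tp->snd_cwnd, ru->loss_cwnd);
  }

This restores exactly the pre-loss cwnd, so undo can never give a flow
more window than it had before the spurious reduction.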

Thanks
