Re: TCP congestion control and large router buffers

Jim Gettys Wed, 22 Dec 2010 08:49:39 -0800

On 12/21/2010 04:24 PM, Fred Baker wrote:


On Dec 20, 2010, at 11:18 PM, Mikael Abrahamsson wrote:

On Mon, 20 Dec 2010, Jim Gettys wrote:

Common knowledge among whom?  I'm hardly a naive Internet user.


Anyone actually looking into the matter. The Cisco "fair-queue" command was 
introduced in IOS 11.0 according to 
<http://www.cisco.com/en/US/docs/ios/12_2/qos/command/reference/qrfcmd1.html#wp1098249> 
to somewhat handle the problem. I have no idea when this was in time, but I guess 
the early '90s?


1995. I know the guy that wrote the code. Meet me in a bar and we can share war 
stories. The technology actually helps pretty effectively with problems like 
those RFC 6057 addresses.

is a good idea, you aren't old enough to have experienced the NSFnet collapse 
during the 1980's (as I did).  I have post-traumatic stress disorder from that 
experience; I'm worried about the confluence of these changes, folks.


I'm happy you were there, I was under the impression that routers had large 
buffers back then as well?


Not really. Yup, several of us were there. The common routers on the NSFNET and related 
networks were fuzzballs, which had 8 (count them, 8) 576 byte buffers, Cisco AGS/AGS+, 
and Proteon routers. The Cisco routers of the day generally had 40 buffers on each 
interface by default, and might have had configuration changes; I can't comment on the 
Proteon routers. For a 56 KBPS line, given 1504 bytes per message (1500 bytes IP+data, 
and four bytes of HDLC overhead), that's theoretically 8.5 seconds. But given that 
messages were in fact usually 576 bytes of IP data (cf "fuzzballs" and unix 
behavior for off-LAN communications) and interspersed with TCP control messages (Acks, 
SYNs, FINs, RST), real queue depths were more like two seconds at a bottleneck router. 
The question would be the impact of a sequence of routers all acting as bottlenecks.
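
The queue-depth arithmetic above can be checked with a minimal sketch. The function name is made up for illustration, and the second case assumes every buffer holds a 576-byte packet with no link-layer overhead, which the text does not state; mixing in small TCP control segments would bring the figure down toward the "two seconds" quoted above.

```python
def drain_time_seconds(buffers: int, bytes_per_packet: int, line_rate_bps: int) -> float:
    """Seconds needed to transmit a full queue of equal-sized packets."""
    return buffers * bytes_per_packet * 8 / line_rate_bps


LINE_RATE_BPS = 56_000  # 56 kbit/s line, from the text
BUFFERS = 40            # default per-interface buffers quoted for Cisco routers

# Worst case: every buffer holds a 1504-byte frame (1500 IP + 4 HDLC).
print(round(drain_time_seconds(BUFFERS, 1504, LINE_RATE_BPS), 2))  # 8.59

# Assumption for illustration: every buffer holds a 576-byte packet,
# ignoring link-layer overhead and the smaller TCP control segments.
print(round(drain_time_seconds(BUFFERS, 576, LINE_RATE_BPS), 2))   # 3.29
```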

IMHO, AQM (RED or whatever) is your friend. The question is what to set 
min-threshold to. Kathy Nichols (Van's wife) did a lot of simulations. I don't 
know that the paper was ever published, but as I recall she wound up 
recommending something like this:

line rate (Mbps)    RED min-threshold (ms of queue depth)
         2                32
        10                16
       155                 8
       622                 4
     2,500                 2
    10,000                 1
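
To make the table above concrete, a threshold expressed in milliseconds of queue depth can be converted to bytes at the line rate. This is only a sketch: the helper name is hypothetical, and it assumes "MBPS" means 10^6 bits per second and that the queue drains at the full line rate.

```python
# (line rate in Mbit/s, RED min-threshold in ms), copied from the table above.
RECALLED_THRESHOLDS = [
    (2, 32),
    (10, 16),
    (155, 8),
    (622, 4),
    (2_500, 2),
    (10_000, 1),
]


def threshold_bytes(line_rate_mbps: int, depth_ms: int) -> int:
    """Bytes the link transmits in depth_ms at line_rate_mbps.

    1 Mbit/s for 1 ms is 1000 bits, so bits = rate * 1000 * ms.
    """
    return line_rate_mbps * 1000 * depth_ms // 8


for rate, depth in RECALLED_THRESHOLDS:
    print(f"{rate:>6} Mbit/s  {depth:>2} ms  {threshold_bytes(rate, depth):>9} bytes")
```

Under these assumptions the threshold shrinks in time but grows in bytes as the line rate goes up, from 8,000 bytes at 2 Mbit/s to 1,250,000 bytes at 10,000 Mbit/s.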

I don't know if you are referring to the "RED in a different light" paper: that was never published, though an early draft escaped and can be found on the net.

"RED in a different light" identifies two bugs in the RED algorithm, and proposes a better algorithm that only depends on the link output bandwidth. That draft still has a bug.

The (almost completed) version of the paper never got published; Van has retrieved it from backup, and I'm trying to pry it out of Van's hands to get it converted to something we can read today (it's in FrameMaker).

In the meanwhile, turn on (W)RED! For routers run by most people on this list, it's always way better than nothing, even if Van doesn't think classic RED will solve the home router bufferbloat problem (where we have 2 orders of magnitude variation of wireless bandwidth along with highly variable workload). That's not true in the internet core.

But yes, I agree that we'd all be much helped if manufacturers of both ends of 
all links had the common decency of introducing a WRED (with ECN marking) AQM 
that had 0% drop probability at 40ms and 100% drop probability at 200ms (and 
linear increase between).
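
The drop curve described above (0% at 40 ms, 100% at 200 ms, linear in between) can be sketched as follows. This illustrates only that curve and is not any vendor's WRED implementation: the function name is hypothetical, and it takes a queueing delay directly, whereas RED as usually described works from a weighted moving average of the queue size.

```python
def drop_probability(queue_delay_ms: float,
                     min_threshold_ms: float = 40.0,
                     max_threshold_ms: float = 200.0) -> float:
    """Drop (or ECN-mark) probability for a given queueing delay.

    0.0 at or below min_threshold_ms, 1.0 at or above max_threshold_ms,
    and a straight line between the two thresholds.
    """
    if queue_delay_ms <= min_threshold_ms:
        return 0.0
    if queue_delay_ms >= max_threshold_ms:
        return 1.0
    return (queue_delay_ms - min_threshold_ms) / (max_threshold_ms - min_threshold_ms)


for delay_ms in (20, 40, 80, 120, 200, 500):
    print(delay_ms, drop_probability(delay_ms))
```

Halfway between the thresholds, at 120 ms, the sketch returns 0.5.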


so, min-threshold=40 ms and max-threshold=200 ms. That's good on low speed 
links; it will actually control queue depths to an average of O(min-threshold) 
at whatever value you set it to. The problem with 40 ms is that it interacts 
poorly with some applications, notably voice and video.

It also doesn't match well to published studies like 
http://www.pittsburgh.intel-research.net/~kpapagia/papers/p2pdelay-analysis.pdf.
 In that study, a min-threshold of 40 ms would have cut in only on six 
a-few-second events in the course of a five hour sample. If 40 ms is on the 
order of magnitude of a typical RTT, it suggests that you could still have 
multiple retransmissions from the same session in the same queue.

A good photo of buffer bloat is at
       ftp://ftpeng.cisco.com/fred/RTT/Pages/4.html
       ftp://ftpeng.cisco.com/fred/RTT/Pages/5.html

The first is a trace I took overnight in a hotel I stayed in. Never mind the 
name of the hotel, it's not important. The second is the delay distribution, 
which is highly unusual - you expect to see delay distributions more like

       ftp://ftpeng.cisco.com/fred/RTT/Pages/8.html

Thanks, Fred! Can I use these in the general bufferbloat talk I'm working on with attribution? It's a far better example/presentation in a graphic form than I currently have for the internet core case (where I don't even have anything other than memory of probing the hotel's ISP's network).


(which actually shows two distributions - the blue one is fairly normal, and 
the green one is a link that spends much of the day chock-a-block).

My conjecture re 5.html is that the link *never* drops, and at times has as 
many as nine retransmissions of the same packet in it. The spikes in the graph 
are about a TCP RTO timeout apart. That's a truly worst case. For N-1 of the N 
retransmissions, it's a waste of storage space and a waste of bandwidth.

AQM is your friend. Your buffer should be able to temporarily buffer as much as 
an RTT of traffic, which is to say that it should be large enough to ensure 
that if you get a big burst followed by a silent period you should be able to 
use the entire capacity of the link to ride it out. Your min-threshold should 
be at a value that makes your median queue depth relatively shallow. The 
numbers above are a reasonable guide, but as in all things, YMMV.
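
The "an RTT of traffic" rule of thumb above is the bandwidth-delay product. A minimal sketch, where the 100 ms RTT is an assumed value chosen only for illustration (the thread does not give one) and the helper name is hypothetical:

```python
def buffer_bytes_for_rtt(line_rate_mbps: int, rtt_ms: int) -> int:
    """Bytes the link can send in one RTT (the bandwidth-delay product).

    1 Mbit/s for 1 ms is 1000 bits, so bits = rate * 1000 * ms.
    """
    return line_rate_mbps * 1000 * rtt_ms // 8


ASSUMED_RTT_MS = 100  # illustrative assumption, not a figure from the thread

# Total buffer for a 10 Mbit/s link under the assumed RTT.
print(buffer_bytes_for_rtt(10, ASSUMED_RTT_MS))  # 125000

# The same helper applied to the 16 ms min-threshold recalled for 10 Mbit/s.
print(buffer_bytes_for_rtt(10, 16))              # 20000
```

Under that assumption the buffer is sized for bursts (125,000 bytes), while the min-threshold keeps the standing queue well below it (20,000 bytes).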


Yup. AQM is our friend.

And we need it in many places we hadn't realised we did (like our OS's).
                          - Jim
