John:

A burst at line rate will *often* cause drops, because router queues are of finite size. Such a burst (especially on a long-delay, high-bandwidth network) also causes your RTT to increase even if there is no drop, which is going to hurt you as well.
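To put rough numbers on the queue argument, here is a back-of-the-envelope sketch in Python. It is only a fluid approximation under assumptions of my own choosing: a single bottleneck queue that starts empty, a burst arriving back to back at line rate, and made-up rates and sizes. The name `burst_effect` and its parameters are hypothetical, not anything from a real stack.

```python
def burst_effect(burst_pkts, queue_pkts, line_pps, drain_pps):
    """Fluid model of one back-to-back burst hitting an initially empty queue.

    burst_pkts: packets sent back to back at line rate
    queue_pkts: capacity of the bottleneck queue, in packets
    line_pps:   arrival rate of the burst, packets per second
    drain_pps:  rate at which the bottleneck forwards packets
    Returns (dropped_pkts, added_delay_ms).
    """
    if drain_pps >= line_pps:
        backlog = 0  # the bottleneck keeps up, nothing accumulates
    else:
        # Packets the bottleneck forwards while the burst is still arriving.
        drained = (burst_pkts * drain_pps) // line_pps
        backlog = burst_pkts - drained
    dropped = max(0, backlog - queue_pkts)
    queued = min(backlog, queue_pkts)
    # Extra wait seen by the last packet that made it into the queue.
    added_delay_ms = queued * 1000 / drain_pps
    return dropped, added_delay_ms


# Assumed numbers: 10,000 pps line rate into a 1,000 pps bottleneck
# with a 200-packet queue.
big = burst_effect(1000, 200, 10000, 1000)    # overflows the queue
small = burst_effect(100, 200, 10000, 1000)   # fits, but still adds delay
print(big, small)
```

With these assumed numbers the large burst loses packets, while the small one loses nothing but still adds queueing delay to the RTT, which is the second effect mentioned above.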
A SHOULD in an RFC says you really, really, really, really need to do it unless there is something that makes you willing to override it. It is slight wiggle room. In this I agree with Andre: we should not be *not* doing it. Otherwise folks will be turning this on, and it is plain wrong. It may be fine for your network, but I would not want to see it in FreeBSD.

In my testing here at home I have put max-burst back into our stack. This uses Mark Allman's version (not Kacheong Poon's), where you clamp the cwnd at no more than 4 packets larger than your flight. All of my testing, high bandwidth-delay or LAN, has shown this to improve TCP performance, because it helps you avoid bursting out so many packets that you overflow a queue.

On your long-delay, high-bandwidth link, if you do burst out too many packets (and you never know how many that is, since you cannot predict how full all those MPLS queues are or how big they are), you will hurt yourself even worse. Note that in Cisco routers the default queue size is generally somewhere between 100 and 300 packets, depending on the router.

Bottom line: IMO this is a bad idea. If you want to really improve that link, let me get with you offline and we can see about getting you a couple of our boxes again :-D.

R

On Jan 22, 2013, at 4:37 PM, Andre Oppermann wrote:

> On 22.01.2013 21:35, Alfred Perlstein wrote:
>> On 1/22/13 12:11 PM, John Baldwin wrote:
>>> As I mentioned in an earlier thread, I recently had to debug an issue we were
>>> seeing across a link with a high bandwidth-delay product (both high bandwidth
>>> and high RTT). Our specific use case was to use a TCP connection to reliably
>>> forward a latency-sensitive datagram stream across a WAN connection. We would
>>> often see spikes in the latency of individual datagrams. I eventually tracked
>>> this down to the connection entering slow start when it would transmit data
>>> after being idle.
>>> The data stream was quite bursty and would often attempt to
>>> transmit a burst of data after being idle for far longer than a retransmit
>>> timeout.
>>>
>>> In 7.x we had worked around this in the past by disabling RFC 3390 and jacking
>>> the slow start window size up via a sysctl. On 8.x this no longer worked.
>>> The solution I came up with was to add a new socket option to disable idle
>>> handling completely. That is, when an idle connection restarts with this new
>>> option enabled, it keeps its current congestion window and doesn't enter slow
>>> start.
>>>
>>> There are only a few cases where such an option is useful, but if anyone else
>>> thinks this might be useful I'd be happy to add the option to FreeBSD.
>>
>> This looks good, but it almost sounds like a bug for TCP to be doing this
>> anyhow.
>
> It's not a bug. It's by design. It's required by the RFC.
>
>> Why would one want this behavior?
>
> Network conditions change all the time. Traffic and congestion comes and goes.
> Connections can go idle for milliseconds to minutes to hours. Whenever "enough"
> time has passed network capacity probing has to start anew.
>
>> Wouldn't it make sense to keep the window large until there was a problem
>> rather than unconditionally chop it down? I almost think TCP is afraid that
>> you might wind up swapping out a 10gig interface for a modem? I'm just not
>> getting it. (probably simple oversight on my part).
>
> The very real fear is congestion meltdown. That is the reason we ended up with
> TCP's AIMD mechanism in the first place. If everybody were to blast into the
> network anyone will suffer. The bufferbloat issue identified recently makes
> things even worse.
>
>> What do you think about also making this a sysctl for global on/off by
>> default?
>
> Please don't.
> The correct fix is either a) to use the initial window as the restart
> window (up to 10 MSS nowadays); b) to use a decay mechanism based on the time
> since the last network condition probe. Even the latter must decay to initCWND
> within at most 1MSL.
>
> --
> Andre
>
> _______________________________________________
> freebsd-net@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-net
> To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"
>

------------------------------
Randall Stewart
803-317-4952 (cell)
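
For reference, the two window rules discussed in this thread (the max-burst clamp of cwnd to flight plus 4 packets, and Andre's option a) of restarting from the initial window after idle) can be sketched roughly as below. This is a toy Python model, not the FreeBSD code: the function names, the 1460-byte MSS, and the choice of "idle longer than one RTO" as the trigger are assumptions made for illustration. The decay mechanism in option b) is not sketched because the thread does not specify its shape.

```python
MSS = 1460  # bytes; assumed segment size for the example


def clamp_max_burst(cwnd, flight, mss=MSS, max_burst=4):
    """Limit cwnd to at most max_burst segments beyond the data in flight.

    cwnd and flight are in bytes. With the default of 4 this matches the
    "no more than 4 packets larger than your flight" rule described above.
    """
    return min(cwnd, flight + max_burst * mss)


def restart_window(cwnd, idle_s, rto_s, init_cwnd=10 * MSS):
    """Option a): after a long idle period, restart from the initial window.

    Assumption: "long" means idle for more than one RTO. The window is
    never raised by this rule, only capped at init_cwnd.
    """
    if idle_s > rto_s:
        return min(cwnd, init_cwnd)
    return cwnd


# A connection with a 100-segment cwnd and 10 segments in flight may only
# send 4 more segments back to back under the max-burst rule.
print(clamp_max_burst(100 * MSS, 10 * MSS) // MSS)

# The same connection, idle for 5 s with a 1 s RTO, restarts at 10 segments.
print(restart_window(100 * MSS, idle_s=5.0, rto_s=1.0) // MSS)
```

The socket option proposed in the original message would amount to skipping `restart_window` entirely, which is why a burst limit such as `clamp_max_burst` matters even more in that case.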