On Thursday, January 24, 2013 3:03:31 am Andre Oppermann wrote:
> On 24.01.2013 03:31, Sepherosa Ziehau wrote:
> > On Thu, Jan 24, 2013 at 12:15 AM, John Baldwin <j...@freebsd.org> wrote:
> >> On Wednesday, January 23, 2013 1:33:27 am Sepherosa Ziehau wrote:
> >>> On Wed, Jan 23, 2013 at 4:11 AM, John Baldwin <j...@freebsd.org> wrote:
> >>>> As I mentioned in an earlier thread, I recently had to debug an issue
> >>>> we were seeing across a link with a high bandwidth-delay product
> >>>> (both high bandwidth and high RTT). Our specific use case was to use
> >>>> a TCP connection to reliably forward a latency-sensitive datagram
> >>>> stream across a WAN connection. We would often see spikes in the
> >>>> latency of individual datagrams. I eventually tracked this down to
> >>>> the connection entering slow start when it would transmit data after
> >>>> being idle. The data stream was quite bursty and would often attempt
> >>>> to transmit a burst of data after being idle for far longer than a
> >>>> retransmit timeout.
> >>>>
> >>>> In 7.x we had worked around this in the past by disabling RFC 3390
> >>>> and jacking the slow start window size up via a sysctl. On 8.x this
> >>>> no longer worked. The solution I came up with was to add a new socket
> >>>> option to disable idle handling completely. That is, when an idle
> >>>> connection restarts with this new option enabled, it keeps its
> >>>> current congestion window and doesn't enter slow start.
> >>>>
> >>>> There are only a few cases where such an option is useful, but if
> >>>> anyone else thinks this might be useful I'd be happy to add the
> >>>> option to FreeBSD.
> >>>
> >>> I think what you need is the RFC2861, however, you probably should
> >>> ignore the "application-limited period" part of RFC2861.
> >>
> >> Hummm.
> >> It appears btw, that Linux uses RFC 2861, but has a global knob to
> >> disable it due to applications having problems. When it is disabled,
> >> it doesn't decay the congestion window at all during idle handling.
> >> That is, it appears to act the same as if TCP_IGNOREIDLE were enabled.
> >>
> >> From http://www.kernel.org/doc/man-pages/online/pages/man7/tcp.7.html:
> >>
> >> tcp_slow_start_after_idle (Boolean; default: enabled; since Linux 2.6.18)
> >>     If enabled, provide RFC 2861 behavior and time out the congestion
> >>     window after an idle period. An idle period is defined as the
> >>     current RTO (retransmission timeout). If disabled, the congestion
> >>     window will not be timed out after an idle period.
> >>
> >> Also, in this thread on tcp-m it appears no one on that list realizes
> >> that there are any implementations which follow the "SHOULD" in RFC 2581
> >> for idle handling (which is what we do currently):
> >
> > Nah, I don't think the idle detection in FreeBSD follows the
> > RFC2581/RFC5681 4.1 (the paragraph before the "SHOULD"). IMHO, that's
> > probably why the author in the following email asked again about the
> > implementation of "SHOULD" in RFC2581/RFC5681.
> >
> >>
> >> http://www.ietf.org/mail-archive/web/tcpm/current/msg02864.html
> >>
> >> So if we were to implement RFC 2861, the new socket option would be
> >> equivalent to setting Linux's 'tcp_slow_start_after_idle' to false, but
> >> on a per-socket basis rather than globally.
> >
> > Agreed, a per-socket option could be more useful than global sysctls in
> > certain situations. However, in addition to the per-socket option,
> > could global sysctl nodes to disable idle_restart/idle_cwv help too?
>
> No. This is far too dangerous once it makes it into some tuning guide.
> The threat of congestion breakdown is real.
> The Internet, or any packet network, can only survive in the long term
> if almost all follow the rules and self-constrain to remain fair to the
> others. What would happen if nobody would respect the traffic lights
> anymore?

The problem with this argument is Linux has already had this as a tunable
option for years and the Internet hasn't melted as a result.

> Besides that bursting into unknown network conditions is very likely to
> result in burst losses as well. TCP isn't good at recovering from it.
> In the end you most likely come out ahead if you decay the restartCWND.
>
> We have two cases primarily: a) long distance, medium to high RTT, and
> wildly varying bandwidth (a.k.a. the Internet); b) short distance, low
> RTT and mostly plenty of bandwidth (a.k.a. Datacenter). The former
> absolutely definitely requires a decayed restartCWND. The latter less
> so but even there bursting at 10Gig TSO assisted wirespeed isn't going
> to end too happy more often than not.

You forgot my case: c) dedicated long distance links with high bandwidth.

> Since this seems to be a burning issue I'll come up with a patch in the
> next days to add a decaying restartCWND that'll be fair and allow a very
> quick ramp up if no loss occurs.

I think this could be useful. OTOH, I still think the TCP_IGNOREIDLE option
is useful both with and without a decaying restartCWND.

--
John Baldwin
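
For reference, the three idle behaviours discussed in this thread can be
sketched roughly as below. This is only an illustration and not code from
the FreeBSD or Linux stacks: the enum, the function name cwnd_after_idle()
and the restart_wnd floor are hypothetical names chosen for the example.
The decay branch loosely follows the idle part of RFC 2861 (halve cwnd once
per idle RTO) and leaves out the ssthresh adjustment that the RFC also
describes.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical policy names, for illustration only. */
enum idle_policy {
	IDLE_RESTART,	/* collapse to the restart window and slow start again */
	IDLE_DECAY,	/* halve cwnd once per idle RTO, in the style of RFC 2861 */
	IDLE_IGNORE	/* keep cwnd, as the proposed TCP_IGNOREIDLE would */
};

static uint32_t min_u32(uint32_t a, uint32_t b) { return a < b ? a : b; }
static uint32_t max_u32(uint32_t a, uint32_t b) { return a > b ? a : b; }

/*
 * Return the congestion window (in bytes) to use for the first send after
 * idle_ms of silence. restart_wnd is an assumed floor for the decay and
 * the value a full restart falls back to.
 */
static uint32_t
cwnd_after_idle(enum idle_policy policy, uint32_t cwnd, uint32_t restart_wnd,
    uint32_t idle_ms, uint32_t rto_ms)
{
	assert(rto_ms > 0);

	/* Not idle for a full RTO, or idle handling disabled: no change. */
	if (idle_ms < rto_ms || policy == IDLE_IGNORE)
		return cwnd;

	/* Full restart: never grow cwnd, only clamp it down. */
	if (policy == IDLE_RESTART)
		return min_u32(cwnd, restart_wnd);

	/* IDLE_DECAY: one halving per elapsed RTO, stopping at the floor. */
	for (uint32_t n = idle_ms / rto_ms; n > 0 && cwnd > restart_wnd; n--)
		cwnd = max_u32(cwnd / 2, restart_wnd);
	return cwnd;
}
```

With a 64000-byte window, a 4380-byte restart window and a 1000 ms RTO, a
3000 ms pause leaves 8000 bytes under the decay policy, 4380 bytes under a
full restart, and the full 64000 bytes when idle handling is ignored. That
gap is the latency spike versus burst-loss trade-off argued about above.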