Re: [PATCH] Add a new TCP_IGNOREIDLE socket option

Andre Oppermann Thu, 24 Jan 2013 00:03:55 -0800

On 24.01.2013 03:31, Sepherosa Ziehau wrote:

On Thu, Jan 24, 2013 at 12:15 AM, John Baldwin <j...@freebsd.org> wrote:

On Wednesday, January 23, 2013 1:33:27 am Sepherosa Ziehau wrote:

On Wed, Jan 23, 2013 at 4:11 AM, John Baldwin <j...@freebsd.org> wrote:

As I mentioned in an earlier thread, I recently had to debug an issue we were
seeing across a link with a high bandwidth-delay product (both high bandwidth
and high RTT).  Our specific use case was to use a TCP connection to reliably
forward a latency-sensitive datagram stream across a WAN connection.  We would
often see spikes in the latency of individual datagrams.  I eventually tracked
this down to the connection entering slow start when it would transmit data
after being idle.  The data stream was quite bursty and would often attempt to
transmit a burst of data after being idle for far longer than a retransmit
timeout.
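
To put a rough number on that latency cost, here is a minimal sketch of how many round trips a burst needs under idealized slow start. It assumes no loss, no delayed ACKs, an unconstrained receive window, and a cwnd counted in segments that doubles every round trip. The function name and the figures in the note below are made up for illustration and are not taken from any stack.

```python
def rtts_to_send(burst_segments, cwnd_segments):
    """Round trips needed to transmit a burst under idealized slow start.

    Assumes no loss, an unconstrained receive window, and a congestion
    window that doubles once per round trip. All values are in segments.
    """
    if cwnd_segments < 1:
        raise ValueError("cwnd must be at least one segment")
    rtts = 0
    sent = 0
    cwnd = cwnd_segments
    while sent < burst_segments:
        sent += cwnd   # one window's worth goes out per round trip
        cwnd *= 2      # slow start doubles the window each round trip
        rtts += 1
    return rtts
```

Under these assumptions a 100-segment burst needs 6 round trips when the window restarts at 3 segments, but only 1 if the connection still holds a 100-segment window. On a high-RTT link those extra round trips are the latency spike.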


In 7.x we had worked around this in the past by disabling RFC 3390 and jacking
the slow start window size up via a sysctl.  On 8.x this no longer worked.
The solution I came up with was to add a new socket option to disable idle
handling completely.  That is, when an idle connection restarts with this new
option enabled, it keeps its current congestion window and doesn't enter slow
start.
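
The effect of such an option can be sketched as follows. This is a simplified model of the restart-window rule as I read RFC 5681 section 4.1, not the FreeBSD code. The `ignore_idle` parameter is a hypothetical stand-in for the proposed TCP_IGNOREIDLE option, and the other names are illustrative.

```python
def cwnd_after_idle(cwnd, idle_time, rto, restart_window, ignore_idle=False):
    """Congestion window to use when sending resumes after an idle period.

    Simplified model: if the connection has been idle for more than one
    RTO, fall back to the restart window (never growing cwnd), unless the
    hypothetical ignore_idle flag asks to keep the window as it was.
    """
    if ignore_idle or idle_time <= rto:
        return cwnd                      # keep the current window
    return min(restart_window, cwnd)     # restart from a small window
```

With `ignore_idle` set, the sender resumes at its old window no matter how long it was idle, which is exactly the behaviour the rest of this thread argues about.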

There are only a few cases where such an option is useful, but if anyone else
thinks this might be useful I'd be happy to add the option to FreeBSD.


I think what you need is RFC 2861; however, you should probably
ignore the "application-limited period" part of it.


Hummm.  It appears btw, that Linux uses RFC 2861, but has a global knob to
disable it due to applications having problems.  When it is disabled,
it doesn't decay the congestion window at all during idle handling.  That is,
it appears to act the same as if TCP_IGNOREIDLE were enabled.

 From http://www.kernel.org/doc/man-pages/online/pages/man7/tcp.7.html:

        tcp_slow_start_after_idle (Boolean; default: enabled; since Linux 2.6.18)
               If enabled, provide RFC 2861 behavior and time out the congestion
               window after an idle period.  An idle period is defined as the
               current RTO (retransmission timeout).  If disabled, the congestion
               window will not be timed out after an idle period.
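
For reference, the idle decay in RFC 2861 can be sketched roughly as below. This follows the pseudocode in the RFC as I recall it (halve cwnd once per RTO of idle time, remember the old window in ssthresh); treat it as an approximation under that assumption, not as what Linux or FreeBSD actually implement. The function and parameter names are illustrative; windows are in bytes and times in milliseconds.

```python
def rfc2861_idle_decay(cwnd, ssthresh, idle_ms, rto_ms, rwnd, mss):
    """Decay cwnd after an idle period, in the style of RFC 2861.

    Returns the new (cwnd, ssthresh) pair. Nothing changes when the
    connection has been idle for less than one RTO.
    """
    if idle_ms < rto_ms:
        return cwnd, ssthresh
    # Remember (most of) the previous window so slow start can return to it.
    ssthresh = max(ssthresh, 3 * cwnd // 4)
    # Halve the window once per full RTO of idle time, never below one MSS.
    for _ in range(idle_ms // rto_ms):
        win = min(cwnd, rwnd)
        cwnd = max(win // 2, mss)
    return cwnd, ssthresh
```

For example, a 64000-byte window with ssthresh 32000 that sits idle for three RTOs comes back as cwnd 8000 with ssthresh 48000, so the sender restarts small but can slow-start back up quickly.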

Also, in this thread on tcpm it appears no one on that list realizes that
there are any implementations which follow the "SHOULD" in RFC 2581 for idle
handling (which is what we do currently):


Nah, I don't think the idle detection in FreeBSD follows
RFC 2581/RFC 5681 section 4.1 (the paragraph before the "SHOULD").  IMHO,
that's probably why the author of the following email asked about
implementations of the "SHOULD" in RFC 2581/RFC 5681.


http://www.ietf.org/mail-archive/web/tcpm/current/msg02864.html

So if we were to implement RFC 2861, the new socket option would be equivalent
to setting Linux's 'tcp_slow_start_after_idle' to false, but on a per-socket
basis rather than globally.


Agreed, a per-socket option could be more useful than global sysctls in
certain situations.  However, in addition to the per-socket option,
could global sysctl nodes to disable idle_restart/idle_cwv help too?


No.  This is far too dangerous once it makes it into some tuning guide.
The threat of congestion breakdown is real.  The Internet, or any packet
network, can only survive in the long term if almost everyone follows the
rules and self-constrains to remain fair to the others.  What would happen
if nobody respected traffic lights anymore?

Besides that, bursting into unknown network conditions is very likely to
result in burst losses as well.  TCP isn't good at recovering from them.
In the end you most likely come out ahead if you decay the restartCWND.

We have two cases primarily: a) long distance, medium to high RTT, and
wildly varying bandwidth (a.k.a. the Internet); b) short distance, low
RTT and mostly plenty of bandwidth (a.k.a. Datacenter).  The former
absolutely, definitely requires a decayed restartCWND.  The latter less
so, but even there bursting at 10Gig TSO-assisted wirespeed isn't going
to end well more often than not.

Since this seems to be a burning issue I'll come up with a patch in the
next few days to add a decaying restartCWND that'll be fair and allow a
very quick ramp up if no loss occurs.

--
Andre

_______________________________________________
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"
