On 1/22/13 12:11 PM, John Baldwin wrote:
As I mentioned in an earlier thread, I recently had to debug an issue we were
seeing across a link with a high bandwidth-delay product (both high bandwidth
and high RTT). Our specific use case was to use a TCP connection to reliably
forward a latency-sensitive datagram stream across a WAN connection. We would
often see spikes in the latency of individual datagrams. I eventually tracked
this down to the connection entering slow start when it would transmit data
after being idle. The data stream was quite bursty and would often attempt to
transmit a burst of data after being idle for far longer than a retransmit
timeout.
In 7.x we had worked around this in the past by disabling RFC 3390 and jacking
the slow start window size up via a sysctl. On 8.x this no longer worked.
The solution I came up with was to add a new socket option to disable idle
handling completely. That is, when an idle connection restarts with this new
option enabled, it keeps its current congestion window and doesn't enter slow
start.
There are only a few cases where such an option is useful, but if anyone else
thinks this might be useful I'd be happy to add the option to FreeBSD.
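
For illustration, an application could enable the proposed option roughly as follows. This is a minimal sketch: `set_ignore_idle` is a hypothetical helper name, and `TCP_IGNOREIDLE` is only defined when building against headers that carry the patch below, so the call is guarded with `#ifdef`.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>

/*
 * Ask the stack to keep the congestion window across idle periods.
 * Returns 0 on success, -1 with errno set on failure.
 * TCP_IGNOREIDLE is assumed to come from headers carrying the proposed
 * patch; without it the helper reports ENOPROTOOPT instead of guessing
 * an option number.
 */
int
set_ignore_idle(int fd, int on)
{
#ifdef TCP_IGNOREIDLE
	return (setsockopt(fd, IPPROTO_TCP, TCP_IGNOREIDLE, &on, sizeof(on)));
#else
	(void)fd;
	(void)on;
	errno = ENOPROTOOPT;
	return (-1);
#endif
}
```

A sender would call `set_ignore_idle(fd, 1)` once on the connected socket, before the first burst, and treat `ENOPROTOOPT` as "fall back to default idle handling".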

This looks good, but it almost sounds like a bug for TCP to be doing
this in the first place.

Why would one want this behavior?

Wouldn't it make sense to keep the window large until there is an actual
problem, rather than unconditionally chopping it down? It is almost as if
TCP is afraid that you might swap out a 10gig interface for a modem while
the connection is idle. I'm just not getting it (probably a simple
oversight on my part).

What do you think about also adding a sysctl that sets the global default
for this (on or off)?
-Alfred
Index: share/man/man4/tcp.4
===================================================================
--- share/man/man4/tcp.4 (revision 245742)
+++ share/man/man4/tcp.4 (working copy)
@@ -205,6 +205,18 @@
in the
.Sx MIB Variables
section further down.
+.It Dv TCP_IGNOREIDLE
+If a TCP connection is idle for more than one retransmit timeout,
+it enters slow start when new data is available to transmit.
+This avoids flooding the network with a full window of traffic at line rate.
+It also allows the connection to adjust to changes to network conditions
+that occurred while the connection was idle. A connection that sends
+bursts of data separated by large idle periods can be permanently stuck in
+slow start as a result.
+The boolean option
+.Dv TCP_IGNOREIDLE
+disables the idle connection handling, allowing connections to maintain the
+existing congestion window when restarting after an idle period.
.It Dv TCP_NODELAY
Under most circumstances,
.Tn TCP
Index: sys/netinet/tcp_var.h
===================================================================
--- sys/netinet/tcp_var.h (revision 245742)
+++ sys/netinet/tcp_var.h (working copy)
@@ -230,6 +230,7 @@
#define TF_NEEDFIN 0x000800 /* send FIN (implicit state) */
#define TF_NOPUSH 0x001000 /* don't push */
#define TF_PREVVALID 0x002000 /* saved values for bad rxmit
valid */
+#define TF_IGNOREIDLE 0x004000 /* connection is never idle */
#define TF_MORETOCOME 0x010000 /* More data to be appended to
sock */
#define TF_LQ_OVERFLOW 0x020000 /* listen queue overflow */
#define TF_LASTIDLE 0x040000 /* connection was previously
idle */
Index: sys/netinet/tcp_output.c
===================================================================
--- sys/netinet/tcp_output.c (revision 245742)
+++ sys/netinet/tcp_output.c (working copy)
@@ -206,7 +206,8 @@
* to send, then transmit; otherwise, investigate further.
*/
idle = (tp->t_flags & TF_LASTIDLE) || (tp->snd_max == tp->snd_una);
- if (idle && ticks - tp->t_rcvtime >= tp->t_rxtcur)
+ if (!(tp->t_flags & TF_IGNOREIDLE) &&
+ idle && ticks - tp->t_rcvtime >= tp->t_rxtcur)
cc_after_idle(tp);
tp->t_flags &= ~TF_LASTIDLE;
if (idle) {
Index: sys/netinet/tcp.h
===================================================================
--- sys/netinet/tcp.h (revision 245823)
+++ sys/netinet/tcp.h (working copy)
@@ -156,6 +156,7 @@
#define TCP_NODELAY 1 /* don't delay send to coalesce packets
*/
#if __BSD_VISIBLE
#define TCP_MAXSEG 2 /* set maximum segment size */
+#define TCP_IGNOREIDLE 3 /* disable idle connection handling */
#define TCP_NOPUSH 4 /* don't push last block of write */
#define TCP_NOOPT 8 /* don't use TCP options */
#define TCP_MD5SIG 16 /* use MD5 digests (RFC2385) */
Index: sys/netinet/tcp_usrreq.c
===================================================================
--- sys/netinet/tcp_usrreq.c (revision 245742)
+++ sys/netinet/tcp_usrreq.c (working copy)
@@ -1354,6 +1354,7 @@
case TCP_NODELAY:
case TCP_NOOPT:
+ case TCP_IGNOREIDLE:
INP_WUNLOCK(inp);
error = sooptcopyin(sopt, &optval, sizeof optval,
sizeof optval);
@@ -1368,6 +1369,9 @@
case TCP_NOOPT:
opt = TF_NOOPT;
break;
+ case TCP_IGNOREIDLE:
+ opt = TF_IGNOREIDLE;
+ break;
default:
opt = 0; /* dead code to fool gcc */
break;
@@ -1578,6 +1582,11 @@
INP_WUNLOCK(inp);
error = sooptcopyout(sopt, buf, TCP_CA_NAME_MAX);
break;
+ case TCP_IGNOREIDLE:
+ optval = tp->t_flags & TF_IGNOREIDLE;
+ INP_WUNLOCK(inp);
+ error = sooptcopyout(sopt, &optval, sizeof optval);
+ break;
default:
INP_WUNLOCK(inp);
error = ENOPROTOOPT;
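
To make the effect of the tcp_output.c hunk easier to see, here is a small user-space model of the decision. It is a sketch under assumptions, not kernel code: `struct conn_model` and `cwnd_after_idle_check` are hypothetical names, and the after-idle action is modelled as the RFC 5681 restart window, min(IW, cwnd), which is what the default congestion control hook is assumed to apply.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified view of the connection state used by the check. */
struct conn_model {
	uint32_t cwnd;              /* current congestion window, bytes */
	uint32_t initial_wnd;       /* initial window (IW), bytes */
	int      idle_ticks;        /* ticks since data was last received */
	int      rto_ticks;         /* current retransmit timeout, ticks */
	bool     nothing_in_flight; /* models snd_max == snd_una */
	bool     ignore_idle;       /* models TF_IGNOREIDLE */
};

/* Return the congestion window the connection would restart with. */
uint32_t
cwnd_after_idle_check(const struct conn_model *c)
{
	bool idle_too_long = c->nothing_in_flight &&
	    c->idle_ticks >= c->rto_ticks;

	/* With the flag set, the idle restart is skipped entirely. */
	if (!c->ignore_idle && idle_too_long)
		return (c->cwnd < c->initial_wnd ? c->cwnd : c->initial_wnd);
	return (c->cwnd);
}
```

In this model a connection with a large window that sits idle for longer than one retransmit timeout restarts from the initial window unless `ignore_idle` is set, in which case the window is left alone.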