On 1/22/13 12:11 PM, John Baldwin wrote:
As I mentioned in an earlier thread, I recently had to debug an issue we were
seeing across a link with a high bandwidth-delay product (both high bandwidth
and high RTT).  Our specific use case was to use a TCP connection to reliably
forward a latency-sensitive datagram stream across a WAN connection.  We would
often see spikes in the latency of individual datagrams.  I eventually tracked
this down to the connection entering slow start when it would transmit data
after being idle.  The data stream was quite bursty and would often attempt to
transmit a burst of data after being idle for far longer than a retransmit
timeout.

In 7.x we had worked around this in the past by disabling RFC 3390 and jacking
the slow start window size up via a sysctl.  On 8.x this no longer worked.
The solution I came up with was to add a new socket option to disable idle
handling completely.  That is, when an idle connection restarts with this new
option enabled, it keeps its current congestion window and doesn't enter slow
start.

There are only a few cases where such an option is useful, but if anyone else
thinks this might be useful I'd be happy to add the option to FreeBSD.
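
For readers following along, here is a minimal sketch of how an application might turn the proposed option on. It assumes a kernel built with the patch below. `TCP_IGNOREIDLE` is not in stock headers, so the fallback value 3 is copied from the patch's tcp.h hunk, and `set_ignore_idle` is a hypothetical helper name, not part of the patch.

```c
#include <errno.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/*
 * Assumption: value taken from the patch's tcp.h hunk. On a system
 * without the patch, option number 3 may be unassigned or may mean
 * something else entirely, so only use this on a patched kernel.
 */
#ifndef TCP_IGNOREIDLE
#define TCP_IGNOREIDLE 3
#endif

/*
 * Hypothetical helper: enable (on != 0) or disable (on == 0) the
 * option on a TCP socket. Returns 0 on success or the errno value
 * reported by setsockopt() on failure.
 */
static int
set_ignore_idle(int fd, int on)
{
	if (setsockopt(fd, IPPROTO_TCP, TCP_IGNOREIDLE, &on, sizeof(on)) != 0)
		return (errno);
	return (0);
}
```

A caller would typically invoke `set_ignore_idle(fd, 1)` once after the connection is established and treat a nonzero return (for example `ENOPROTOOPT` on an unpatched kernel) as "option not available".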

This looks good, but it almost sounds like a bug for TCP to be doing this anyhow.

Why would one want this behavior?

Wouldn't it make sense to keep the window large until there was a problem, rather than unconditionally chopping it down? It's almost as if TCP is afraid that you might swap out a 10gig interface for a modem while the connection is idle. I'm just not getting it (probably a simple oversight on my part).

What do you think about also making this a sysctl for global on/off by default?

-Alfred


Index: share/man/man4/tcp.4
===================================================================
--- share/man/man4/tcp.4        (revision 245742)
+++ share/man/man4/tcp.4        (working copy)
@@ -205,6 +205,18 @@
  in the
  .Sx MIB Variables
  section further down.
+.It Dv TCP_IGNOREIDLE
+If a TCP connection is idle for more than one retransmit timeout,
+it enters slow start when new data is available to transmit.
+This avoids flooding the network with a full window of traffic at line rate.
+It also allows the connection to adjust to changes to network conditions
+that occurred while the connection was idle.  A connection that sends
+bursts of data separated by large idle periods can be permanently stuck in
+slow start as a result.
+The boolean option
+.Dv TCP_IGNOREIDLE
+disables the idle connection handling, allowing connections to maintain the
+existing congestion window when restarting after an idle period.
  .It Dv TCP_NODELAY
  Under most circumstances,
  .Tn TCP
Index: sys/netinet/tcp_var.h
===================================================================
--- sys/netinet/tcp_var.h       (revision 245742)
+++ sys/netinet/tcp_var.h       (working copy)
@@ -230,6 +230,7 @@
  #define       TF_NEEDFIN      0x000800        /* send FIN (implicit state) */
  #define       TF_NOPUSH       0x001000        /* don't push */
  #define       TF_PREVVALID    0x002000        /* saved values for bad rxmit valid */
+#define        TF_IGNOREIDLE   0x004000        /* connection is never idle */
  #define       TF_MORETOCOME   0x010000        /* More data to be appended to sock */
  #define       TF_LQ_OVERFLOW  0x020000        /* listen queue overflow */
  #define       TF_LASTIDLE     0x040000        /* connection was previously idle */
Index: sys/netinet/tcp_output.c
===================================================================
--- sys/netinet/tcp_output.c    (revision 245742)
+++ sys/netinet/tcp_output.c    (working copy)
@@ -206,7 +206,8 @@
         * to send, then transmit; otherwise, investigate further.
         */
        idle = (tp->t_flags & TF_LASTIDLE) || (tp->snd_max == tp->snd_una);
-       if (idle && ticks - tp->t_rcvtime >= tp->t_rxtcur)
+       if (!(tp->t_flags & TF_IGNOREIDLE) &&
+           idle && ticks - tp->t_rcvtime >= tp->t_rxtcur)
                cc_after_idle(tp);
        tp->t_flags &= ~TF_LASTIDLE;
        if (idle) {
Index: sys/netinet/tcp.h
===================================================================
--- sys/netinet/tcp.h   (revision 245823)
+++ sys/netinet/tcp.h   (working copy)
@@ -156,6 +156,7 @@
  #define       TCP_NODELAY     1       /* don't delay send to coalesce packets */
  #if __BSD_VISIBLE
  #define       TCP_MAXSEG      2       /* set maximum segment size */
+#define        TCP_IGNOREIDLE  3       /* disable idle connection handling */
  #define TCP_NOPUSH    4       /* don't push last block of write */
  #define TCP_NOOPT     8       /* don't use TCP options */
  #define TCP_MD5SIG    16      /* use MD5 digests (RFC2385) */
Index: sys/netinet/tcp_usrreq.c
===================================================================
--- sys/netinet/tcp_usrreq.c    (revision 245742)
+++ sys/netinet/tcp_usrreq.c    (working copy)
@@ -1354,6 +1354,7 @@
                case TCP_NODELAY:
                case TCP_NOOPT:
+               case TCP_IGNOREIDLE:
                        INP_WUNLOCK(inp);
                        error = sooptcopyin(sopt, &optval, sizeof optval,
                            sizeof optval);
@@ -1368,6 +1369,9 @@
                        case TCP_NOOPT:
                                opt = TF_NOOPT;
                                break;
+                       case TCP_IGNOREIDLE:
+                               opt = TF_IGNOREIDLE;
+                               break;
                        default:
                                opt = 0; /* dead code to fool gcc */
                                break;
@@ -1578,6 +1582,11 @@
                        INP_WUNLOCK(inp);
                        error = sooptcopyout(sopt, buf, TCP_CA_NAME_MAX);
                        break;
+               case TCP_IGNOREIDLE:
+                       optval = tp->t_flags & TF_IGNOREIDLE;
+                       INP_WUNLOCK(inp);
+                       error = sooptcopyout(sopt, &optval, sizeof optval);
+                       break;
                default:
                        INP_WUNLOCK(inp);
                        error = ENOPROTOOPT;
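

To make the effect of the tcp_output.c hunk concrete, the following is a rough userland model of the patched condition. The flag values are copied from the tcp_var.h hunk above; `struct conn_model` and `restart_resets_cwnd` are hypothetical stand-ins for `struct tcpcb` and the check that guards `cc_after_idle()`, and the field types are simplified, so treat it as an illustration rather than kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Flag values as they appear in the patch's tcp_var.h hunk. */
#define TF_IGNOREIDLE	0x004000u	/* connection is never idle */
#define TF_LASTIDLE	0x040000u	/* connection was previously idle */

/* Hypothetical, stripped-down stand-in for struct tcpcb. */
struct conn_model {
	uint32_t t_flags;	/* TF_* flags */
	uint32_t snd_max;	/* highest sequence number sent */
	uint32_t snd_una;	/* oldest unacknowledged sequence number */
	int	 t_rcvtime;	/* tick count when a segment was last received */
	int	 t_rxtcur;	/* current retransmit timeout, in ticks */
};

/*
 * Returns true when the patched code would call cc_after_idle(),
 * i.e. when the restart after idle would reset the congestion window.
 */
static bool
restart_resets_cwnd(const struct conn_model *tp, int ticks)
{
	bool idle;

	assert(tp != NULL);
	idle = (tp->t_flags & TF_LASTIDLE) != 0 || tp->snd_max == tp->snd_una;
	if (tp->t_flags & TF_IGNOREIDLE)
		return (false);		/* option set: keep the existing window */
	return (idle && ticks - tp->t_rcvtime >= tp->t_rxtcur);
}
```

With the flag clear, a connection with nothing outstanding that has been quiet for at least one retransmit timeout gets its window reset; with `TF_IGNOREIDLE` set, the same connection keeps its window regardless of how long it was idle.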


