David Malone wrote:
I've got an example extract tcpdump of this at the end of the mail
- here 6 ACKs are sent, 5 of which are pure window updates and
several are 2us apart!

I think the easy option is to delete the code that generates explicit
window updates if the window moves by 2*MSS. We then should be doing
something similar to Linux. The other easy alternative would be to
add a sysclt that lets us generate an window update every N*MSS and
by default set it to something big, like 10 or 100. That should
effectively eliminate the updates during bulk data transfer, but
may still generate some window updates after a loss.

The main problem of the pure window update test in tcp_output() is
its complete ignorance of delayed ACKs.  Second is the strict 4.4BSD
adherence to sending an update for every window increase of >= 2*MSS.
The third issue of sending a slew of window updates after having
received a FIN (telling us the other end won't ever send more data)
I have already fixed some moons ago.

In my new-tcp work I've come across the window update logic some time
ago and backchecked with relevant RFCs and other implementations.
Attached is a compiling but otherwise untested backport of the new logic.

--
Andre

Normal ACKing for driving congestion control shouldn't be impacted
by either of these suggested changes.

        David.

1227622713.276609 172.16.2.2.5002 > 172.16.1.51.39077: . ack 144097745 win 40798 
<nop,nop,timestamp 5365425 5103763> (DF)
1227622713.276611 172.16.2.2.5002 > 172.16.1.51.39077: . ack 144097745 win 40830 
<nop,nop,timestamp 5365425 5103763> (DF)
1227622713.276613 172.16.2.2.5002 > 172.16.1.51.39077: . ack 144097745 win 40862 
<nop,nop,timestamp 5365425 5103763> (DF)
1227622713.276615 172.16.2.2.5002 > 172.16.1.51.39077: . ack 144097745 win 40894 
<nop,nop,timestamp 5365425 5103763> (DF)
1227622713.276852 172.16.2.2.5002 > 172.16.1.51.39077: . ack 144097745 win 40926 
<nop,nop,timestamp 5365425 5103763> (DF)
1227622713.276855 172.16.2.2.5002 > 172.16.1.51.39077: . ack 144097745 win 40958 
<nop,nop,timestamp 5365425 5103763> (DF)
1227622713.296585 172.16.2.2.5002 > 172.16.1.51.39077: . ack 144100641 win 40956 
<nop,nop,timestamp 5365445 5103766> (DF)

Index: tcp_output.c
===================================================================
RCS file: /home/ncvs/src/sys/netinet/tcp_output.c,v
retrieving revision 1.158
diff -u -p -r1.158 tcp_output.c
--- tcp_output.c        27 Nov 2008 13:19:42 -0000      1.158
+++ tcp_output.c        30 Nov 2008 22:58:46 -0000
@@ -539,14 +539,32 @@ after_sack_rexmit:
        }
 
        /*
-        * Compare available window to amount of window
-        * known to peer (as advertised window less
-        * next expected input).  If the difference is at least two
-        * max size segments, or at least 50% of the maximum possible
-        * window, then want to send a window update to peer.
+        * Compare available window to amount of window known to peer
+        * (as advertised window less next expected input) and decide
+        * if we have to send a pure window update segment.
+        *
+        * When a delayed ACK is scheduled, do nothing.  It will update
+        * the window anyway in a few milliseconds.
+        *
+        * If the receive socket buffer has less than 1/4 of space
+        * available and if the difference is at least two max size
+        * segments, send an immediate window update to peer.
+        *
+        * Otherwise if the difference is 1/8 (or more) of the receive
+        * socket buffer, or at least 1/2 of the maximum possible window,
+        * then we send a window update too.
+        *
         * Skip this if the connection is in T/TCP half-open state.
         * Don't send pure window updates when the peer has closed
         * the connection and won't ever send more data.
+        *
+        * See RFC793, Section 3.7, page 43, Window Management Suggestions
+        * See RFC1122: Section 4.2.3.3, When to Send a Window Update
+        *
+        * Note: We are less aggressive with sending window update than
+        * recommended in RFC1122.  This is fine with todays large socket
+        * buffers and will not stall the peer.  In addition we piggy back
+        * window update on regular ACKs and sends.
         */
        if (recwin > 0 && !(tp->t_flags & TF_NEEDSYN) &&
            !TCPS_HAVERCVDFIN(tp->t_state)) {
@@ -554,14 +572,24 @@ after_sack_rexmit:
                 * "adv" is the amount we can increase the window,
                 * taking into account that we are limited by
                 * TCP_MAXWIN << tp->rcv_scale.
+                *
+                * NB: adv must be equal or larger than the smallest
+                * unscaled window increment.
                 */
                long adv = min(recwin, (long)TCP_MAXWIN << tp->rcv_scale) -
                        (tp->rcv_adv - tp->rcv_nxt);
 
-               if (adv >= (long) (2 * tp->t_maxseg))
-                       goto send;
-               if (2 * adv >= (long) so->so_rcv.sb_hiwat)
-                       goto send;
+               if (!(tp->t_flags & TF_DELACK) &&
+                   adv >= (long)0x1 << tp->rcv_scale) {
+                       if (recwin <= (long)(so->so_rcv.sb_hiwat / 4) &&
+                           adv >= (long)(2 * tp->t_maxseg))
+                               goto send;
+                       if (adv >= (long)(so->so_rcv.sb_hiwat / 8) &&
+                           adv >= (long)tp->t_maxseg)
+                               goto send;
+                       if (2 * adv >= (long)so->so_rcv.sb_hiwat)
+                               goto send;
+               }
        }
 
        /*
_______________________________________________
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "[EMAIL PROTECTED]"

Reply via email to