Venkat Venkatsubra wrote:
Hi Andre,
When delayed Ack is set the window update is not sent.
Does this mean that when an odd number of packets is received and later read,
a window update won't go out until either the next segment arrives or the
200 msec delayed ACK timer fires? Can this reduced window block the sender from
sending the next segment that we are waiting for to open up the window?
Yes. The very idea of delayed ACK is to reduce the network utilization
by ACKing only every other segment. Window updates should not override
this as they currently do. Nagle comes into play as well: we wait
for the application to write something within the delayed ACK timeout to
piggyback the answer together with the ACK (and window update).
To answer your question: I do think we are fine with waiting for the
delayed ACK. If an application starts to seriously lag behind like
in your example the feedback mechanism should work and cause the sender
to slow down too. The feedback loop in TCP is not only the network but
also the sending and receiving application. In a normal bulk transfer
where the receiving application services the receive buffer in regular
intervals we update the window with every ACK.
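
To make the delayed ACK behaviour discussed above concrete, here is a minimal
sketch in C. It is not the FreeBSD code: the struct delack_state and the
functions on_segment() and on_delack_timeout() are hypothetical names, and the
model only captures "ACK every second segment, otherwise wait for the timer".
Piggybacking the ACK on outgoing data is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-connection state: is a delayed ACK waiting to go out? */
struct delack_state {
	bool pending;
};

/*
 * Called for each in-order data segment received.
 * Returns true if an ACK has to be sent right away.
 */
bool
on_segment(struct delack_state *s)
{
	assert(s != NULL);
	if (s->pending) {
		/* Second segment since the last ACK: ACK both now. */
		s->pending = false;
		return true;
	}
	/* First unacknowledged segment: hold the ACK back. */
	s->pending = true;
	return false;
}

/*
 * Called when the delayed ACK timer fires.
 * Returns true if a held-back ACK has to be sent now.
 */
bool
on_delack_timeout(struct delack_state *s)
{
	assert(s != NULL);
	bool send = s->pending;

	s->pending = false;
	return send;
}
```

With an odd number of segments the last ACK, and with it the window update,
stays pending until either another segment arrives or the timer fires, which
is the situation Venkat asks about.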
I'm open to other ideas if they fix the problem David is seeing without
having more serious shortcomings.
What's the purpose of the 2 MSS check, by the way?
This is part of the Silly Window Syndrome prevention. A good description is
here:
http://www.tcpipguide.com/free/t_TCPSillyWindowSyndromeandChangesTotheSlidingWindow.htm
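
For reference, the check being asked about can be written as a standalone
function. This is only a sketch distilled from the two tests that the patch
further down removes from tcp_output(); the function name and the plain
"long" parameters are assumptions made for illustration, standing in for
adv, tp->t_maxseg and so->so_rcv.sb_hiwat.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Old 4.4BSD-style test: send a pure window update once the window can
 * be opened by at least two full-sized segments, or by at least half of
 * the receive buffer. Smaller increments are held back, which is what
 * keeps the receiver from advertising silly (tiny) windows.
 */
bool
old_want_window_update(long adv, long maxseg, long sb_hiwat)
{
	assert(maxseg > 0 && sb_hiwat > 0);
	if (adv >= 2 * maxseg)
		return true;
	if (2 * adv >= sb_hiwat)
		return true;
	return false;
}
```

With a 1460 byte MSS and a 64 KB buffer, an increase of 1460 bytes is held
back while an increase of 2920 bytes triggers an update. With a small 2 KB
buffer the second test fires already at 1460 bytes.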
PS: Attached is an updated version of the patch. The flag TF_DELACK
can't be used to test for the presence of a delayed ACK. The presence
of the delack timer has to be tested.
--
Andre
Venkat
________________________________
From: Andre Oppermann <[EMAIL PROTECTED]>
To: David Malone <[EMAIL PROTECTED]>
Cc: Rui Paulo <[EMAIL PROTECTED]>; freebsd-net@freebsd.org; Venkat Venkatsubra <[EMAIL PROTECTED]>; Kevin Oberman <[EMAIL PROTECTED]>
Sent: Sunday, November 30, 2008 5:18:22 PM
Subject: Re: FreeBSD Window updates
Andre Oppermann wrote:
David Malone wrote:
I've got an example tcpdump extract of this at the end of the mail
- here 6 ACKs are sent, 5 of which are pure window updates and
several are 2us apart!
I think the easy option is to delete the code that generates explicit
window updates if the window moves by 2*MSS. We then should be doing
something similar to Linux. The other easy alternative would be to
add a sysctl that lets us generate a window update every N*MSS and
by default set it to something big, like 10 or 100. That should
effectively eliminate the updates during bulk data transfer, but
may still generate some window updates after a loss.
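
The second alternative could look roughly like the following sketch. The
variable winupd_mss_count is a hypothetical stand-in for the proposed sysctl,
not an existing FreeBSD knob, and the function name is made up for
illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical tunable standing in for the proposed sysctl (N in N*MSS). */
long winupd_mss_count = 10;

/*
 * Returns true if the window can be opened by at least N*MSS compared
 * to what the peer already knows, i.e. a pure window update would be
 * generated.
 */
bool
want_update_n_mss(long adv, long maxseg)
{
	assert(maxseg > 0 && winupd_mss_count > 0);
	return adv >= winupd_mss_count * maxseg;
}
```

With N = 10 and a 1460 byte MSS, nothing is sent until the window has opened
by 14600 bytes. Setting N = 2 gives back the first of the two existing tests.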
The main problem of the pure window update test in tcp_output() is
its complete ignorance of delayed ACKs. Second is the strict 4.4BSD
adherence to sending an update for every window increase of >= 2*MSS.
The third issue of sending a slew of window updates after having
received a FIN (telling us the other end won't ever send more data)
I have already fixed some moons ago.
In my new-tcp work I came across the window update logic some time
ago and checked it against the relevant RFCs and other implementations.
Attached is a compiling but otherwise untested backport of the new logic.
Slightly improved version attached.
Index: tcp_output.c
===================================================================
RCS file: /home/ncvs/src/sys/netinet/tcp_output.c,v
retrieving revision 1.158
diff -u -p -r1.158 tcp_output.c
--- tcp_output.c 27 Nov 2008 13:19:42 -0000 1.158
+++ tcp_output.c 1 Dec 2008 21:06:28 -0000
@@ -539,29 +539,59 @@ after_sack_rexmit:
}
/*
- * Compare available window to amount of window
- * known to peer (as advertised window less
- * next expected input). If the difference is at least two
- * max size segments, or at least 50% of the maximum possible
- * window, then want to send a window update to peer.
+ * Compare available window to amount of window known to peer
+ * (as advertised window less next expected input) and decide
+ * if we have to send a pure window update segment.
+ *
+ * When a delayed ACK is scheduled, do nothing. It will update
+ * the window anyway in a few milliseconds when the:
+ * - next segment arrives we have to ack immediately;
+ * - application sends some data back to the peer;
+ * - delayed ACK timer expires.
+ *
+ * If the receive socket buffer has less than 1/4 of space
+ * available and if the difference is at least two max size
+ * segments, send an immediate window update to peer.
+ *
+ * Otherwise if the difference is 1/8 (or more) of the receive
+ * socket buffer, or at least 1/2 of the maximum possible window,
+ * then we send a window update too.
+ *
* Skip this if the connection is in T/TCP half-open state.
* Don't send pure window updates when the peer has closed
* the connection and won't ever send more data.
+ *
+ * See RFC793, Section 3.7, page 43, Window Management Suggestions
+ * See RFC1122: Section 4.2.3.3, When to Send a Window Update
+ *
+ * Note: We are less aggressive with sending window update than
+ * recommended in RFC1122. This is fine with todays large socket
+ * buffers and will not stall the peer. In addition we piggy back
+ * window update on regular ACKs and sends.
*/
- if (recwin > 0 && !(tp->t_flags & TF_NEEDSYN) &&
- !TCPS_HAVERCVDFIN(tp->t_state)) {
+ if (recwin > 0 && !tcp_timer_active(tp, TT_DELACK) &&
+ !(tp->t_flags & TF_NEEDSYN) && !TCPS_HAVERCVDFIN(tp->t_state)) {
/*
* "adv" is the amount we can increase the window,
* taking into account that we are limited by
* TCP_MAXWIN << tp->rcv_scale.
+ *
+ * NB: adv must be equal or larger than the smallest
+ * unscaled window increment.
*/
long adv = min(recwin, (long)TCP_MAXWIN << tp->rcv_scale) -
(tp->rcv_adv - tp->rcv_nxt);
- if (adv >= (long) (2 * tp->t_maxseg))
- goto send;
- if (2 * adv >= (long) so->so_rcv.sb_hiwat)
- goto send;
+ if (adv >= (long)0x1 << tp->rcv_scale) {
+ if (recwin <= (long)(so->so_rcv.sb_hiwat / 4) &&
+ adv >= (long)(2 * tp->t_maxseg))
+ goto send;
+ if (adv >= (long)(so->so_rcv.sb_hiwat / 8) &&
+ adv >= (long)tp->t_maxseg)
+ goto send;
+ if (2 * adv >= (long)so->so_rcv.sb_hiwat)
+ goto send;
+ }
}
/*
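
To make it easier to play with the thresholds outside the kernel, the new
decision from the patch above can be restated as a small userland function.
This is a sketch under assumptions, not the patch itself: the function name,
the plain "long" parameters and the delack_pending flag (standing in for
tcp_timer_active(tp, TT_DELACK)) are made up for illustration, and the
TF_NEEDSYN and received-FIN conditions are left out.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Userland restatement of the new window update test.
 *   recwin   - space available in the receive socket buffer
 *   adv      - how far the advertised window could be opened
 *   maxseg   - maximum segment size
 *   sb_hiwat - receive socket buffer size
 */
bool
new_want_window_update(long recwin, long adv, long maxseg, long sb_hiwat,
    int rcv_scale, bool delack_pending)
{
	assert(maxseg > 0 && sb_hiwat > 0);
	assert(rcv_scale >= 0 && rcv_scale <= 14);

	/* A scheduled delayed ACK will carry the window update anyway. */
	if (recwin <= 0 || delack_pending)
		return false;
	/* Smaller than one unit of the scaled window: cannot be expressed. */
	if (adv < (1L << rcv_scale))
		return false;
	/* Buffer nearly full: open the window as soon as 2*MSS is free. */
	if (recwin <= sb_hiwat / 4 && adv >= 2 * maxseg)
		return true;
	/* Otherwise wait for 1/8 of the buffer (and at least one MSS). */
	if (adv >= sb_hiwat / 8 && adv >= maxseg)
		return true;
	/* Or for half of the buffer. */
	if (2 * adv >= sb_hiwat)
		return true;
	return false;
}
```

For example, with a 64 KB buffer that is mostly empty and a 1460 byte MSS, an
increase of 2920 bytes no longer triggers a pure window update, while 8192
bytes (1/8 of the buffer) does. The 2*MSS rule only applies once a quarter or
less of the buffer is free.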