Re: The tale of a TCP bug

2011-03-26 Thread Stefan `Sec` Zehl
Hi again,

On Fri, Mar 25, 2011 at 16:40 -0400, John Baldwin wrote:
> Reading some more.  I'm trying to understand the breakage in your case.
> 
> You are saying that FreeBSD is the sender, who has data to send, yet is not 
> sending any window probes because it never starts the persist timer when the 
> initial window is zero?  Is that correct?

Yes. The receiver never sends a window update on its own, but when
probed will "admit" to a bigger window.

> And the problem is that the code that uses 'adv' to determine if it
> should send a window update to the remote end is falsely succeeding due
> to the overflow causing tcp_output() to 'goto send' but that it then
> fails to send any data because it thinks the remote window is full?

Yes, as far as I remember (I did that part of the debugging 2 months ago,
when I submitted the PR %-) that's what happens.

> So one thing I don't quite follow is how you are having rcv_nxt >
> rcv_adv.  I saw this when the other side would send a window probe,
> and then the receiving side would take the -1 remaining window and
> explode it into the maximum window size when it ACKd.

No, it's not rcv_nxt > rcv_adv. It's

(rcv_adv - rcv_nxt) > min(recwin, (long)TCP_MAXWIN << tp->rcv_scale)

My sample case has (rcv_adv - rcv_nxt) = 65536, but
(TCP_MAXWIN << tp->rcv_scale) = 65535 (as there is no window scaling in
effect).
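
To make the arithmetic concrete, here is a minimal user-space sketch of how that subtraction could wrap. It rests on assumptions not stated in this mail: that the kernel's min() works on unsigned 32-bit values, that recwin is 65536, and that 'long' is 32 bits on i386 and 64 bits on amd64. The helper names (min_u32, adv_raw, adv_as_long32, adv_as_long64) are hypothetical and not part of tcp_output.c.

```c
#include <stdint.h>

#define TCP_MAXWIN 65535u

/* Stand-in for the kernel's min(); ASSUMED to operate on unsigned 32-bit values. */
static uint32_t min_u32(uint32_t a, uint32_t b)
{
    return a < b ? a : b;
}

/* The 'adv' expression evaluated entirely in unsigned 32-bit arithmetic. */
static uint32_t adv_raw(uint32_t recwin, unsigned rcv_scale,
                        uint32_t rcv_adv, uint32_t rcv_nxt)
{
    return min_u32(recwin, TCP_MAXWIN << rcv_scale) - (rcv_adv - rcv_nxt);
}

/* What a 32-bit 'long' would hold (i386-like): the bit pattern reads as negative. */
static int32_t adv_as_long32(uint32_t raw)
{
    return (int32_t)raw;
}

/* What a 64-bit 'long' would hold (amd64-like): zero-extended, so large and positive. */
static int64_t adv_as_long64(uint32_t raw)
{
    return (int64_t)raw;
}
```

With rcv_nxt = 43 and rcv_adv = 43 + 65536, adv_raw(65536, 0, rcv_adv, rcv_nxt) wraps to 0xffffffff. Read as a 32-bit long that is -1, so a test like 'adv >= 2 * t_maxseg' fails; read as a 64-bit long it is 4294967295, so the same test passes.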

> Are you seeing the other end of the connection send a window probe, but 
> FreeBSD is not setting the persist timer so that it will send its own window 
> probes?

No, the dump looks like this:

| 10.42.0.25.44852 > 10.42.0.2.1516: Flags [S], 
|seq 3339144437, win 65535, options [...], length 0

FreeBSD sending the first SYN.
[rcv_adv=0, rcv_nxt=0]

| 10.42.0.2.1516 > 10.42.0.25.44852: Flags [S.], 
|seq 42, ack 3339144438, win 0, length 0

The other end SYN|ACKing with a window size of 0.

| 10.42.0.25.44852 > 10.42.0.2.1516: Flags [.], 
|seq 1, ack 1, win 65535, length 0

FreeBSD ACKing, and (correctly) sending no data.
[rcv_adv=67779, rcv_nxt=43], thus resulting in adv=-1/0x

At this point amd64 hangs 'forever' as the opposite side doesn't send
any packets on its own.

On i386 the persist timer is started, and we get:

| 10.42.0.25.44852 > 10.42.0.2.1516: Flags [.],
|seq 1:2, ack 1, win 65535, length 1

A window probe [a few seconds later]

| 10.42.0.2.1516 > 10.42.0.25.44852: Flags [.],
|seq 1, ack 2, win 70, length 0

At which point the remote side admits to having the window open
which results in the connection working fine after that.
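
The difference between the two platforms can be modelled with a deliberately simplified sketch of the decision order described in this thread. This is not the real tcp_output() code: the enum, the function name decide_output, and its parameters are hypothetical, and the second 'adv' check against sb_hiwat is left out.

```c
#include <stdbool.h>

enum outcome { SEND_SEGMENT, ARM_PERSIST, IDLE };

/*
 * Simplified model: the window-update check on 'adv' runs first, and the
 * persist timer is only considered when nothing earlier decided to send.
 */
static enum outcome decide_output(long long adv, long long maxseg,
                                  bool data_queued, bool timer_running)
{
    if (adv >= 2 * maxseg)
        return SEND_SEGMENT;   /* corresponds to 'goto send' */
    if (data_queued && !timer_running)
        return ARM_PERSIST;    /* zero window: probe the peer later */
    return IDLE;
}
```

In this model, adv = -1 (the i386 view) with data queued yields ARM_PERSIST, matching the window probe in the trace above. adv = 4294967295 (the amd64 view) yields SEND_SEGMENT, so the persist branch is never reached, and no data goes out because the peer's window is still zero.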

CU,
Sec
-- 
I know that you believe that you understand what you think I said.
But I am not sure you realize, that what you heared is not what i meant.
___
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"


Re: The tale of a TCP bug

2011-03-26 Thread Stefan `Sec` Zehl
Hi,

> On Fri, Mar 25, 2011 at 16:40 -0400, John Baldwin wrote:
> > And the problem is that the code that uses 'adv' to determine if it
> > should send a window update to the remote end is falsely succeeding due
> > to the overflow causing tcp_output() to 'goto send' but that it then
> > fails to send any data because it thinks the remote window is full?

On a whim I wanted to find out how often that overflow is triggered in
normal operation, so I whipped up a quick counter sysctl.

--- sys/netinet/tcp_output.c.org	2011-01-04 19:27:00.0 +0100
+++ sys/netinet/tcp_output.c	2011-03-26 18:49:30.0 +0100
@@ -87,6 +87,11 @@
 extern struct mbuf *m_copypack();
 #endif
 
+VNET_DEFINE(int, adv_neg) = 0;
+SYSCTL_VNET_INT(_net_inet_tcp, OID_AUTO, adv_neg, CTLFLAG_RD,
+   &VNET_NAME(adv_neg), 1,
+   "How many times adv got negative");
+
 VNET_DEFINE(int, path_mtu_discovery) = 1;
 SYSCTL_VNET_INT(_net_inet_tcp, OID_AUTO, path_mtu_discovery, CTLFLAG_RW,
    &VNET_NAME(path_mtu_discovery), 1,
@@ -573,6 +578,10 @@
    long adv = min(recwin, (long)TCP_MAXWIN << tp->rcv_scale) -
        (tp->rcv_adv - tp->rcv_nxt);
 
+   if(min(recwin, (long)TCP_MAXWIN << tp->rcv_scale) <
+       (tp->rcv_adv - tp->rcv_nxt))
+       adv_neg++;
+
    if (adv >= (long) (2 * tp->t_maxseg))
        goto send;
    if (2 * adv >= (long) so->so_rcv.sb_hiwat)

I booted my main (web/shell) box with (only) this patch:

11:36PM  up  3:50, 1 user, load averages: 2.29, 1.51, 0.73
net.inet.tcp.adv_neg: 2466

That's approximately once every 5 seconds. That's way more often than I
suspected.
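
As a quick check of that rate, here is a small sketch of the arithmetic. It assumes the "3:50" in the uptime output means 3 hours 50 minutes and that the counter started at zero at boot; the function name seconds_per_event is hypothetical.

```c
/* Average number of seconds between events, given uptime and an event count. */
static double seconds_per_event(int hours, int minutes, long events)
{
    long uptime_s = hours * 3600L + minutes * 60L;
    return (double)uptime_s / (double)events;
}
```

seconds_per_event(3, 50, 2466) divides 13800 seconds by 2466 events, which comes to roughly 5.6 seconds per event.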

CU,
Sec
-- 
I  wish  there was a knob on the TV to turn up the intelligence.
There's a knob called "brightness", but it doesn't seem to work. 


Questions on LRO and Delayed ACK

2011-03-26 Thread David Somayajulu
Hi All,


1.   If there is hardware support for LRO (where the hardware coalesces a
bunch of consecutive TCP segments into one large TCP segment), is it enough
for the driver to simply post the segment to the host stack via
ifp->if_input()? Or is there still a need to run it through tcp_lro_rx()
followed by tcp_lro_flush()?

2.   What kind of performance improvement does one get using soft LRO via
tcp_lro_init(), tcp_lro_rx(), and tcp_lro_flush()?

3.   In the absence of LRO, is there any way that one can increase the
number of inbound frames for which an ACK is transmitted to a value greater
than 2?

Thanks
david S.

