Hi,
We recently upgraded one of our webservers to FreeBSD 7, and we started
receiving complaints from some users who could no longer connect to that
server. On top of that, users said that the problem only occurred on
Windows (at least, the ones who had more than one OS to try it out).
After managing to get a user who had the problem running windump, running
tcpdump on the new server, and comparing that to the windump & tcpdump
output for a "control" user (me) that could connect, we managed to figure
out the following:
- For the user with this problem, ping works fine, but all TCP connections
  to the server fail.
- The user, trying to connect, sends out a SYN packet, receives no response,
  and retries a few times until timing out.
- The server sees a bunch of SYN packets and responds with SYN-ACK each time.
- The issue only seems to arise if the sender has RFC1323 disabled.
So, the SYN-ACK is getting lost somewhere.
- For the control user (who can connect via TCP just fine), we set the TCP
  window size and RFC1323 options the same as the user with the problem.
- The control user sees the SYN-ACK packet.
- We send a connection attempt to one of our other servers, running FreeBSD
  5.5, and one to the server running FreeBSD 7.
- There is only one notable difference between the responses: the order of
  the options.
- FreeBSD 5.5 has <mss 1412, nop, nop, sackOK>
- FreeBSD 7 has <mss 1412, sackOK, eol> (there is of course an aligning nop
  after the eol, which tcpdump skips)
- These options don't appear in this exact configuration when using RFC1323
  options.
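To make the difference concrete, here is a rough sketch of how the two option
lists above would look as raw bytes on the wire. The helper name
encode_options and its input format are made up for illustration; the option
kinds are the standard ones (EOL=0, NOP=1, MSS=2, SACK permitted=4). The
trailing nop in the FreeBSD 7 case follows the observation above about the
padding byte that tcpdump does not print.

```python
# Standard TCP option kinds (RFC 793 and RFC 2018).
EOL, NOP, MSS, SACK_PERMITTED = 0, 1, 2, 4


def encode_options(options):
    """Encode a list of TCP options into raw bytes.

    Hypothetical helper for illustration only. Each item is "nop", "eol",
    "sackOK", or a tuple ("mss", value).
    """
    out = bytearray()
    for opt in options:
        if opt == "nop":
            out.append(NOP)
        elif opt == "eol":
            out.append(EOL)
        elif opt == "sackOK":
            out += bytes([SACK_PERMITTED, 2])  # kind, length
        elif isinstance(opt, tuple) and opt[0] == "mss":
            out += bytes([MSS, 4]) + opt[1].to_bytes(2, "big")
        else:
            raise ValueError("unknown option: %r" % (opt,))
    return bytes(out)


# Layouts as reported by tcpdump for the two servers.
freebsd55 = encode_options([("mss", 1412), "nop", "nop", "sackOK"])
freebsd7 = encode_options([("mss", 1412), "sackOK", "eol", "nop"])

print(freebsd55.hex())
print(freebsd7.hex())
```

Both encodings are 8 bytes long and carry the same information; only the
position of the SACK permitted option and the kind of padding differ.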
My hunch is that the users with the problem have a router that erroneously
thinks that these options are invalid, or thinks that some part of the byte
sequence (e.g. 0204 05b4 0101 0402) is an attack.
Just to try it out, I patched tcp_output.c so that the SACK permitted option
was aligned on a 4-byte boundary, preventing the "sackOK, eol" pattern from
ever occurring. Looking through previous versions, I found where the TCP
option code had changed, and there used to be a comment about putting SACK
permitted last, but I can't tell if it's relevant.
http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/netinet/tcp_output.c.diff?r1=1.125;r2=1.126
The one-line patch to tcp_output.c is attached.
Sure enough, it fixed the problem. Afterwards, we collected some information
about the routers used by the users who had the problem, and while they
didn't all have the same manufacturer, several mentioned that their router
had a built-in firewall, which, when they disabled it, allowed them to
access the server.
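For readers without the attachment, the padding idea can be sketched roughly
as follows. This is not the actual patch (which is a one-line change in the
kernel's C code) but a small model of it, assuming that "aligned" means
padding with nops so that the 2-byte SACK permitted option ends on a 4-byte
boundary. The function name syn_ack_options and the pad_before_sack flag are
invented for this illustration.

```python
# Standard TCP option kinds (RFC 793 and RFC 2018).
EOL, NOP, MSS, SACK_PERMITTED = 0, 1, 2, 4


def syn_ack_options(mss, pad_before_sack):
    """Build MSS + SACK permitted options for a SYN-ACK (illustrative model).

    pad_before_sack=True models the patched behaviour: insert nops so that
    SACK permitted ends on a 4-byte boundary. False models the unpatched
    FreeBSD 7 layout described in this mail.
    """
    out = bytearray([MSS, 4]) + mss.to_bytes(2, "big")
    if pad_before_sack:
        # SACK permitted is 2 bytes long; pad until it would end aligned.
        while (len(out) + 2) % 4 != 0:
            out.append(NOP)
    out += bytes([SACK_PERMITTED, 2])
    if len(out) % 4 != 0:
        # Terminate the list, then fill up to a 4-byte multiple
        # (assumed here to be nop padding, as observed above).
        out.append(EOL)
        while len(out) % 4 != 0:
            out.append(NOP)
    return bytes(out)


patched = syn_ack_options(1412, pad_before_sack=True)
unpatched = syn_ack_options(1412, pad_before_sack=False)

print(patched.hex())    # mss, nop, nop, sackOK
print(unpatched.hex())  # mss, sackOK, eol, nop
```

With the padding in front, the option list is already a multiple of 4 bytes
after SACK permitted, so no eol is needed and the "sackOK, eol" pattern never
appears.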
Does all of this sound reasonable? And if so, would it be worth submitting
this patch? I don't know if this particular change in options order was
intentional, or just a side-effect of the new code, but it certainly works
around an extremely hard-to-diagnose problem.
-coda