In response to Bill Moran <[EMAIL PROTECTED]>:

> In response to Adam McDougall <[EMAIL PROTECTED]>:
>
> > On Tue, Jun 12, 2007 at 10:19:49AM -0400, Bill Moran wrote:
> >
> > This one has got me pretty befuddled.
> >
> > We're seeing some really odd behaviour with FreeBSD ignoring SYN packets.
> > I've been trying to diagnose this for a couple of weeks now, and my
> > current guess is that there's something wrong with the em driver. Here's
> > a narrowed-down list of what I've ruled out:
> > *) I've done my best to eliminate other network components as the problem.
> >    Based on the tcpdump output shown below, my theory at this point is that
> >    it can't possibly be any other network hardware.
> > *) The problem occurred on both FreeBSD 6.1 and FreeBSD 6.2-p3.
> > *) The problem does not appear to be tied to CPU usage; the CPU is nearly
> >    idle when the problem occurs.
> > *) I can now reproduce it pretty easily, so I'll know when it's fixed.
> > *) The system exhibiting the problem is running 15 jails, but they are
> >    idle 95% of the time. The problem initially occurred inside one of
> >    the jails, but I just recreated it outside the jail (on the host), and
> >    it's _easier_ to reproduce outside the jail.
> > *) The problem occurred with both GENERIC and the SMP kernel (this is a
> >    dual-CPU, hyperthreaded system).
> > *) I've tested, and the behavior occurs both with a dynamically generated
> >    file (from PHP) and with a static file.
> >
> > The nature of the beast is that we've got a SOAP application running under
> > Apache and PHP. This application is subject to many requests in rapid
> > succession, such that load can be simulated by the following loop:
> >
> >   while true; do fetch http://192.168.121.250/test.php; done
> >
> > The problem is that occasionally, the Apache server machine just ignores
> > SYN packets.
> > Take the following tcpdump output, for example:
> >
> > 13:34:17.312296 IP web04-v100.cust00.pitbpa1.priv.collaborativefusion.com.54808 > anchor-is00.is.pitbpa1.priv.collaborativefusion.com.http: S 2645061726:2645061726(0) win 65535 <mss 1380,nop,wscale 1,nop,nop,timestamp 2690201156 0,sackOK,eol>
> > 13:34:20.312398 IP web04-v100.cust00.pitbpa1.priv.collaborativefusion.com.54808 > anchor-is00.is.pitbpa1.priv.collaborativefusion.com.http: S 2645061726:2645061726(0) win 65535 <mss 1380,nop,wscale 1,nop,nop,timestamp 2690204156 0,sackOK,eol>
> > 13:34:23.512626 IP web04-v100.cust00.pitbpa1.priv.collaborativefusion.com.54808 > anchor-is00.is.pitbpa1.priv.collaborativefusion.com.http: S 2645061726:2645061726(0) win 65535 <mss 1380,nop,wscale 1,nop,nop,timestamp 2690207356 0,sackOK,eol>
> >
> > This is the _only_ traffic on port 80 during the test. It looks like the
> > kernel has ignored the initial SYN packet and two duplicates. I've seen it
> > take as long as 45 seconds to establish a connection, and this causes ugly
> > performance problems, as well as frequent timeouts on the client end.
> > The only clue I've found so far is this output from netstat -s.
> >
> > Does the Apache server have a firewall of any sort? (It could be making
> > unexpected decisions there, even if they aren't part of a firewall rule.)
> >
> > Try net.inet.ip.portrange.randomized=0 on the client? (If this is the
> > problem, we would probably see a reused port in a tcpdump covering a few
> > minutes, started after several minutes of "silence".)
> >
> > Are both systems on the same subnet? If not, can you try that, or have you
> > already?
>
> No, they aren't. My ability to test on the same subnet is limited, and
> the results are inconclusive.
>
> > Can you show tcpdump output using -e on the requests that aren't answered,
> > as well as an example that IS answered?
> > (I have seen routers mess up the MAC addresses for the source and
> > destination, and if I had kept staring at layer 3 data all day, I might
> > never have seen the problem.)
> >
> > Better yet, can you post files containing tcpdump -w output of an entire
> > session that ideally contains failed attempts that eventually work? That
> > way people could look at a broader picture and perhaps pick up on something
> > subtle. It's worth comparing a SYN that works directly with a SYN that
> > doesn't.
>
> We've decided to swap the card out on Friday and see if that resolves the
> problem. We have similar units that don't exhibit the problem, so I'm
> getting pretty suspicious that this might be a flaky NIC. If the new
> card doesn't solve the problem, I'll post more details on Monday.
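The following is a minimal sketch for quantifying the stall in the capture quoted above. It is an illustration and was not posted in the thread. It takes the timestamps of the three SYNs, which all carry the same sequence number 2645061726, and computes the gap between each SYN and the next. The roughly 3 s spacing is consistent with the client retransmitting a SYN that never got an answer, rather than opening new connections. The helper name syn_gaps is hypothetical, and the timestamps are copied by hand rather than parsed from a capture file.

```python
from datetime import datetime

# Timestamps of the three identical SYNs (sequence number 2645061726),
# copied by hand from the tcpdump output quoted above.
SYN_TIMES = ["13:34:17.312296", "13:34:20.312398", "13:34:23.512626"]


def syn_gaps(stamps):
    """Hypothetical helper: seconds elapsed between consecutive tcpdump timestamps."""
    parsed = [datetime.strptime(s, "%H:%M:%S.%f") for s in stamps]
    # Pair each timestamp with the one after it and take the difference.
    return [(later - earlier).total_seconds()
            for earlier, later in zip(parsed, parsed[1:])]


if __name__ == "__main__":
    gaps = syn_gaps(SYN_TIMES)
    for n, gap in enumerate(gaps, start=1):
        print(f"retransmission {n}: {gap:.6f} s after the previous SYN")
    print(f"total wait so far: {sum(gaps):.6f} s")
```

Run as a script, it reports gaps of 3.000102 s and 3.200228 s, or 6.200330 s in total, for these three packets. Applying the same arithmetic to a longer capture is one way to see how close a stalled connection comes to the 45-second delays described above.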
Just in case someone was curious about the result, or finds this in a web search: the behaviour was apparently hardware related. We swapped the NIC out and can no longer reproduce the problem.

-- 
Bill Moran
Collaborative Fusion Inc.
http://people.collaborativefusion.com/~wmoran/

[EMAIL PROTECTED]
Phone: 412-422-3463x4023