Re: Should syncache.count ever be negative?
On Fri, 9 Nov 2007, Matt Reimer wrote:

> Ok, I've run netperf in both directions. The box I've been targeting is
> 66.230.193.105 aka wordpress1.

Ok, at least that looks good.

> The machine is a Dell 1950 with 8 x 1.6GHz Xeon 5310s, 8G RAM, and this NIC:

Nice.

> I first noticed this problem running ab; then to simplify I used
> netrate/http[d]. What's strange is that it seems fine over the local
> network (~15800 requests/sec), but it slowed down dramatically (~150
> req/sec) when tested from another network 20 ms away. Running systat
> -tcp and nload I saw that there was an almost complete stall with only
> a handful of packets being sent (probably my ssh packets) for a few
> seconds or sometimes even up to 60 seconds or so.

I think most benchmarking tools end up stalling if all of their threads
stall; that may be why the rate falls off after the misbehavior you
describe below begins.

> Nov 9 19:02:34 wordpress1 kernel: TCP: [207.210.67.2]:64851 to
> [66.230.193.105]:80; syncache_socket: Socket create failed due to
> limits or memory shortage
> Nov 9 19:02:34 wordpress1 kernel: TCP: [207.210.67.2]:64851 to
> [66.230.193.105]:80 tcpflags 0x10; tcp_input: Listen socket: Socket
> allocation failed due to limits or memory shortage, sending RST

It turns out you'll generally get both of those error messages together,
from my reading of the code.

Since you eliminated a memory shortage in the socket zone, the next thing
to check is the length of the listen queues. If the listen queue is
backing up because the application isn't accepting fast enough, the
errors above should happen. "netstat -Lan" should show you what's going
on there. Upping the listen queue length specified by your webserver
_may_ be all that is necessary. Try fiddling with that and watching how
much the queues fill up during testing.

The fact that you see the same port repeatedly may indicate that the
syncache isn't destroying its entries when you get the socket creation
failure. Take a look at "netstat -n" and look for SYN_RECEIVED entries;
if they're sticking around for more than a few seconds, this is probably
what's happening. (This entire paragraph is speculation, but worth
investigating.)

> I don't know if it's relevant, but accf_http is loaded on wordpress1.

That may be relevant - accept filtering changes how the listen queues
are used. Try going back to non-accept filtering for now.

> We have seen similar behavior (TCP slowdowns) on different machines
> (4 x Xeon 5160) with a different NIC (em0) running RELENG_7, though I
> haven't diagnosed it to this level of detail. All our RELENG_6 and
> RELENG_4 machines seem fine.

em is the driver that I was having issues with when it shared an
interrupt... :)

FWIW, my crazy theory of the moment is this: we have some bug that
happens when the listen queues overflow in 7.0, and your test is
strenuous enough to hit the listen queue overflow condition, leading to
total collapse. I'll have to cobble together a test program to see what
happens in the listen queue overflow case.

Thanks for the quick feedback,

-Mike
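To make the listen-queue suggestion concrete, here is a minimal sketch of
the knob in question. This is hypothetical code, not the netrate/httpd or
webserver source; the port (8080) and backlog (128) are arbitrary values
chosen for the example.

    /*
     * Minimal sketch (hypothetical; not the netrate/httpd source).
     * The backlog passed to listen() bounds the queue of completed
     * connections waiting to be accepted.  If this loop cannot keep up
     * with incoming connections, the queue fills and new handshakes are
     * dropped, which is the condition the kernel messages quoted above
     * describe.
     */
    #include <sys/types.h>
    #include <sys/socket.h>
    #include <netinet/in.h>
    #include <err.h>
    #include <string.h>
    #include <unistd.h>

    int
    main(void)
    {
            struct sockaddr_in sin;
            int s, c;

            if ((s = socket(PF_INET, SOCK_STREAM, 0)) < 0)
                    err(1, "socket");
            memset(&sin, 0, sizeof(sin));
            sin.sin_family = AF_INET;
            sin.sin_port = htons(8080);             /* arbitrary test port */
            sin.sin_addr.s_addr = htonl(INADDR_ANY);
            if (bind(s, (struct sockaddr *)&sin, sizeof(sin)) < 0)
                    err(1, "bind");
            if (listen(s, 128) < 0)                 /* the listen queue length */
                    err(1, "listen");
            for (;;) {
                    if ((c = accept(s, NULL, NULL)) < 0)
                            continue;
                    /* A real server would read the request and reply here. */
                    close(c);
            }
    }

The second argument to listen() is what a webserver's "listen queue
length" setting ultimately becomes, and FreeBSD silently caps it at
kern.ipc.somaxconn, so raising the application setting may also require
raising that sysctl.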
Re: Should syncache.count ever be negative?
On Nov 10, 2007 12:13 AM, Mike Silbersack <[EMAIL PROTECTED]> wrote:
> On Fri, 9 Nov 2007, Matt Reimer wrote:
>
> > I first noticed this problem running ab; then to simplify I used
> > netrate/http[d]. What's strange is that it seems fine over the local
> > network (~15800 requests/sec), but it slowed down dramatically (~150
> > req/sec) when tested from another network 20 ms away. Running systat
> > -tcp and nload I saw that there was an almost complete stall with only
> > a handful of packets being sent (probably my ssh packets) for a few
> > seconds or sometimes even up to 60 seconds or so.
>
> I think most benchmarking tools end up stalling if all of their threads
> stall, that may be why the rate falls off after the misbehavior you
> describe below begins.

Ok. FWIW, I'm seeing the same behavior with tools/netrate/http as I am
with ab.

> > Nov 9 19:02:34 wordpress1 kernel: TCP: [207.210.67.2]:64851 to
> > [66.230.193.105]:80; syncache_socket: Socket create failed due to
> > limits or memory shortage
> > Nov 9 19:02:34 wordpress1 kernel: TCP: [207.210.67.2]:64851 to
> > [66.230.193.105]:80 tcpflags 0x10; tcp_input: Listen socket:
> > Socket allocation failed due to limits or memory shortage, sending RST
>
> Turns out you'll generally get both of those error messages together, from
> my reading of the code.
>
> Since you eliminated memory shortage in the socket zone, the next thing to
> check is the length of the listen queues. If the listen queue is backing
> up because the application isn't accepting fast enough, the errors above
> should happen. "netstat -Lan" should show you what's going on there.
> Upping the specified listen queue length in your webserver _may_ be all
> that is necessary. Try fiddling with that and watching how much they're
> filling up during testing.

I ran "netstat -Lan" every second while running this test and the output
never changed from the following, whether before or after the stall:

Current listen queue sizes (qlen/incqlen/maxqlen)
Proto Listen         Local Address
tcp4  0/0/128        66.230.193.105.80
tcp4  0/0/10         127.0.0.1.25
tcp4  0/0/128        *.22
tcp4  0/0/128        *.199

> The fact that you see the same port repeatedly may indicate that the
> syncache isn't destroying the syncache entries when you get the socket
> creation failure. Take a look at "netstat -n" and look for SYN_RECEIVED
> entries - if they're sticking around for more than a few seconds, this is
> probably what's happening. (This entire paragraph is speculation, but
> worth investigating.)

During the stall the sockets are all in TIME_WAIT.
More relevant info:

kern.ipc.maxsockets: 12328
kern.ipc.numopensockets: 46
net.inet.ip.portrange.randomtime: 45
net.inet.ip.portrange.randomcps: 10
net.inet.ip.portrange.randomized: 1
net.inet.ip.portrange.reservedlow: 0
net.inet.ip.portrange.reservedhigh: 1023
net.inet.ip.portrange.hilast: 65535
net.inet.ip.portrange.hifirst: 49152
net.inet.ip.portrange.last: 65535
net.inet.ip.portrange.first: 3
net.inet.ip.portrange.lowlast: 600
net.inet.ip.portrange.lowfirst: 1023
net.inet.tcp.finwait2_timeout: 6
net.inet.tcp.fast_finwait2_recycle: 0

[EMAIL PROTECTED] /sys/dev]# netstat -m
513/5382/5895 mbufs in use (current/cache/total)
511/3341/3852/25600 mbuf clusters in use (current/cache/total/max)
1/1663 mbuf+clusters out of packet secondary zone in use (current/cache)
0/488/488/0 4k (page size) jumbo clusters in use (current/cache/total/max)
0/0/0/0 9k jumbo clusters in use (current/cache/total/max)
0/0/0/0 16k jumbo clusters in use (current/cache/total/max)
1150K/9979K/11129K bytes allocated to network (current/cache/total)
0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
0/0/0 requests for jumbo clusters denied (4k/9k/16k)
0/0/0 sfbufs in use (current/peak/max)
0 requests for sfbufs denied
0 requests for sfbufs delayed
17 requests for I/O initiated by sendfile
0 calls to protocol drain routines

> > I don't know if it's relevant, but accf_http is loaded on wordpress1.
>
> That may be relevant - accepting filtering changes how the listen queues
> are used. Try going back to non-accept filtering for now.

It still stalls. This time I noticed that tcptw shows 0 free:

socket:     696, 12330,   14,  126, 10749,    0
unpcb:      248, 12330,    5,   70,    75,    0
ipq:         56,   819,    0,    0,     0,    0
udpcb:      280, 12334,    2,   40,   184,    0
inpcb:      280, 12334, 2485,  105, 10489,    0
tcpcb:      688, 12330,    7,   73, 10489,    0
tcptw:       88,  2478, 2478,    0,  2478, 7231
syncache:   112, 15378,    1,   65,  9713,    0
hostcache:  136, 15372,    0,    0,     0,    0
tcpreass:    40,  1680,
Re: Should syncache.count ever be negative?
On Sat, 10 Nov 2007, Mike Silbersack wrote:

> FWIW, my crazy theory of the moment is this: We have some bug that
> happens when the listen queues overflow in 7.0, and your test is
> strenuous enough to hit the listen queue overflow condition, leading to
> total collapse. I'll have to cobble together a test program to see what
> happens in the listen queue overflow case.

Post testing, I have a different theory. Can you also try:

sysctl net.inet.tcp.syncookies=0

I modified netrate's httpd to sleep a lot and found an interesting
interaction between listen queue overflows and syncookies:

04:28:21.470931 IP 10.1.1.8.50566 > 10.1.1.6.http: S 287310302:287310302(0) win 32768
04:28:21.470939 IP 10.1.1.6.http > 10.1.1.8.50566: S 4209413098:4209413098(0) ack 287310303 win 65535
04:28:21.473487 IP 10.1.1.8.50566 > 10.1.1.6.http: . ack 1 win 33304
04:28:21.473493 IP 10.1.1.6.http > 10.1.1.8.50566: R 4209413099:4209413099(0) win 0
04:28:21.473642 IP 10.1.1.8.50566 > 10.1.1.6.http: P 1:78(77) ack 1 win 33304
04:28:21.482555 IP 10.1.1.6.http > 10.1.1.8.50566: P 1:126(125) ack 78 win 8326
04:28:21.482563 IP 10.1.1.6.http > 10.1.1.8.50566: F 126:126(0) ack 78 win 8326
04:28:21.487047 IP 10.1.1.8.50566 > 10.1.1.6.http: R 287310380:287310380(0) win 0
04:28:21.487398 IP 10.1.1.8.50566 > 10.1.1.6.http: R 287310380:287310380(0) win 0

The listen queue overflow causes the socket to be closed and a RST sent,
but the next packet from 10.1.1.8 crosses it on the wire and activates
the syncookie code, reopening the connection. Meanwhile, the RST arrives
at 10.1.1.8 and closes its socket, leading to it sending RSTs when the
data from 10.1.1.6 arrives.

Not sure if that's your problem or not, but it's interesting.

-Mike
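For reference, the "sleep a lot" change Mike describes amounts to
something like the following. This is a hypothetical sketch, not the
actual netrate/httpd patch; the tiny backlog and one-second sleep are
arbitrary choices that simply make the listen queue overflow quickly so
the crossing shown in the trace above can be watched with tcpdump.

    /*
     * Hypothetical sketch of a deliberately slow server (not the actual
     * netrate/httpd modification).  Sleeping between accepts forces the
     * listen queue to overflow, so the RST-versus-syncookie race shown
     * in the trace above can be observed on the wire.
     */
    #include <sys/types.h>
    #include <sys/socket.h>
    #include <netinet/in.h>
    #include <err.h>
    #include <string.h>
    #include <unistd.h>

    int
    main(void)
    {
            struct sockaddr_in sin;
            int s, c;

            if ((s = socket(PF_INET, SOCK_STREAM, 0)) < 0)
                    err(1, "socket");
            memset(&sin, 0, sizeof(sin));
            sin.sin_family = AF_INET;
            sin.sin_port = htons(8080);     /* arbitrary test port */
            sin.sin_addr.s_addr = htonl(INADDR_ANY);
            if (bind(s, (struct sockaddr *)&sin, sizeof(sin)) < 0)
                    err(1, "bind");
            if (listen(s, 8) < 0)           /* tiny backlog: overflows quickly */
                    err(1, "listen");
            for (;;) {
                    sleep(1);               /* accept far more slowly than SYNs arrive */
                    if ((c = accept(s, NULL, NULL)) < 0)
                            continue;
                    close(c);
            }
    }

Running a load generator such as ab against it, once with syncookies
enabled and once with net.inet.tcp.syncookies=0, while capturing with
tcpdump should make the difference in behavior visible.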
Re: Should syncache.count ever be negative?
On Sat, 10 Nov 2007, Matt Reimer wrote:

> I ran "netstat -Lan" every second while running this test and the
> output never changed from the following, whether before or after the
> stall:

I forgot to mention, check netstat -s for listen queue overflows.

> During the stall the sockets are all in TIME_WAIT.
>
> More relevant info:

In the past that was not a problem, but I should retest this as well.

> It still stalls. This time I noticed that tcptw shows 0 free:

The tcptw zone is supposed to fill completely, then kick out the oldest
entry whenever a new one comes in. So, that sounds ok to me... but like
I said, I need to retest that too.

> When I use ab I'm telling it to use a max of 100 simultaneous
> connections (ab -c 100 -n 5 http://66.230.193.105/). Wouldn't that be
> well under the limit?

Yep, should be. Hmph.

-Mike
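The netstat -s check Mike mentions would look something like this
(standard FreeBSD commands; the exact counter wording may differ
slightly between releases):

    netstat -s -p tcp | grep -i listen

Watch whether the "listen queue overflows" counter climbs while the
benchmark is running; if it does, connections are being dropped at the
listen queue rather than in the syncache or socket zone.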
Re: if_stf and rfc1918
Hajimu UMEMOTO <[EMAIL PROTECTED]> writes:

> Lukasz> after the packets leave my site they are completely valid 6to4
> Lukasz> packets. Also when 6to4 packets come to me they are handled
> Lukasz> properly.
>
> Oops, I completely forgot about this issue. If there is no objection,
> I'll commit the following patch into HEAD and then MFC to RELENG_5.

What about that commit? 0=)

I've been using the patch for a few months now with complete success,
and I keep forgetting to re-apply it on every single buildkernel...
Having it in the mainline would help people with forgetful minds ;)

Lapo
System Freezes When MBufClust Usage Rises
We are using FreeBSD to run the Dante SOCKS proxy server to accelerate a
high-latency (approximately 1-second round-trip) network link. We need to
support many concurrent transfers of large files. To do this, we have set
the machine up with the following parameters.

Compiled Dante with the following setting in include/config.h:

SOCKD_BUFSIZETCP = (1024*1000)

/etc/sysctl.conf:

kern.ipc.maxsockbuf=4194304
net.inet.tcp.sendspace=2097152
net.inet.tcp.recvspace=2097152

/boot/loader.conf:

kern.ipc.maxsockets="0" (also tried 25600, 51200, 102400, and 409600)
kern.ipc.nmbclusters="0" (also tried 102400 and 409600)

(Looking at the code, it seems that 0 means not to set a max for the
above two controls.)

If kern.ipc.nmbclusters is set to 25600, the system will hard freeze when
"vmstat -z" shows the number of clusters reaches 25600. If
kern.ipc.nmbclusters is set to 0 (or 102400), the system will hard freeze
when "vmstat -z" shows the number of clusters is around 66000. When it
freezes, the number of Kbytes allocated to network (as shown by
"netstat -m") is roughly 160,000 (160MB).

For a while, we thought that there might be a limit of 65536 mbuf
clusters, so we tested building the kernel with MCLSHIFT=12, which makes
each mbuf cluster 4096 bytes. With this configuration, nmbclusters only
reached about 33000 before the system froze. The number of Kbytes
allocated to network (as shown by "netstat -m") still maxed out at around
160,000.

Now, it seems that we are running into some other memory limitation that
occurs when our network allocation gets close to 160MB. We have tried
tuning parameters such as KVA_PAGES, vm.kmem_size, vm.kmem_size_max,
etc., though we are unsure whether the changes we made there helped in
any way.

This is all being done on Celeron 2.8GHz machines with 3+ GB of RAM
running FreeBSD 5.3. We are very much tied to this platform at the
moment, and upgrading is not a realistic option for us. We would like to
tune the systems so they do not lock up. We can currently work around the
problem (by using smaller buffers and such), but it is at the expense of
network throughput, which is less than ideal.

Are there any other parameters that would help us allocate more memory to
kernel networking? What other options should we look into?

Thanks,

Ed Mandy
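A sketch of how the parameters named above (KVA_PAGES, vm.kmem_size,
vm.kmem_size_max, nmbclusters) are typically combined on an i386 5.x box.
The values below are illustrative assumptions, not tested settings for
this workload, and would need to be sized to the actual machine.

    # kernel configuration file (i386); requires a kernel rebuild
    options KVA_PAGES=512    # roughly 2GB of kernel address space
                             # instead of the default 1GB (illustrative)

    # /boot/loader.conf
    vm.kmem_size="419430400"        # illustrative: ~400MB kernel memory map
    vm.kmem_size_max="419430400"
    kern.ipc.nmbclusters="65536"    # bounded again rather than 0/unlimited

Comparing sysctl vm.kmem_size against the ~160MB network allocation seen
at freeze time would show whether the kernel memory map is the limit
actually being hit.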
Re: Merging rc.d/network_ipv6 into rc.d/netif
On Mon, 5 Nov 2007, Bob Johnson wrote:

> On 11/5/07, Mike Makonnen <[EMAIL PROTECTED]> wrote:
>> Most IP related knobs will have an ipv4_ and ipv6_ version. To make the
>> transition easier rc.subr(8) will "automagically" DTRT for the
>> following knobs:
>>
>> gateway_enable => ipv4_gateway_enable
>> router_enable => ipv4_router_enable
>> router => ipv4_router
>> router_flags => ipv4_router_flags
>> defaultrouter => ipv4_defaultrouter
>> static_routes => ipv4_static_routes
>> static_routes_ => ipv4_static_routes_
>> route_ => ipv4_route_
>> dhclient_program => ipv4_dhclient_program
>> dhclient_flags => ipv4_dhclient_flags
>> dhclient_flags_ => ipv4_dhclient_flags_
>> background_dhclient_ => ipv4_background_dhclient_
>>
>> Please try it and let me know what you think.
>
> Personally, I'd prefer the new names be along the lines of
> ifconfig__ipv4, ifconfig__ipv6, defaultrouter_ipv4, defaultrouter_ipv6,
> dhclient_program_ipv4, dhclient_program_ipv6, etc.

Personally I think that grouping things by ipv4/ipv6 makes more sense,
and has better longevity. And this would be a good time to change
defaultrouter to default_router! Or we could make it shorter and call it
gateway.

Doug

-- 
This .signature sanitized for your protection
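To make the two naming schemes concrete, an rc.conf that today reads like
the first pair would become something like the second block under Mike's
proposal, or the third under Bob's preference. The addresses are made-up
documentation values for illustration only.

    # today
    gateway_enable="YES"
    defaultrouter="192.0.2.1"

    # Mike's proposal (ipv4_/ipv6_ prefix)
    ipv4_gateway_enable="YES"
    ipv4_defaultrouter="192.0.2.1"
    ipv6_defaultrouter="2001:db8::1"

    # Bob's preference (protocol suffix)
    defaultrouter_ipv4="192.0.2.1"
    defaultrouter_ipv6="2001:db8::1"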