Re: Should syncache.count ever be negative?

2007-11-10 Thread Mike Silbersack


On Fri, 9 Nov 2007, Matt Reimer wrote:


Ok, I've run netperf in both directions. The box I've been targeting
is 66.230.193.105 aka wordpress1.


Ok, at least that looks good.


The machine is a Dell 1950 with 8 x 1.6GHz Xeon 5310s, 8G RAM, and this NIC:


Nice.


I first noticed this problem running ab; then to simplify I used
netrate/http[d]. What's strange is that it seems fine over the local
network (~15800 requests/sec), but it slowed down dramatically (~150
req/sec) when tested from another network 20 ms away. Running systat
-tcp and nload I saw that there was an almost complete stall with only
a handful of packets being sent (probably my ssh packets) for a few
seconds or sometimes even up to 60 seconds or so.


I think most benchmarking tools end up stalling if all of their threads 
stall, that may be why the rate falls off after the misbehavior you 
describe below begins.



Nov  9 19:02:34 wordpress1 kernel: TCP: [207.210.67.2]:64851 to
[66.230.193.105]:80; syncache_socket: Socket create failed due to
limits or memory shortage
Nov  9 19:02:34 wordpress1 kernel: TCP: [207.210.67.2]:64851 to
[66.230.193.105]:80 tcpflags 0x10; tcp_input: Listen socket:
Socket allocation failed due to limits or memory shortage, sending RST


Turns out you'll generally get both of those error messages together, from 
my reading of the code.


Since you eliminated memory shortage in the socket zone, the next thing to 
check is the length of the listen queues.  If the listen queue is backing 
up because the application isn't accepting fast enough, the errors above 
should happen.  "netstat -Lan" should show you what's going on there. 
Upping the specified listen queue length in your webserver _may_ be all 
that is necessary.  Try fiddling with that and watching how much they're 
filling up during testing.
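
To make the knob concrete, here is a minimal sketch (illustrative only; the
port and the 1024 backlog are placeholder values, not anything from your
setup) of a listener that sets its own listen queue length and drains it
quickly:

#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <err.h>
#include <string.h>
#include <unistd.h>

int
main(void)
{
	struct sockaddr_in sin;
	int s, c;

	if ((s = socket(AF_INET, SOCK_STREAM, 0)) == -1)
		err(1, "socket");
	memset(&sin, 0, sizeof(sin));
	sin.sin_len = sizeof(sin);
	sin.sin_family = AF_INET;
	sin.sin_port = htons(8080);
	sin.sin_addr.s_addr = htonl(INADDR_ANY);
	if (bind(s, (struct sockaddr *)&sin, sizeof(sin)) == -1)
		err(1, "bind");
	/*
	 * The backlog argument is the listen queue length that
	 * "netstat -Lan" reports as maxqlen; the kernel also clamps
	 * it to kern.ipc.somaxconn.
	 */
	if (listen(s, 1024) == -1)
		err(1, "listen");
	while ((c = accept(s, NULL, NULL)) != -1)
		close(c);		/* drain the queue as fast as possible */
	return (0);
}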


The fact that you see the same port repeatedly may indicate that the 
syncache isn't destroying the syncache entries when you get the socket 
creation failure.  Take a look at "netstat -n" and look for SYN_RECEIVED 
entries - if they're sticking around for more than a few seconds, this is 
probably what's happening.  (This entire paragraph is speculation, but 
worth investigating.)



I don't know if it's relevant, but accf_http is loaded on wordpress1.


That may be relevant - accept filtering changes how the listen queues 
are used.  Try going back to non-accept filtering for now.
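
For reference, attaching or detaching accf_http is just a setsockopt() on
the listening socket after listen() has been called; a hypothetical helper,
roughly what a webserver's accept-filter option boils down to:

#include <sys/types.h>
#include <sys/socket.h>
#include <err.h>
#include <string.h>

/*
 * Attach (enable != 0) or detach (enable == 0) accf_http on a socket
 * that is already in the listening state.
 */
void
set_http_accept_filter(int s, int enable)
{
	struct accept_filter_arg afa;

	if (enable) {
		memset(&afa, 0, sizeof(afa));
		strcpy(afa.af_name, "httpready");	/* accf_http's filter name */
		if (setsockopt(s, SOL_SOCKET, SO_ACCEPTFILTER,
		    &afa, sizeof(afa)) == -1)
			err(1, "SO_ACCEPTFILTER on");
	} else {
		/* A NULL argument removes the accept filter again. */
		if (setsockopt(s, SOL_SOCKET, SO_ACCEPTFILTER, NULL, 0) == -1)
			err(1, "SO_ACCEPTFILTER off");
	}
}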



We have seen similar behavior (TCP slowdowns) on different machines
(4 x Xeon 5160) with a different NIC (em0) running RELENG_7, though I
haven't diagnosed it to this level of detail. All our RELENG_6 and
RELENG_4 machines seem fine.


em is the driver that I was having issues with when it shared an 
interrupt... :)


FWIW, my crazy theory of the moment is this:  We have some bug that 
happens when the listen queues overflow in 7.0, and your test is strenuous 
enough to hit the listen queue overflow condition, leading to total 
collapse.  I'll have to cobble together a test program to see what happens 
in the listen queue overflow case.
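
Something along these lines is what I have in mind (a rough, untested
sketch, not actual netrate code; the port is arbitrary): listen with a
deliberately tiny backlog, never accept, and then hammer it from another
box:

#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <err.h>
#include <string.h>
#include <unistd.h>

int
main(void)
{
	struct sockaddr_in sin;
	int s;

	if ((s = socket(AF_INET, SOCK_STREAM, 0)) == -1)
		err(1, "socket");
	memset(&sin, 0, sizeof(sin));
	sin.sin_len = sizeof(sin);
	sin.sin_family = AF_INET;
	sin.sin_port = htons(8080);
	sin.sin_addr.s_addr = htonl(INADDR_ANY);
	if (bind(s, (struct sockaddr *)&sin, sizeof(sin)) == -1)
		err(1, "bind");
	if (listen(s, 1) == -1)		/* deliberately tiny backlog */
		err(1, "listen");
	for (;;)
		sleep(3600);		/* never accept(); let the queue overflow */
	/* NOTREACHED */
}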


Thanks for the quick feedback,

-Mike
___
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "[EMAIL PROTECTED]"


Re: Should syncache.count ever be negative?

2007-11-10 Thread Matt Reimer
On Nov 10, 2007 12:13 AM, Mike Silbersack <[EMAIL PROTECTED]> wrote:
>
> On Fri, 9 Nov 2007, Matt Reimer wrote:
>
> > I first noticed this problem running ab; then to simplify I used
> > netrate/http[d]. What's strange is that it seems fine over the local
> > network (~15800 requests/sec), but it slowed down dramatically (~150
> > req/sec) when tested from another network 20 ms away. Running systat
> > -tcp and nload I saw that there was an almost complete stall with only
> > a handful of packets being sent (probably my ssh packets) for a few
> > seconds or sometimes even up to 60 seconds or so.
>
> I think most benchmarking tools end up stalling if all of their threads
> stall, that may be why the rate falls off after the misbehavior you
> describe below begins.

Ok. FWIW, I'm seeing the same behavior with tools/netrate/http as I am with ab.

> > Nov  9 19:02:34 wordpress1 kernel: TCP: [207.210.67.2]:64851 to
> > [66.230.193.105]:80; syncache_socket: Socket create failed due to
> > limits or memory shortage
> > Nov  9 19:02:34 wordpress1 kernel: TCP: [207.210.67.2]:64851 to
> > [66.230.193.105]:80 tcpflags 0x10; tcp_input: Listen socket:
> > Socket allocation failed due to limits or memory shortage, sending RST
>
> Turns out you'll generally get both of those error messages together, from
> my reading of the code.
>
> Since you eliminated memory shortage in the socket zone, the next thing to
> check is the length of the listen queues.  If the listen queue is backing
> up because the application isn't accepting fast enough, the errors above
> should happen.  "netstat -Lan" should show you what's going on there.
> Upping the specified listen queue length in your webserver _may_ be all
> that is necessary.  Try fiddling with that and watching how much they're
> filling up during testing.

I ran "netstat -Lan" every second while running this test and the
output never changed from the following, whether before or after the
stall:

Current listen queue sizes (qlen/incqlen/maxqlen)
Proto Listen         Local Address
tcp4  0/0/128        66.230.193.105.80
tcp4  0/0/10         127.0.0.1.25
tcp4  0/0/128        *.22
tcp4  0/0/128        *.199


> The fact that you see the same port repeatedly may indicate that the
> syncache isn't destroying the syncache entries when you get the socket
> creation failure.  Take a look at "netstat -n" and look for SYN_RECEIVED
> entries - if they're sticking around for more than a few seconds, this is
> probably what's happening.  (This entire paragraph is speculation, but
> worth investigating.)

During the stall the sockets are all in TIME_WAIT. More relevant info:

kern.ipc.maxsockets: 12328
kern.ipc.numopensockets: 46

net.inet.ip.portrange.randomtime: 45
net.inet.ip.portrange.randomcps: 10
net.inet.ip.portrange.randomized: 1
net.inet.ip.portrange.reservedlow: 0
net.inet.ip.portrange.reservedhigh: 1023
net.inet.ip.portrange.hilast: 65535
net.inet.ip.portrange.hifirst: 49152
net.inet.ip.portrange.last: 65535
net.inet.ip.portrange.first: 3
net.inet.ip.portrange.lowlast: 600
net.inet.ip.portrange.lowfirst: 1023

net.inet.tcp.finwait2_timeout: 6
net.inet.tcp.fast_finwait2_recycle: 0

[EMAIL PROTECTED] /sys/dev]# netstat -m
513/5382/5895 mbufs in use (current/cache/total)
511/3341/3852/25600 mbuf clusters in use (current/cache/total/max)
1/1663 mbuf+clusters out of packet secondary zone in use (current/cache)
0/488/488/0 4k (page size) jumbo clusters in use (current/cache/total/max)
0/0/0/0 9k jumbo clusters in use (current/cache/total/max)
0/0/0/0 16k jumbo clusters in use (current/cache/total/max)
1150K/9979K/11129K bytes allocated to network (current/cache/total)
0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
0/0/0 requests for jumbo clusters denied (4k/9k/16k)
0/0/0 sfbufs in use (current/peak/max)
0 requests for sfbufs denied
0 requests for sfbufs delayed
17 requests for I/O initiated by sendfile
0 calls to protocol drain routines

> > I don't know if it's relevant, but accf_http is loaded on wordpress1.
>
> That may be relevant - accept filtering changes how the listen queues
> are used.  Try going back to non-accept filtering for now.

It still stalls. This time I noticed that tcptw shows 0 free:

socket:   696,12330,   14,  126,10749,0
unpcb:248,12330,5,   70,   75,0
ipq:   56,  819,0,0,0,0
udpcb:280,12334,2,   40,  184,0
inpcb:280,12334, 2485,  105,10489,0
tcpcb:688,12330,7,   73,10489,0
tcptw: 88, 2478, 2478,0, 2478, 7231
syncache: 112,15378,1,   65, 9713,0
hostcache:136,15372,0,0,0,0
tcpreass:  40, 1680,   

Re: Should syncache.count ever be negative?

2007-11-10 Thread Mike Silbersack


On Sat, 10 Nov 2007, Mike Silbersack wrote:

FWIW, my crazy theory of the moment is this:  We have some bug that happens 
when the listen queues overflow in 7.0, and your test is strenuous enough to 
hit the listen queue overflow condition, leading to total collapse.  I'll 
have to cobble together a test program to see what happens in the listen 
queue overflow case.


Post testing, I have a different theory.

Can you also try

sysctl net.inet.tcp.syncookies=0
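
(If it is easier to flip from inside a test harness, the same knob is
reachable via sysctlbyname(3); a minimal sketch, needs root:)

#include <sys/types.h>
#include <sys/sysctl.h>
#include <err.h>
#include <stdio.h>

int
main(void)
{
	int oldval, newval = 0;		/* 0 == syncookies off */
	size_t oldlen = sizeof(oldval);

	if (sysctlbyname("net.inet.tcp.syncookies", &oldval, &oldlen,
	    &newval, sizeof(newval)) == -1)
		err(1, "sysctlbyname");
	printf("net.inet.tcp.syncookies: %d -> %d\n", oldval, newval);
	return (0);
}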

I modified netrate's httpd to sleep a lot and found an interesting 
interaction between listen queue overflows and syncookies:


04:28:21.470931 IP 10.1.1.8.50566 > 10.1.1.6.http: S 287310302:287310302(0) win 32768
04:28:21.470939 IP 10.1.1.6.http > 10.1.1.8.50566: S 4209413098:4209413098(0) ack 287310303 win 65535
04:28:21.473487 IP 10.1.1.8.50566 > 10.1.1.6.http: . ack 1 win 33304
04:28:21.473493 IP 10.1.1.6.http > 10.1.1.8.50566: R 4209413099:4209413099(0) win 0
04:28:21.473642 IP 10.1.1.8.50566 > 10.1.1.6.http: P 1:78(77) ack 1 win 33304
04:28:21.482555 IP 10.1.1.6.http > 10.1.1.8.50566: P 1:126(125) ack 78 win 8326
04:28:21.482563 IP 10.1.1.6.http > 10.1.1.8.50566: F 126:126(0) ack 78 win 8326
04:28:21.487047 IP 10.1.1.8.50566 > 10.1.1.6.http: R 287310380:287310380(0) win 0
04:28:21.487398 IP 10.1.1.8.50566 > 10.1.1.6.http: R 287310380:287310380(0) win 0

The listen queue overflow causes the socket to be closed and a RST sent, 
but the next packet from 10.1.1.8 crosses it on the wire and activates the 
syncookie code, reopening the connection.  Meanwhile, the RST arrives at 
10.1.1.8 and closes its socket, leading to it sending RSTs when the data 
from 10.1.1.6 arrives.


Not sure if that's your problem or not, but it's interesting.

-Mike
___
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "[EMAIL PROTECTED]"


Re: Should syncache.count ever be negative?

2007-11-10 Thread Mike Silbersack


On Sat, 10 Nov 2007, Matt Reimer wrote:


I ran "netstat -Lan" every second while running this test and the
output never changed from the following, whether before or after the
stall:


I forgot to mention: check "netstat -s" for listen queue overflows.


During the stall the sockets are all in TIME_WAIT. More relevant info:


In the past that was not a problem, but I should retest this as well.


It still stalls. This time I noticed that tcptw shows 0 free:


The tcptw zone is supposed to fill completely, then kick out the oldest 
entry whenever a new one comes in.  So, that sounds ok to me... but like I 
said, I need to retest that too.
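
(If it helps to picture it, the recycling amounts to something like the
following; this is an illustration only, not the actual kernel code:)

#include <stdio.h>

#define TW_LIMIT 8			/* stand-in for the tcptw zone limit */

static int tw_slot[TW_LIMIT];
static int tw_used, tw_oldest;

/* Insert a new entry; once the table is full, evict the oldest one. */
static void
tw_insert(int conn_id)
{
	if (tw_used == TW_LIMIT) {
		printf("evict %d\n", tw_slot[tw_oldest]);
		tw_slot[tw_oldest] = conn_id;
		tw_oldest = (tw_oldest + 1) % TW_LIMIT;
	} else {
		tw_slot[(tw_oldest + tw_used) % TW_LIMIT] = conn_id;
		tw_used++;
	}
}

int
main(void)
{
	int i;

	for (i = 0; i < 20; i++)	/* more inserts than slots */
		tw_insert(i);
	printf("in use: %d of %d (0 free is the steady state)\n",
	    tw_used, TW_LIMIT);
	return (0);
}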



When I use ab I'm telling it to use a max of 100 simultaneous
connections (ab -c 100 -n 5 http://66.230.193.105/). Wouldn't that
be well under the limit?


Yep, should be.  Hmph.

-Mike
___
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "[EMAIL PROTECTED]"


Re: if_stf and rfc1918

2007-11-10 Thread Lapo Luchini
Hajimu UMEMOTO <[EMAIL PROTECTED]> writes:

> Lukasz> after the packets leave my site they are completely valid 6to4 packets.
> Lukasz> Also when 6to4 packets come to me they are handled properly.
> Oops, I completely forgot about this issue.  If there is no objection, I'll
> commit the following patch into HEAD, then MFC to RELENG_5.

What about that commit? 0=)

I have been using the patch for a few months now, with complete success, though I
keep forgetting to re-apply it on every single buildkernel... having it in the
mainline would help people with forgetful minds ;)

Lapo

___
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "[EMAIL PROTECTED]"


System Freezes When MBufClust Usages Rises

2007-11-10 Thread Ed Mandy
We are using FreeBSD to run the Dante SOCKS proxy server to accelerate a 
high-latency (approximately 1-second round-trip) network link.  We need to 
support many concurrent transfers of large files.  To do this, we have set 
the machine up with the following parameters.




Compiled Dante with the following setting in include/config.h

SOCKD_BUFSIZETCP = (1024*1000)



/etc/sysctl.conf :

kern.ipc.maxsockbuf=4194304

net.inet.tcp.sendspace=2097152

net.inet.tcp.recvspace=2097152



/boot/loader.conf :

kern.ipc.maxsockets="0" (also tried 25600, 51200, 102400, and 409600)
kern.ipc.nmbclusters="0" (also tried 102400 and 409600)

(Looking at the code, it seems that 0 means not to set a max for the above 
two controls.)
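
(For reference, that buffer size presumably just turns into per-socket
requests along these lines; an illustrative sketch, not Dante's actual
code, with values mirroring the settings above.)

#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <err.h>

int
main(void)
{
	int s, bufsize = 1024 * 1000;	/* mirrors SOCKD_BUFSIZETCP above */

	if ((s = socket(AF_INET, SOCK_STREAM, 0)) == -1)
		err(1, "socket");
	/*
	 * Requests larger than kern.ipc.maxsockbuf allows are refused
	 * (typically ENOBUFS), which is why that sysctl is raised above.
	 */
	if (setsockopt(s, SOL_SOCKET, SO_SNDBUF, &bufsize, sizeof(bufsize)) == -1)
		err(1, "SO_SNDBUF");
	if (setsockopt(s, SOL_SOCKET, SO_RCVBUF, &bufsize, sizeof(bufsize)) == -1)
		err(1, "SO_RCVBUF");
	return (0);
}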




If kern.ipc.nmbclusters is set to 25600, the system will hard freeze when 
"vmstat -z" shows the number of clusters reaches 25600.  If 
kern.ipc.nmbclusters is set to 0 (or 102400), the system will hard freeze 
when "vmstat -z" shows the number of clusters is around 66000.  When it 
freezes, the number of Kbytes allocated to network (as shown by 
"netstat -m") is roughly 160,000 (160MB).




For a while, we thought that there might be a limit of 65536 mbuf clusters, so 
we tested building the kernel with MCLSHIFT=12, which makes each mbuf cluster 
4096 bytes.  With this configuration, nmbclusters only reached about 33000 
before the system froze.  The number of Kbytes allocated to network (as 
shown by "netstat -m") still maxed out at around 160,000.




Now, it seems that we are running into some other memory limitation that 
occurs when our network allocation gets close to 160MB.  We have tried 
tuning parameters such as KVA_PAGES, vm.kmem_size, vm.kmem_size_max, etc., 
though we are unsure whether the changes we made there helped in any way.




This is all being done on Celeron 2.8GHz machines with 3+ GB of RAM running 
FreeBSD 5.3.  We are very much tied to this platform at the moment, and 
upgrading is not a realistic option for us.  We would like to tune the 
systems so that they do not lock up.  We can currently work around the problem 
(by using smaller buffers and such), but it comes at the expense of network 
throughput, which is less than ideal.




Are there any other parameters that would help us allocate more memory to 
the kernel's networking subsystem?  What other options should we look into?




Thanks,

Ed Mandy

___
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "[EMAIL PROTECTED]"


Re: Merging rc.d/network_ipv6 into rc.d/netif

2007-11-10 Thread Doug Barton

On Mon, 5 Nov 2007, Bob Johnson wrote:


On 11/5/07, Mike Makonnen <[EMAIL PROTECTED]> wrote:


Most IP related knobs will have an ipv4_ and ipv6_ version. To make the
transition easier rc.subr(8) will "automagically" DTRT for the following
knobs:
gateway_enable        => ipv4_gateway_enable
router_enable         => ipv4_router_enable
router                => ipv4_router
router_flags          => ipv4_router_flags
defaultrouter         => ipv4_defaultrouter
static_routes         => ipv4_static_routes
static_routes_        => ipv4_static_routes_
route_                => ipv4_route_
dhclient_program      => ipv4_dhclient_program
dhclient_flags        => ipv4_dhclient_flags
dhclient_flags_       => ipv4_dhclient_flags_
background_dhclient_  => ipv4_background_dhclient_

Please try it and let me know what you think.


Personally, I'd prefer the new names be along the lines of
ifconfig__ipv4, ifconfig__ipv6,
defaultrouter_ipv4, defaultrouter_ipv6, dhclient_program_ipv4,
dhclient_program_ipv6, etc.


Personally I think that grouping things by ipv4/ipv6 makes more sense, and 
has better longevity.



And this would be a good time to change defaultrouter to default_router!


Or we could make it shorter and call it gateway.

Doug

--

This .signature sanitized for your protection

___
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "[EMAIL PROTECTED]"