Greetings,
Thanks to everybody for their quick responses earlier. (I've also had
another crack at my TMDA filter, so hopefully my reply address will work this
time.)
Last time I forgot to mention that I was pulling the data files from a Compaq
RAID system (ciss0: <HP Smart Array 6i>). I had a large number of files with
random content, so there was a lot of waiting for disk. I've now set up an
MFS holding fewer files, which seems to have brought back network stability.
I also adjusted the TCP windows (net.inet.tcp.sendspace=65536,
net.inet.tcp.recvspace=65536), but once on the MFS I found no change when
moving to the bigger window sizes (net.inet.tcp.sendspace=1024000,
net.inet.tcp.recvspace=1024000).
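One possible explanation for the bigger windows making no difference: the
window only caps throughput when it is smaller than the bandwidth-delay
product. A rough sketch of that arithmetic follows. The 0.2 ms round-trip time
is an assumed figure for a local gigabit LAN, not something I measured, and
the function name is just for illustration.

```python
LINK_BPS = 1_000_000_000   # gigabit Ethernet line rate
ASSUMED_RTT_S = 0.0002     # assumption: 0.2 ms LAN round trip, not measured


def window_limited_bps(window_bytes, rtt_seconds):
    """Upper bound on TCP throughput (bits/s) imposed by the window alone."""
    return window_bytes * 8 / rtt_seconds


for window in (65536, 1024000):
    limit = window_limited_bps(window, ASSUMED_RTT_S)
    achievable = min(limit, LINK_BPS)
    print(f"window={window} bytes: window limit {limit / 1e6:.0f} Mbit/s, "
          f"capped by link at {achievable / 1e6:.0f} Mbit/s")
```

Under that assumed RTT both window sizes allow more than the link can carry,
so the link (or the host) is the limit, not the window. On a path with a much
longer RTT the result would be different.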
I've found that the polling settings all seem to be tuned for 100 Mbit/s
rather than gigabit, so I've edited /usr/src/sys/kern/kern_poll.c and
increased the #define limits by at least a factor of 10:
Before:
#define MIN_POLL_BURST_MAX 10
#define MAX_POLL_BURST_MAX 1000
After:
#define MIN_POLL_BURST_MAX 1000
#define MAX_POLL_BURST_MAX 10000
Then I added the following to /etc/sysctl.conf:
--------------------
kern.polling.burst=5000
kern.polling.each_burst=1000
kern.polling.burst_max=8000
--------------------
Performance improved a lot, although I was still seeing
kern.polling.short_ticks increasing rapidly. The comments in
/usr/src/sys/kern/kern_poll.c say that this means the poll rate is too
high, so I dropped HZ back from 15000 to 10000, and the problem has gone
away.
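As a sanity check on the burst sizes, here is a back-of-the-envelope sketch of
how many packets one interface would have to take per clock tick to keep up
with gigabit line rate. The frame sizes are the standard Ethernet minimum and
maximum plus preamble and inter-frame gap. Comparing the result against
kern.polling.burst_max is my own reading of how the limit applies, not
something I have confirmed in the source.

```python
LINK_BPS = 1_000_000_000   # gigabit Ethernet line rate

# Bytes each frame occupies on the wire: the 64..1518-byte frame itself
# plus 8 bytes of preamble and a 12-byte inter-frame gap.
MIN_FRAME_WIRE_BYTES = 64 + 20
MAX_FRAME_WIRE_BYTES = 1518 + 20


def packets_per_tick(link_bps, wire_bytes, hz):
    """Packets arriving per clock tick when the link runs at full rate."""
    packets_per_second = link_bps / (wire_bytes * 8)
    return packets_per_second / hz


for hz in (1000, 10000, 15000):
    small = packets_per_tick(LINK_BPS, MIN_FRAME_WIRE_BYTES, hz)
    large = packets_per_tick(LINK_BPS, MAX_FRAME_WIRE_BYTES, hz)
    print(f"HZ={hz}: {small:.0f} min-size or {large:.1f} full-size "
          f"packets per tick")
```

If the burst limit is below the per-tick figure for the traffic mix in use,
the receive ring on the card has to absorb the difference, which is
presumably where drops would start.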
The server under siege is now stable with 60 concurrent sessions, which it
could not handle before. The httpd processes also now seem to sit in
"accept" rather than "lockf".
--------------------
last pid: 3469; load averages: 1.79, 1.70, 1.47 up 0+00:28:09 05:59:46
191 processes: 8 running, 183 sleeping
CPU states: 2.0% user, 0.0% nice, 32.6% system, 48.0% interrupt, 17.4% idle
Mem: 34M Active, 7180K Inact, 87M Wired, 29M Buf, 869M Free
Swap: 2023M Total, 2023M Free
PID USERNAME PRI NICE SIZE RES STATE C TIME WCPU CPU COMMAND
616 www 4 0 3420K 2152K sbwait 1 0:07 0.39% 0.39% httpd
3305 www 4 0 3432K 2160K accept 1 0:07 0.34% 0.34% httpd
690 www 4 0 3420K 2152K accept 1 0:06 0.34% 0.34% httpd
664 www 4 0 3436K 2172K accept 1 0:06 0.29% 0.29% httpd
633 www 4 0 3436K 2172K accept 1 0:06 0.29% 0.29% httpd
651 www 4 0 3436K 2172K RUN 1 0:06 0.24% 0.24% httpd
3390 www 4 0 3432K 2160K accept 0 0:05 0.24% 0.24% httpd
612 www 4 0 3436K 2172K accept 1 0:07 0.20% 0.20% httpd
631 www 4 0 3436K 2172K accept 1 0:07 0.20% 0.20% httpd
621 www 4 0 3436K 2172K accept 1 0:06 0.15% 0.15% httpd
697 www 4 0 3436K 2172K RUN 1 0:06 0.15% 0.15% httpd
3380 www 4 0 3432K 2160K sbwait 1 0:06 0.15% 0.15% httpd
3392 www 4 0 3432K 2160K accept 1 0:05 0.15% 0.15% httpd
3397 www 4 0 3432K 2160K RUN 1 0:05 0.15% 0.15% httpd
3376 www 4 0 3432K 2160K accept 1 0:05 0.15% 0.15% httpd
3383 www 4 0 3432K 2160K accept 1 0:05 0.15% 0.15% httpd
3315 www 4 0 3432K 2160K accept 0 0:07 0.10% 0.10% httpd
3309 www 4 0 3432K 2160K sbwait 1 0:07 0.10% 0.10% httpd
--------------------
This is another server under siege with the same configuration, but without
the POLL_BURST_MAX tweaks and with HZ=15000.
--------------------
last pid: 24068; load averages: 13.54, 5.40, 4.63 up 0+02:59:04 17:19:11
233 processes: 4 running, 228 sleeping, 1 zombie
CPU states: 3.8% user, 0.0% nice, 31.8% system, 47.3% interrupt, 17.0% idle
Mem: 46M Active, 8396K Inact, 105M Wired, 48K Cache, 33M Buf, 838M Free
Swap: 2023M Total, 2023M Free
PID USERNAME PRI NICE SIZE RES STATE C TIME WCPU CPU COMMAND
4508 www 4 0 5040K 3256K sbwait 1 0:37 0.54% 0.54% httpd
4497 www 4 0 5040K 3256K sbwait 1 0:34 0.34% 0.34% httpd
4539 www 4 0 5040K 3256K sbwait 1 0:36 0.29% 0.29% httpd
4521 www 20 0 5040K 3256K lockf 1 0:34 0.29% 0.29% httpd
626 www 4 0 5040K 3252K sbwait 1 0:36 0.24% 0.24% httpd
4896 www 20 0 5040K 3256K lockf 1 0:35 0.24% 0.24% httpd
4522 www 4 0 5040K 3256K sbwait 0 0:34 0.24% 0.24% httpd
629 www 20 0 5040K 3252K lockf 1 0:35 0.20% 0.20% httpd
601 www 4 0 5040K 3252K sbwait 1 0:33 0.20% 0.20% httpd
600 www 20 0 5040K 3252K lockf 1 0:35 0.15% 0.15% httpd
674 www 20 0 5040K 3252K lockf 1 0:34 0.15% 0.15% httpd
4787 www 4 0 5040K 3256K sbwait 1 0:34 0.15% 0.15% httpd
669 www 20 0 5040K 3252K lockf 1 0:34 0.15% 0.15% httpd
4509 www 20 0 5040K 3256K lockf 1 0:32 0.15% 0.15% httpd
4486 www 20 0 5040K 3256K lockf 1 0:36 0.10% 0.10% httpd
4906 www 20 0 5040K 3256K lockf 1 0:36 0.10% 0.10% httpd
4542 www 20 0 5040K 3256K lockf 1 0:36 0.10% 0.10% httpd
607 www 4 0 5040K 3252K sbwait 1 0:35 0.10% 0.10% httpd
4510 www 4 0 5040K 3272K sbwait 1 0:35 0.10% 0.10% httpd
--------------------
On both systems kern.polling.lost_polls is still increasing rapidly. I'm
not sure what to do about this; any suggestions?
--------------------
kern.polling.lost_polls: 9605569
--------------------
kern.polling.suspect is also increasing at a similar rate. I'm not sure what
to do about this either.
------------------
kern.polling.suspect: 608527
------------------
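To put a number on "increasing rapidly", something like the following sketch
could sample a counter twice and report increments per second. It shells out
to the stock `sysctl -n` command; the helper names are made up for
illustration, and I have not run this against the boxes above.

```python
import subprocess
import time


def read_sysctl(name):
    """Return an integer sysctl value by calling `sysctl -n <name>`."""
    result = subprocess.run(["sysctl", "-n", name],
                            capture_output=True, text=True, check=True)
    return int(result.stdout.strip())


def counter_rate(read, interval_s, sleep=time.sleep):
    """Sample a monotonically increasing counter twice, interval_s seconds
    apart, and return the average number of increments per second."""
    first = read()
    sleep(interval_s)
    second = read()
    return (second - first) / interval_s
```

For example, counter_rate(lambda: read_sysctl("kern.polling.lost_polls"), 10)
would give lost polls per second averaged over ten seconds, which can then be
compared with HZ to see what fraction of ticks is being lost.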
Also, thanks for the info on the VLAN searching. I think the adjustment you
suggested sounds good, but it is a bit out of my league. It seems there are
still plenty of things to tweak in the kernel.
BTW, I'd be interested to know people's thoughts on multiple IP stacks on
FreeBSD. It would be really cool to be able to give a jail its own IP
stack bound to a VLAN interface. It could then act like a VRF on Cisco.
Regards,
Dave Seddon