Re: System Freezes When MBufClust Usages Rises

2007-11-12 Thread Robert Watson

On Sat, 10 Nov 2007, Ed Mandy wrote:

> If kern.ipc.nmbclusters is set to 25600, the system will hard freeze when
> "vmstat -z" shows the number of clusters reaching 25600.  If
> kern.ipc.nmbclusters is set to 0 (or 102400), the system will hard freeze
> when "vmstat -z" shows the number of clusters is around 66000.  When it
> freezes, the number of Kbytes allocated to network (as shown by
> "netstat -m") is roughly 160,000 (160MB).
>
> For a while, we thought that there might be a limit of 65536 mbuf clusters,
> so we tested building the kernel with MCLSHIFT=12, which makes each mbuf
> cluster 4096 bytes.  With this configuration, the cluster count only reached
> about 33000 before the system froze.  The number of Kbytes allocated to
> network (as shown by "netstat -m") still maxed out at around 160,000.
>
> Now, it seems that we are running into some other memory limitation that
> occurs when our network allocation gets close to 160MB.  We have tried
> tuning parameters such as KVA_PAGES, vm.kmem_size, vm.kmem_size_max, etc.
> However, we are unsure whether the modifications we made there helped.
>
> This is all being done on Celeron 2.8GHz machines with 3+ GB of RAM running
> FreeBSD 5.3.  We are very much tied to this platform at the moment, and
> upgrading is not a realistic option for us.  We would like to tune the
> systems so that they do not lock up.  We can currently work around the
> problem (by using smaller buffers and such), but only at the expense of
> network throughput, which is less than ideal.
>
> Are there any other parameters that would help us to allocate more memory to
> the kernel networking?  What other options should we look into?
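The cluster counts in the report can be cross-checked with a little arithmetic. Assuming the default MCLSHIFT of 11 (2048-byte clusters, since MCLBYTES is 1 << MCLSHIFT), about 66000 clusters and about 33000 4096-byte clusters both come to 135,168,000 bytes, or 132,000 KB of cluster storage; the rest of the roughly 160,000 KB that netstat -m reports would presumably be mbufs and other network allocations. Both builds stall at the same byte total rather than the same count, which fits the reporter's suspicion of a memory limit rather than a 65536-cluster limit. This is a minimal sketch; `cluster_bytes` and `bytes_to_kib` are hypothetical helpers, not kernel functions.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes of cluster storage for nclusters clusters of (1 << mclshift) bytes
 * each, i.e. nclusters * MCLBYTES. Hypothetical helper for illustration. */
static uint64_t cluster_bytes(uint64_t nclusters, unsigned mclshift)
{
    assert(mclshift < 32); /* keep the shift well-defined and realistic */
    return nclusters << mclshift;
}

/* Convert bytes to the Kbytes (1024-byte units) that netstat -m prints. */
static uint64_t bytes_to_kib(uint64_t bytes)
{
    return bytes / 1024;
}
```

For example, `bytes_to_kib(cluster_bytes(66000, 11))` and `bytes_to_kib(cluster_bytes(33000, 12))` both give 132000, while the 25600-cluster configuration (`cluster_bytes(25600, 11)`) tops out at the configured nmbclusters limit long before that byte total.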


I'd like to diagnose "freeze hard" a little more to understand what's going 
on.  Hopefully this won't be too disruptive for your environment while you're 
doing it.


First off, can you tell me how you're accessing the system to run diagnostic 
tools, monitor it, etc?  Remember that if you run out of clusters, you may 
experience network deadlocks that prevent SSH sessions from operating (since 
there may be no memory for them to operate), so direct console access may be 
required to effectively monitor the system when in an extreme state of low 
memory in the network stack.  Could you tell me if you are using a serial 
console or the video console?  (Or firewire, I suppose?)


FreeBSD 5.3 was the first release to include an MPSAFE network stack, and 
there were a number of optionally compiled features that could disable MPSAFE 
networking, resulting in the Giant lock being held over network operations. 
Could you tell me what the value of the sysctl debug.mpsafenet is?


When the system appears to hard hang, does it recover if, say, left five 
minutes?  What if you unplug the network cable and leave it five minutes?


Does the numlock key on the console work?  If you leave the console logged in 
and running an application (such as "sleep 10") and the system hangs, what 
do you see if you hit Ctrl-T?


If you compile options BREAK_TO_DEBUGGER into the kernel and generate a serial 
break / hit ctrl-alt-esc, are you able to get into the debugger?  If you type 
in "trace", what do you get?  (There is a chapter of the developer's handbook 
that talks about using the kernel debugger, FYI).  With 5.3, we found that 
using a serial console to get to the debugger was a lot more reliable than the
video console -- this is in part because a significant amount of the kernel 
(especially file systems and the video console) still run under Giant, so a 
thread hanging while holding Giant can prevent a console break from getting to 
the debugger.  My advice would be to use a serial console anyway, if possible, 
when debugging, as it means you can use a second machine to copy and paste DDB 
output into a file to e-mail out later.  After about the third line of a 
kernel stack trace, copying addresses out by hand becomes pretty painful :-).


Unfortunately, I have to say that my first advice would be to upgrade -- not 
just because a lot of work has been done relating to network stack performance 
and stability since 5.3, but also because the debugging tools have gotten a 
lot better since then.  For example, in more recent versions the kernel
debugger includes memory monitoring tools, commands to more readily extract
debugging information, etc.  5.3 is a solid and functional release, but when 
it comes to debugging problems of this sort, being on a more recent release 
means you're more likely to see the problem already fixed, and even if not, it 
will be easier for us to fix it.  I understand that may simply not be 
possible, but if you have that flexibility, it's good advice.


A general comment on configuration: increasing the maximum memory allocated to 
the network stack can indeed increase your KVA usage significantly.  You might 
well find that tuning KVA up is required to run with very high memory 
configurations for the network stack, so your intuitions about tuning that up 
aren't bad.  However, when you r

Current problem reports assigned to freebsd-net@FreeBSD.org

2007-11-12 Thread FreeBSD bugmaster
Current FreeBSD problem reports
Critical problems

S Tracker      Resp.  Description

o kern/115360  net    [ipv6] IPv6 address and if_bridge don't play well toge

1 problem total.

Serious problems

S Tracker      Resp.  Description

s kern/21998   net    [socket] [patch] ident only for outgoing connections
a kern/38554   net    changing interface ipaddress doesn't seem to work
s kern/39937   net    ipstealth issue
s kern/81147   net    [net] [patch] em0 reinitialization while adding aliase
o kern/92552   net    A serious bug in most network drivers from 5.X to 6.X
s kern/95665   net    [if_tun] "ping: sendto: No buffer space available" wit
s kern/105943  net    Network stack may modify read-only mbuf chain copies
o kern/106316  net    [dummynet] dummynet with multipass ipfw drops packets
o kern/108542  net    [bce]: Huge network latencies with 6.2-RELEASE / STABL
o kern/109406  net    [ndis] Broadcom WLAN driver 4.100.15.5 doesn't work wi
o kern/110959  net    [ipsec] Filtering incoming packets with enc0 does not
o kern/112528  net    [nfs] NFS over TCP under load hangs with "impossible p
o kern/112686  net    [patm] patm driver freezes System (FreeBSD 6.2-p4) i38
o kern/112722  net    IP v4 udp fragmented packet reject
o kern/113457  net    [ipv6] deadlock occurs if a tunnel goes down while the
o kern/113842  net    [ipv6] PF_INET6 proto domain state can't be cleared wi
o kern/114714  net    [gre][patch] gre(4) is not MPSAFE and does not support
o kern/114839  net    [fxp] fxp looses ability to speak with traffic
o kern/115239  net    [ipnat] panic with 'kmem_map too small' using ipnat
o kern/116077  net    6.2-STABLE panic during use of multi-cast networking c
o kern/116172  net    Network / ipv6 recursive mutex panic
o kern/116185  net    if_iwi driver leads system to reboot
o kern/116186  net    can not set wi channel on current
o kern/116328  net    [bge]: Solid hang with bge interface
o kern/116747  net    [ndis] FreeBSD 7.0-CURRENT crash with Dell TrueMobile
o kern/116837  net    ifconfig tunX destroy: panic
o kern/117271  net    [tap] OpenVPN TAP uses 99% CPU on releng_6 when if_tap
o kern/117293  net    [carp] CARP interfaces causes packet loss
o kern/117423  net    Duplicate IP on different interfaces
o bin/117448   net    [carp] 6.2 kernel crash

30 problems total.

Non-critical problems

S Tracker      Resp.  Description

o conf/23063   net    [PATCH] for static ARP tables in rc.network
s bin/41647    net    ifconfig(8) doesn't accept lladdr along with inet addr
o kern/54383   net    [nfs] [patch] NFS root configurations without dynamic
s kern/60293   net    FreeBSD arp poison patch
o kern/95267   net    packet drops periodically appear
f kern/95277   net    [netinet] [patch] IP Encapsulation mask_match() return
o kern/100519  net    [netisr] suggestion to fix suboptimal network polling
o kern/102035  net    [plip] plip networking disables parallel port printing
o conf/102502  net    [patch] ifconfig name does't rename netgraph node in n
o kern/103253  net    inconsistent behaviour in arp reply of a bridge
o conf/107035  net    [patch] bridge interface given in rc.conf not taking a
o kern/112654  net    [pcn] Kernel panic upon if_pcn module load on a Netfin
o kern/114095  net    [carp] carp+pf delay with high state limit
o kern/114915  net    [patch] [pcn] pcn (sys/pci/if_pcn.c) ethernet driver f
o bin/116643   net    [patch] fstat(1): add INET/INET6 socket details as in
o bin/117339   net    [patch] route(8): loading routing management commands
o kern/117456  net    [ipv6] ipv6 neighbour discovery / bce multicast  probl

17 problems total.

___
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "[EMAIL PROTECTED]"


Re: pf misfeature

2007-11-12 Thread Daniel Hartmeier
On Fri, Nov 09, 2007 at 12:59:46AM +0100, Max Laier wrote:

> Daniel, do you spot anything strange with these skip steps (or otherwise)?

The problem is the lack of IP reassembly in this configuration.

In pf_test_fragment(), a rule with r->flagset ("flags S/SA") is skipped.

Generally, stateful filtering _requires_ IP reassembly. As long as no
fragmentation occurs, it works even without reassembly. I suspect your
UDP NFS traffic is fragmented.
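Why reassembly matters can be seen from the IPv4 fragment field (RFC 791). When a datagram larger than the link MTU is fragmented (for example, a large NFS-over-UDP reply on 1500-byte Ethernet), only the fragment at offset 0 normally carries the UDP or TCP header; later fragments carry payload only, so a filter cannot see their ports or TCP flags unless the fragments are first reassembled. This is a minimal sketch; the FRAG_* constants and helper functions are hypothetical names (the system headers call these bits IP_DF, IP_MF and IP_OFFMASK in <netinet/ip.h>), and `ip_off` is assumed to be in host byte order.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bits of the IPv4 flags/fragment-offset field (host byte order), per RFC 791. */
#define FRAG_DF      0x4000u /* don't fragment */
#define FRAG_MF      0x2000u /* more fragments follow */
#define FRAG_OFFMASK 0x1fffu /* fragment offset, in 8-byte units */

/* True for any fragment: the first (MF set, offset 0), a middle one
 * (MF set, offset non-zero), or the last (MF clear, offset non-zero).
 * DF alone does not make a packet a fragment. */
static bool ip_is_fragment(uint16_t ip_off)
{
    return (ip_off & (FRAG_MF | FRAG_OFFMASK)) != 0;
}

/* Only a packet at offset 0 starts with the transport header, so only it
 * can expose ports or TCP flags to a packet filter. */
static bool ip_fragment_has_l4_header(uint16_t ip_off)
{
    return (ip_off & FRAG_OFFMASK) == 0;
}
```

With a 1500-byte MTU, the second fragment starts 1480 bytes in, so its offset field is 1480 / 8 = 185 (0x00b9): `ip_is_fragment(0x00b9)` is true, but `ip_fragment_has_l4_header(0x00b9)` is false, which is why stateful rules need `scrub ... fragment reassemble` to see that traffic as a whole.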

Try adding

  scrub in on $if all fragment reassemble

at the top.

Daniel


Re: pf misfeature

2007-11-12 Thread Max Laier
On Monday 12 November 2007, Daniel Hartmeier wrote:
> On Fri, Nov 09, 2007 at 12:59:46AM +0100, Max Laier wrote:
> > Daniel, do you spot anything strange with these skip steps (or
> > otherwise)?
>
> The problem is the lack of IP reassembly in this configuration.
>
> In pf_test_fragment(), a rule with r->flagset ("flags S/SA") is
> skipped.

Ah, I missed that one.  Wouldn't it make sense to conditionalize these 
tests on the protocol?  The attached patch can probably be optimized, but 
you get the general idea.

It seems wrong that an explicit UDP rule behaves differently from an 
implied one.

> Generally, stateful filtering _requires_ IP reassembly. As long as no
> fragmentation occurs, it works even without reassembly. I suspect your
> UDP NFS traffic is fragmented.
>
> Try adding
>
>   scrub in on $if all fragment reassemble
>
> at the top.


-- 
/"\  Best regards,  | [EMAIL PROTECTED]
\ /  Max Laier  | ICQ #67774661
 X   http://pf4freebsd.love2party.net/  | [EMAIL PROTECTED]
/ \  ASCII Ribbon Campaign  | Against HTML Mail and News
Index: pf.c
===
RCS file: /home/ncvs/src/sys/contrib/pf/net/pf.c,v
retrieving revision 1.50
diff -u -r1.50 pf.c
--- pf.c	28 Oct 2007 17:12:46 -	1.50
+++ pf.c	13 Nov 2007 02:58:31 -
@@ -4560,9 +4560,17 @@
 			r = r->skip[PF_SKIP_DST_ADDR].ptr;
 		else if (r->tos && !(r->tos == pd->tos))
 			r = TAILQ_NEXT(r, entries);
-		else if (r->src.port_op || r->dst.port_op ||
-		r->flagset || r->type || r->code ||
-		r->os_fingerprint != PF_OSFP_ANY)
+		else if (r->os_fingerprint != PF_OSFP_ANY)
+			r = TAILQ_NEXT(r, entries);
+		else if (pd->proto == IPPROTO_UDP &&
+		(r->src.port_op || r->dst.port_op))
+			r = TAILQ_NEXT(r, entries);
+		else if (pd->proto == IPPROTO_TCP &&
+		(r->src.port_op || r->dst.port_op || r->flagset))
+			r = TAILQ_NEXT(r, entries);
+		else if ((pd->proto == IPPROTO_ICMP ||
+		pd->proto == IPPROTO_ICMPV6) &&
+		(r->type || r->code))
 			r = TAILQ_NEXT(r, entries);
 		else if (r->prob && r->prob <= arc4random())
 			r = TAILQ_NEXT(r, entries);
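The effect of the patch on pf_test_fragment() can be modeled outside the kernel as two predicates: the old one skips a rule for any fragment as soon as it tests ports, TCP flags, ICMP type/code or an OS fingerprint, while the patched one only skips it when that criterion is meaningful for the packet's protocol. This is a minimal sketch; `struct rule_view`, `skip_old`, `skip_new` and the PROTO_* constants are hypothetical stand-ins for the `struct pf_rule` fields and IPPROTO_* values the diff uses, and only the criteria touched by the diff are modeled (the real loop also checks r->proto, addresses, TOS, r->prob and so on).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* IANA protocol numbers (same values as IPPROTO_* in <netinet/in.h>). */
enum { PROTO_ICMP = 1, PROTO_TCP = 6, PROTO_UDP = 17, PROTO_ICMPV6 = 58 };

/* Hypothetical summary of the pf_rule fields inspected by the diff. */
struct rule_view {
    bool port_op;        /* r->src.port_op || r->dst.port_op */
    bool flagset;        /* r->flagset, e.g. "flags S/SA" */
    bool type_or_code;   /* r->type || r->code */
    bool os_fingerprint; /* r->os_fingerprint != PF_OSFP_ANY */
};

/* Before the patch: any transport-level criterion skips the rule,
 * whatever the fragment's protocol. */
static bool skip_old(const struct rule_view *r, uint8_t proto)
{
    (void)proto; /* the old test ignores the protocol */
    return r->port_op || r->flagset || r->type_or_code || r->os_fingerprint;
}

/* After the patch: a criterion forces a skip only for protocols where it
 * would need header fields the fragment does not provide. */
static bool skip_new(const struct rule_view *r, uint8_t proto)
{
    if (r->os_fingerprint)
        return true;
    if (proto == PROTO_UDP && r->port_op)
        return true;
    if (proto == PROTO_TCP && (r->port_op || r->flagset))
        return true;
    if ((proto == PROTO_ICMP || proto == PROTO_ICMPV6) && r->type_or_code)
        return true;
    return false;
}
```

For a rule whose only transport criterion is `flags S/SA`, `skip_old` skips it for a UDP fragment while `skip_new` lets it be evaluated; for a TCP fragment both still skip it, since TCP flags cannot be checked without the TCP header.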

