Re: System Freezes When MBufClust Usages Rises
On Sat, 10 Nov 2007, Ed Mandy wrote:

> If kern.ipc.nmbclusters is set to 25600, the system will hard freeze when "vmstat -z" shows the number of clusters reaches 25600. If kern.ipc.nmbclusters is set to 0 (or 102400), the system will hard freeze when "vmstat -z" shows the number of clusters is around 66000. When it freezes, the number of Kbytes allocated to network (as shown by "netstat -m") is roughly 160,000 (160MB).
>
> For a while, we thought that there may be a limit of 65536 mbuf clusters, so we tested building the kernel with MCLSHIFT=12, which makes each mbcluster 4096 bytes. With this configuration, nmbclusters only reached about 33000 before the system froze. The number of Kbytes allocated to network (as shown by "netstat -m") still maxed out at around 160,000.
>
> Now, it seems that we are running into some other memory limitation that occurs when our network allocation gets close to 160MB. We have tried tuning parameters such as KVA_PAGES, vm.kmem_size, vm.kmem_size_max, etc., though we are unsure if the mods we made there helped in any way.
>
> This is all being done on Celeron 2.8GHz machines with 3+ GB of RAM running FreeBSD 5.3. We are very much tied to this platform at the moment, and upgrading is not a realistic option for us. We would like to tune the systems to not lock up. We can currently work around the problem (by using smaller buffers and such), but it is at the expense of network throughput, which is less than ideal.
>
> Are there any other parameters that would help us to allocate more memory to the kernel networking? What other options should we look into?

I'd like to diagnose "freeze hard" a little more to understand what's going on. Hopefully this won't be too disruptive for your environment while you're doing it.

First off, can you tell me how you're accessing the system to run diagnostic tools, monitor it, etc?
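The two cluster counts reported above can be checked against each other with a little arithmetic. The sketch below is illustrative only: the helper names are hypothetical, and it assumes the stock MCLSHIFT of 11 (2048-byte clusters) for the first test, since the original message only gives the value for the second test (MCLSHIFT=12).

```c
#include <assert.h>
#include <stdint.h>

/*
 * Bytes held by `clusters` mbuf clusters when each cluster is
 * (1 << mclshift) bytes. Hypothetical helper, not a kernel API.
 * Assumption: MCLSHIFT=11 (2048 bytes) is the stock value;
 * MCLSHIFT=12 (4096 bytes) is the test build from the message.
 */
static uint64_t
cluster_bytes(uint64_t clusters, unsigned mclshift)
{
	assert(mclshift < 32);	/* keep the shift well inside 64 bits */
	return (clusters << mclshift);
}

/* Convert a byte count to whole mebibytes (rounding down). */
static uint64_t
bytes_to_mib(uint64_t bytes)
{
	return (bytes >> 20);
}
```

With the reported figures, cluster_bytes(66000, 11) and cluster_bytes(33000, 12) both come to 135,168,000 bytes, so the two configurations stall at the same amount of cluster memory rather than at a fixed cluster count. The rest of the roughly 160 MB shown by "netstat -m" is presumably ordinary mbufs and other network allocations; that split is an assumption, not something stated in the thread.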
Remember that if you run out of clusters, you may experience network deadlocks that prevent SSH sessions from operating (since there may be no memory for them to operate), so direct console access may be required to effectively monitor the system when in an extreme state of low memory in the network stack. Could you tell me if you are using a serial console or the video console? (Or firewire, I suppose?)

FreeBSD 5.3 was the first release to include an MPSAFE network stack, and there were a number of optionally compiled features that could disable MPSAFE networking, resulting in the Giant lock being held over network operations. Could you tell me what the value of the sysctl debug.mpsafenet is?

When the system appears to hard hang, does it recover if, say, left five minutes? What if you unplug the network cable and leave it five minutes? Does the numlock key on the console work? If you leave the console logged in and running an application (such as "sleep 10") and the system hangs, what do you see if you hit Ctrl-T?

If you compile options BREAK_TO_DEBUGGER into the kernel and generate a serial break / hit ctrl-alt-esc, are you able to get into the debugger? If you type in "trace", what do you get? (There is a chapter of the developer's handbook that talks about using the kernel debugger, FYI).

With 5.3, we found that using a serial console to get to the debugger was a lot more reliable than the video console -- this is in part because a significant amount of the kernel (especially file systems and the video console) still runs under Giant, so a thread hanging while holding Giant can prevent a console break from getting to the debugger. My advice would be to use a serial console anyway, if possible, when debugging, as it means you can use a second machine to copy and paste DDB output into a file to e-mail out later. After about the third line of a kernel stack trace, copying addresses out by hand becomes pretty painful :-).
Unfortunately, I have to say that my first advice would be to upgrade -- not just because a lot of work has been done relating to network stack performance and stability since 5.3, but also because the debugging tools have gotten a lot better since then. For example, in more recent versions the kernel debugging includes memory monitoring tools, commands to more readily extract debugging information, etc. 5.3 is a solid and functional release, but when it comes to debugging problems of this sort, being on a more recent release means you're more likely to see the problem already fixed, and even if not, it will be easier for us to fix it. I understand that may simply not be possible, but if you have that flexibility, it's good advice.

A general comment on configuration: increasing the maximum memory allocated to the network stack can indeed increase your KVA usage significantly. You might well find that tuning KVA up is required to run with very high memory configurations for the network stack, so your intuitions about tuning that up aren't bad. However, when you r
Current problem reports assigned to freebsd-net@FreeBSD.org
Current FreeBSD problem reports

Critical problems

S Tracker      Resp.  Description
o kern/115360  net    [ipv6] IPv6 address and if_bridge don't play well toge

1 problem total.

Serious problems

S Tracker      Resp.  Description
s kern/21998   net    [socket] [patch] ident only for outgoing connections
a kern/38554   net    changing interface ipaddress doesn't seem to work
s kern/39937   net    ipstealth issue
s kern/81147   net    [net] [patch] em0 reinitialization while adding aliase
o kern/92552   net    A serious bug in most network drivers from 5.X to 6.X
s kern/95665   net    [if_tun] "ping: sendto: No buffer space available" wit
s kern/105943  net    Network stack may modify read-only mbuf chain copies
o kern/106316  net    [dummynet] dummynet with multipass ipfw drops packets
o kern/108542  net    [bce]: Huge network latencies with 6.2-RELEASE / STABL
o kern/109406  net    [ndis] Broadcom WLAN driver 4.100.15.5 doesn't work wi
o kern/110959  net    [ipsec] Filtering incoming packets with enc0 does not
o kern/112528  net    [nfs] NFS over TCP under load hangs with "impossible p
o kern/112686  net    [patm] patm driver freezes System (FreeBSD 6.2-p4) i38
o kern/112722  net    IP v4 udp fragmented packet reject
o kern/113457  net    [ipv6] deadlock occurs if a tunnel goes down while the
o kern/113842  net    [ipv6] PF_INET6 proto domain state can't be cleared wi
o kern/114714  net    [gre][patch] gre(4) is not MPSAFE and does not support
o kern/114839  net    [fxp] fxp looses ability to speak with traffic
o kern/115239  net    [ipnat] panic with 'kmem_map too small' using ipnat
o kern/116077  net    6.2-STABLE panic during use of multi-cast networking c
o kern/116172  net    Network / ipv6 recursive mutex panic
o kern/116185  net    if_iwi driver leads system to reboot
o kern/116186  net    can not set wi channel on current
o kern/116328  net    [bge]: Solid hang with bge interface
o kern/116747  net    [ndis] FreeBSD 7.0-CURRENT crash with Dell TrueMobile
o kern/116837  net    ifconfig tunX destroy: panic
o kern/117271  net    [tap] OpenVPN TAP uses 99% CPU on releng_6 when if_tap
o kern/117293  net    [carp] CARP interfaces causes packet loss
o kern/117423  net    Duplicate IP on different interfaces
o bin/117448   net    [carp] 6.2 kernel crash

30 problems total.

Non-critical problems

S Tracker      Resp.  Description
o conf/23063   net    [PATCH] for static ARP tables in rc.network
s bin/41647    net    ifconfig(8) doesn't accept lladdr along with inet addr
o kern/54383   net    [nfs] [patch] NFS root configurations without dynamic
s kern/60293   net    FreeBSD arp poison patch
o kern/95267   net    packet drops periodically appear
f kern/95277   net    [netinet] [patch] IP Encapsulation mask_match() return
o kern/100519  net    [netisr] suggestion to fix suboptimal network polling
o kern/102035  net    [plip] plip networking disables parallel port printing
o conf/102502  net    [patch] ifconfig name does't rename netgraph node in n
o kern/103253  net    inconsistent behaviour in arp reply of a bridge
o conf/107035  net    [patch] bridge interface given in rc.conf not taking a
o kern/112654  net    [pcn] Kernel panic upon if_pcn module load on a Netfin
o kern/114095  net    [carp] carp+pf delay with high state limit
o kern/114915  net    [patch] [pcn] pcn (sys/pci/if_pcn.c) ethernet driver f
o bin/116643   net    [patch] fstat(1): add INET/INET6 socket details as in
o bin/117339   net    [patch] route(8): loading routing management commands
o kern/117456  net    [ipv6] ipv6 neighbour discovery / bce multicast probl

17 problems total.

___
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "[EMAIL PROTECTED]"
Re: pf misfeature
On Fri, Nov 09, 2007 at 12:59:46AM +0100, Max Laier wrote:

> Daniel, do you spot anything strange with these skip steps (or otherwise)?

The problem is the lack of IP reassembly in this configuration.

In pf_test_fragment(), a rule with r->flagset ("flags S/SA") is skipped.

Generally, stateful filtering _requires_ IP reassembly. As long as no fragmentation occurs, it works even without reassembly. I suspect your UDP NFS traffic is fragmented.

Try adding

  scrub in on $if all fragment reassemble

at the top.

Daniel
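To illustrate why fragments are a problem for port-based and stateful rules, here is a minimal sketch of how the IPv4 flags/offset field marks a fragment. It is not taken from pf; the macro and function names are hypothetical, and the field is assumed to have been converted to host byte order already. Only the fragment at offset 0 carries the UDP or TCP header, so later fragments offer no ports or flags to match on.

```c
#include <stdint.h>

/* Bit layout of the 16-bit IPv4 flags/fragment-offset field (RFC 791). */
#define SK_IP_MF	0x2000u	/* "more fragments" flag */
#define SK_IP_OFFMASK	0x1fffu	/* fragment offset, in 8-byte units */

/*
 * A packet is a fragment if more fragments follow or if it does not
 * start at offset 0. ip_off is assumed to be in host byte order.
 */
static int
sk_ipv4_is_fragment(uint16_t ip_off)
{
	return ((ip_off & (SK_IP_MF | SK_IP_OFFMASK)) != 0);
}

/*
 * Only a packet starting at offset 0 contains the start of the
 * transport header, where the ports (and TCP flags) live.
 */
static int
sk_ipv4_has_l4_header(uint16_t ip_off)
{
	return ((ip_off & SK_IP_OFFMASK) == 0);
}
```

For example, a large UDP datagram sent over a 1500-byte MTU is typically split so that the second fragment has an offset field of 185 (185 * 8 = 1480 bytes); for that value sk_ipv4_is_fragment() is true and sk_ipv4_has_l4_header() is false.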
Re: pf misfeature
On Monday 12 November 2007, Daniel Hartmeier wrote:
> On Fri, Nov 09, 2007 at 12:59:46AM +0100, Max Laier wrote:
> > Daniel, do you spot anything strange with these skip steps (or
> > otherwise)?
>
> The problem is the lack of IP reassembly in this configuration.
>
> In pf_test_fragment(), a rule with r->flagset ("flags S/SA") is
> skipped.

Ah, I missed that one. Wouldn't it make sense to conditionalize these tests on the protocol? The attached can probably be optimized, but you get the general idea. It seems wrong that an explicit udp-rule behaves differently than an implied one.

> Generally, stateful filtering _requires_ IP reassembly. As long as no
> fragmentation occurs, it works even without reassembly. I suspect your
> UDP NFS traffic is fragmented.
>
> Try adding
>
>   scrub in on $if all fragment reassemble
>
> at the top.

-- 
/"\  Best regards,                      | [EMAIL PROTECTED]
\ /  Max Laier                          | ICQ #67774661
 X   http://pf4freebsd.love2party.net/  | [EMAIL PROTECTED]
/ \  ASCII Ribbon Campaign              | Against HTML Mail and News

Index: pf.c
===================================================================
RCS file: /home/ncvs/src/sys/contrib/pf/net/pf.c,v
retrieving revision 1.50
diff -u -r1.50 pf.c
--- pf.c	28 Oct 2007 17:12:46 -0000	1.50
+++ pf.c	13 Nov 2007 02:58:31 -0000
@@ -4560,9 +4560,17 @@
 			r = r->skip[PF_SKIP_DST_ADDR].ptr;
 		else if (r->tos && !(r->tos == pd->tos))
 			r = TAILQ_NEXT(r, entries);
-		else if (r->src.port_op || r->dst.port_op ||
-		    r->flagset || r->type || r->code ||
-		    r->os_fingerprint != PF_OSFP_ANY)
+		else if (r->os_fingerprint != PF_OSFP_ANY)
+			r = TAILQ_NEXT(r, entries);
+		else if (pd->proto == IPPROTO_UDP &&
+		    (r->src.port_op || r->dst.port_op))
+			r = TAILQ_NEXT(r, entries);
+		else if (pd->proto == IPPROTO_TCP &&
+		    (r->src.port_op || r->dst.port_op || r->flagset))
+			r = TAILQ_NEXT(r, entries);
+		else if ((pd->proto == IPPROTO_ICMP ||
+		    pd->proto == IPPROTO_ICMPV6) &&
+		    (r->type || r->code))
 			r = TAILQ_NEXT(r, entries);
 		else if (r->prob && r->prob <= arc4random())
 			r = TAILQ_NEXT(r, entries);
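The effect of the proposed change can be seen in isolation with a small standalone model. This is only a sketch of the skip test, not pf code: struct sk_rule and both functions are hypothetical simplifications, with each rule field reduced to a "is this criterion set" flag and the OS fingerprint check reduced to a boolean.

```c
#include <assert.h>
#include <stddef.h>

/* IANA protocol numbers, spelled out so the sketch needs no system headers. */
enum { SK_ICMP = 1, SK_TCP = 6, SK_UDP = 17, SK_ICMPV6 = 58 };

/* Hypothetical, flattened stand-in for the pf_rule fields the diff tests. */
struct sk_rule {
	int src_port_op, dst_port_op;	/* rule matches on ports */
	int flagset;			/* rule matches on TCP flags */
	int type, code;			/* rule matches on ICMP type/code */
	int has_os_fingerprint;		/* fingerprint other than "any" */
};

/* Old behaviour: any of these criteria makes a fragment skip the rule. */
static int
sk_skip_before(const struct sk_rule *r, int proto)
{
	(void)proto;			/* protocol was not consulted */
	assert(r != NULL);
	return (r->src_port_op || r->dst_port_op || r->flagset ||
	    r->type || r->code || r->has_os_fingerprint);
}

/* Proposed behaviour: a criterion only counts for the protocol it applies to. */
static int
sk_skip_after(const struct sk_rule *r, int proto)
{
	assert(r != NULL);
	if (r->has_os_fingerprint)
		return (1);
	if (proto == SK_UDP && (r->src_port_op || r->dst_port_op))
		return (1);
	if (proto == SK_TCP &&
	    (r->src_port_op || r->dst_port_op || r->flagset))
		return (1);
	if ((proto == SK_ICMP || proto == SK_ICMPV6) && (r->type || r->code))
		return (1);
	return (0);
}
```

For a rule whose only criterion is flagset (the "flags S/SA" case from the thread), the old test skips the rule for a UDP fragment, while the protocol-conditional test skips it only for TCP.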