Re: System Freezes When MBufClust Usages Rises

Robert Watson Mon, 12 Nov 2007 00:43:03 -0800

On Sat, 10 Nov 2007, Ed Mandy wrote:

> If kern.ipc.nmbclusters is set to 25600, the system will hard freeze when "vmstat -z" shows the number of clusters reaches 25600. If kern.ipc.nmbclusters is set to 0 (or 102400), the system will hard freeze when "vmstat -z" shows the number of clusters is around 66000. When it freezes, the number of Kbytes allocated to network (as shown by "netstat -m") is roughly 160,000 (160MB).
>
> For a while, we thought that there might be a limit of 65536 mbuf clusters, so we tested building the kernel with MCLSHIFT=12, which makes each mbuf cluster 4096 bytes. With this configuration, nmbclusters only reached about 33000 before the system froze. The number of Kbytes allocated to network (as shown by "netstat -m") still maxed out at around 160,000.
>
> Now, it seems that we are running into some other memory limitation that occurs when our network allocation gets close to 160MB. We have tried tuning parameters such as KVA_PAGES, vm.kmem_size, vm.kmem_size_max, etc., though we are unsure whether the changes we made there helped in any way.
>
> This is all being done on Celeron 2.8GHz machines with 3+ GB of RAM running FreeBSD 5.3. We are very much tied to this platform at the moment, and upgrading is not a realistic option for us. We would like to tune the systems so that they do not lock up. We can currently work around the problem (by using smaller buffers and such), but only at the expense of network throughput, which is less than ideal.
>
> Are there any other parameters that would help us allocate more memory to kernel networking? What other options should we look into?

I'd like to diagnose "freeze hard" a little more to understand what's going on. Hopefully this won't be too disruptive for your environment while you're doing it.

First off, can you tell me how you're accessing the system to run diagnostic tools, monitor it, etc.? Remember that if you run out of clusters, you may experience network deadlocks that prevent SSH sessions from operating (since there may be no memory for them to operate), so direct console access may be required to effectively monitor the system when it is in an extreme state of low memory in the network stack. Could you tell me if you are using a serial console or the video console? (Or FireWire, I suppose?)

FreeBSD 5.3 was the first release to include an MPSAFE network stack, and there were a number of optionally compiled features that could disable MPSAFE networking, resulting in the Giant lock being held over network operations. Could you tell me what the value of the sysctl debug.mpsafenet is?

When the system appears to hard hang, does it recover if, say, left for five minutes? What if you unplug the network cable and leave it for five minutes?

Does the Num Lock key on the console work? If you leave the console logged in and running an application (such as "sleep 100000") and the system hangs, what do you see if you hit Ctrl-T?

If you compile options BREAK_TO_DEBUGGER into the kernel and generate a serial break / hit Ctrl-Alt-Esc, are you able to get into the debugger? If you type in "trace", what do you get? (There is a chapter of the Developers' Handbook that talks about using the kernel debugger, FYI.) With 5.3, we found that using a serial console to get to the debugger was a lot more reliable than the video console -- this is in part because a significant amount of the kernel (especially file systems and the video console) still runs under Giant, so a thread hanging while holding Giant can prevent a console break from getting to the debugger. My advice would be to use a serial console anyway, if possible, when debugging, as it means you can use a second machine to copy and paste DDB output into a file to e-mail out later. After about the third line of a kernel stack trace, copying addresses out by hand becomes pretty painful :-).

Unfortunately, I have to say that my first advice would be to upgrade -- not just because a lot of work has been done on network stack performance and stability since 5.3, but also because the debugging tools have gotten a lot better since then. For example, in more recent versions the kernel debugger includes memory monitoring tools, commands to more readily extract debugging information, etc. 5.3 is a solid and functional release, but when it comes to debugging problems of this sort, being on a more recent release means you're more likely to find the problem already fixed, and even if not, it will be easier for us to fix it. I understand that may simply not be possible, but if you have that flexibility, it's good advice.

A general comment on configuration: increasing the maximum memory allocated to the network stack can indeed increase your KVA usage significantly. You might well find that tuning KVA up is required to run with very high memory configurations for the network stack, so your intuitions about tuning that up aren't bad. However, when you run out of KVA, the result is usually a panic (since the kernel basically has to halt), so if you're not seeing a panic, then you're probably not yet hitting the limit.


Robert N M Watson
Computer Laboratory
University of Cambridge
_______________________________________________
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "[EMAIL PROTECTED]"
