On Apr 20, 2008, at 2:43 AM, Robert Watson wrote:
On Fri, 18 Apr 2008, Chris Pratt wrote:
Doesn't 7.0 fix this? I'd like to see an official definitive
answer and all I've been going on is that the problem description
is no longer in the errata.
Unfortunately, bugs of this sort don't really "work" that way --
specific bugs are a property of a problem in code (or a problem in
design), but what we have right now is a report of a symptom that
might reflect zero or more specific bugs. It's unclear that the
problem described in errata is the problem you've been
experiencing, or that the (at least one) fixed bug with the same
symptoms is the one you've been experiencing. For better or
worse, the only way to really tell if a generic class of hang or
wedging is fixed is to try out the new version and see. In most
cases, "zonelimit" wedging reflects one of two things:
(1) Inadequate resource allocation to the network stack or some other
component, try tuning up the memory tunable for clusters (for
example).
For several months I did quite a bit of tuning. I never increased
nmbclusters beyond the 32768 shown in the docs because man
tuning doesn't define its use of "arbitrarily high". Inability to boot
could mean travel. Kris Kennaway had provided instructions to
get a dump. I set up for that but have never had a dump. The
only respite came from adding another circuit, another NIC and
spreading traffic. That stretched the interval between lockups
from every couple of days during the heavy bot period of late 2006
to about a month now, or even two months during traditionally slow
periods.
For example, we ran a record 72 days last summer. It was a
very dead summer traffic wise.
I will try to increase the nmbclusters dramatically if I can figure
out what a safe top limit is but it sounds like the jump to
7.0 RELEASE may be worth the effort. I would want to wait
until this issue with TCP, Windows and certain routers is well
past. I had not seen that applied to 7_0_0 yet and that would be
a show stopper. Is there a way to know what is safe for
nmbclusters given an 8GB ram system?
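
As a rough aid to the question above, here is a minimal sketch of the
arithmetic only. It assumes standard 2048-byte mbuf clusters and the
tunable name kern.ipc.nmbclusters; the function name is made up for
illustration. It gives the worst-case memory the clusters themselves
could consume, not a verdict on what is safe, since kernel address
space limits (not total RAM) may be the real constraint.

```python
# Worst-case memory used by mbuf clusters for a given limit.
# Assumption: standard 2048-byte clusters; mbufs themselves and
# jumbo clusters are not counted.
MCLBYTES = 2048


def cluster_memory_mib(nmbclusters: int) -> float:
    """Return the upper bound on cluster memory, in MiB."""
    return nmbclusters * MCLBYTES / (1024 * 1024)


if __name__ == "__main__":
    for n in (32768, 65536, 262144):
        print(f"kern.ipc.nmbclusters={n}: up to {cluster_memory_mib(n):.0f} MiB")
```

By this arithmetic the documented 32768 corresponds to at most 64 MiB
of clusters.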
I did vmstat data collection for a couple of months when things
were at their worst. The results were nebulous to me based
on lack of code knowledge. All I actually found was that a
certain counter would drop to 0 and never recover. I didn't
know if it was meaningful and received no replies when I
asked FreeBSD-Questions. It was 128-Bucket or something
like that.
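
One way to make that kind of collection easier to read is to parse
saved "vmstat -z" output and list zones whose FREE count is zero. This
is a sketch under assumptions: it assumes each zone line looks like
"name: size, limit, used, free, requests[, ...]" with FREE as the
fourth number, and the function names are hypothetical. Whether a zero
FREE count is meaningful is not established anywhere in this thread.

```python
def parse_vmstat_z(text):
    """Parse `vmstat -z`-style text into {zone_name: [int columns]}."""
    zones = {}
    for line in text.splitlines():
        # Zone lines are assumed to be "name: n, n, n, ..."; the header
        # and blank lines contain no colon and are skipped.
        name, sep, rest = line.rpartition(":")
        if not sep:
            continue
        try:
            values = [int(v) for v in rest.split(",") if v.strip()]
        except ValueError:
            continue  # not a numeric zone line
        if values:
            zones[name.strip()] = values
    return zones


def zones_with_zero_free(zones, free_index=3):
    """Return sorted zone names whose FREE column (assumed index 3) is 0."""
    return sorted(
        name
        for name, values in zones.items()
        if len(values) > free_index and values[free_index] == 0
    )
```

Running this over samples taken a few minutes apart would show whether
a zone stays at zero or recovers.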
(2) A memory leak in a network device driver or other network part,
which needs to be debugged and fixed.
Initially I thought there might be something related to the bge
driver and moved the high traffic apps onto an em interface. This
didn't seem to help much, nor did polling.
I am most willing to collect data if I could figure out how to
collect something meaningful. I gather from what you say,
that 7.0 would provide this.
I really appreciate both of your responses. Just based on
this one problem, 6.x has been a bad experience after
years of seemingly impossible uptime on 4 and 5.x
FreeBSD.
On at least one prior occasion, there has been a bug in UMA itself
that led to getting stuck in zonelimit, and it's not impossible
there's a scheduler sleep/wakeup bug that would lead to a similar
symptom but for a different reason.
In FreeBSD 7-STABLE, you can now use procstat -k to print kernel
stack traces of user threads blocked in kernel, which may make
diagnosing the general class of problem a bit easier without using
a kernel debugger. "zonelimit" is the generic wait channel across
all memory types and allocation paths, so it doesn't reveal a lot
about *which* limit is being hit. Using a kernel stack trace, we can see
which specific memory type and allocation context is involved.
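
A minimal sketch of capturing those traces from a script, assuming a
system where procstat is installed and accepts -k together with -a
(all processes). The helper names are hypothetical, and the string to
search for is left to the caller; looking for frames beginning with
"uma_" is only a guess at what an allocator-related stack would show.

```python
import subprocess


def kernel_stacks(pid=None):
    """Run `procstat -k` for one pid, or for all processes with -a."""
    target = ["-a"] if pid is None else [str(pid)]
    result = subprocess.run(
        ["procstat", "-k"] + target,
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def threads_matching(procstat_output, needle):
    """Return the output lines whose text contains `needle`."""
    return [line for line in procstat_output.splitlines() if needle in line]
```

For example, threads_matching(kernel_stacks(), "uma_") would keep only
the threads whose stacks mention a function with that prefix.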
Robert N M Watson
Computer Laboratory
University of Cambridge
_______________________________________________
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "[EMAIL PROTECTED]"