On Mon, 19 Sep 2005, Koen Martens wrote:
> Without the debug stuff in the kernel, it crashed within 2 days, same
> story: postgresql process, function propagate_priority. However, no dump
> was written to disk :(
> Furthermore, I've been seeing the same crash (in propagate_priority) on
> another box in mysql processes. Both servers seem to panic every 2-3
> days. I have another server with the exact same hardware configuration,
> but it is idle most of the time. I haven't seen that one crash yet.
> I now think it is a bug in the twa driver, so I'll have to dig into
> that. It also seems to be some sort of concurrency or otherwise
> timing-sensitive issue, because slowing the kernel down with debug code
> seems to avoid the panic. But as I am completely new to the FreeBSD
> kernel and don't even know what turnstiles are, I imagine I will have a
> hard time. So if anyone can offer some help, please :)
> Ok, thanks for your attention,

I can't speak to the problem with the core dumps, as that sounds
device/firmware related. However, I can probably lend a hand in
debugging the panics you're seeing.

First off, propagate_priority() is part of the priority propagation
mechanism associated with mutexes, which are a locking primitive in the
FreeBSD kernel. When a thread blocks on a mutex, its priority is lent to
the thread that owns the mutex if it is higher than the owner's, and so
on down the chain if that owner is itself blocked on another lock. Most
panics in propagate_priority() are actually the result of a corrupted
mutex: when the mutex code goes to perform priority propagation, it
trips over bad pointers and panics in some form or another. Often, this
means the actual panic or failure did not occur in the thread that
prints out the panic you see, but in another thread. So the first task
on hitting a propagate_priority() panic is to identify the thread that
actually had the problem.

Usually, I do this from DDB rather than from a core dump, because I find
DDB's tools for inspecting running state a little easier to use. First,
I identify which code made the mutex call that led to
propagate_priority() being called. This matters because the next step is
to use "ps" and "trace" to find other processes/threads in the same
code, since those are the most likely to have damaged the mutex storage
in memory. Generally, you're looking for a panic in another thread, so
once you identify a set of threads that might be to blame, you can trace
them to find one that is in panic(). Usually, that thread will be in the
RUN state or, on an SMP box, possibly running on another CPU. If you're
running 6.x, the thread that panicked was likely preempted while it was
failing, perhaps by an untimely interrupt.

If you want to do this by e-mail so we can lend a hand, you probably want
to hook up a serial console so you can copy and paste the debugging
session. Compile DDB into the kernel (this should have no performance
overhead), and when the system panics, you'll (ideally) get a db> prompt.
The panic message and any related context (such as trap information) are
useful. I usually then use "show percpu" to see which CPU I'm running on,
which thread is running, etc. I'll then use "trace" with no argument to
see the stack of the current thread. If I'm trying to find another thread
that may have been preempted, I'll use "ps" to show the running processes
and threads, then "trace <pid>" to trace the main thread of processes
that look interesting. Generally, those are the ones in the RUN state,
since a preempted thread is still runnable.

If you're running on an SMP system, you may occasionally find that the
information needed to inspect the stacks of threads currently running on
other processors is not consistently in memory (for example, it is still
cached, or a stack frame is only partially written). There's a kernel
option, KDB_STOP_NMI, which, when combined with a sysctl, causes the
debugger to deliver an NMI IPI instead of a debug IPI. This may help
kick those processors into the debugger if they are stuck in spin locks.
However, chances are fairly good that this isn't the case, so you're
probably fine without it.
Robert N M Watson
_______________________________________________
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers