Linda Messerschmidt wrote:
OK, I have learned that ktrdump looks up the name of the process
associated with a particular KSE at the time of the dump, so if
it's changed since tracing stopped, it will blissfully blame the wrong
process.  I understand why that's the case, but it still sucks for
troubleshooting. :(

This time, "pf task mtx" and "vnode_free_list" are the locks getting
the blame.  The processes fingered are an httpd (the root "parent"
of the one doing the work, which does nothing but select() for 1s and
wait to see if its children died), and vnlru.  No correlation at all
to the previous results, and this machine is now utterly quiescent
except for the httpd process and the PHP exerciser.  Hard to imagine
vnlru has 1s worth of running to do on a machine with 949 total vnodes
in use.

A third run produced a 997ms "lock acquire" for "buffer daemon lock,"
a 497ms one for ip6qlock (no, there's no IPv6 in use on this machine),
and an 8s (!!!) one on unp_mtx. bufdaemon had a 997s "running" bar,
but according to the raw TSC values, that happened on the same CPU
1.999s *after* the 997ms buffer daemon lock acquire.

I really don't know where to go from here.  There's so little
consistency that I'm just not sure if the data is bad, the tool is
bad, the operator is bad, or there's some problem so fundamentally
horrible that all I'm seeing is random side effects.
_______________________________________________
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"

Does the system have a serial console? How about a normal console/keyboard?

How often does it hang, and for how long?
Is there a chance that you could notice when it is hung, hit <Ctrl><Alt><Esc>, and drop it into the debugger IN the hung state?

If you have a serial port, it is possible to write a program that sends a char back and forth and, when the machine hangs, sends the magic sequence. (I think it's CR<tilde><Ctrl-B> for the serial debugger break,
but I'm sure you can look up the kernel options and the chars on Google.)
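
A rough sketch of such a watchdog, in Python, might look like the following. Everything in it is an assumption for illustration: the name watch(), the choice of a newline as the "ping" (a console sitting at a login prompt usually prints something back), the miss threshold, and the break sequence itself, which here assumes a kernel built with "options ALT_BREAK_TO_DEBUGGER". The port argument is any object with write() and a read() that returns empty bytes on timeout.

```python
import time

# Assumption: the target kernel was built with "options ALT_BREAK_TO_DEBUGGER",
# whose break sequence is CR ~ Ctrl-B. Change this if your setup differs.
BREAK_SEQUENCE = b"\r~\x02"


def watch(port, ping=b"\n", max_misses=3, interval=1.0, sleep=time.sleep):
    """Ping the console until it stops answering, then send the break sequence.

    `port` is any object with write(bytes) and read(n) -> bytes, where read()
    returns b"" if nothing arrives before the port's own timeout.
    Returns the number of pings that were answered before the hang.
    """
    answered = 0
    misses = 0
    while True:
        port.write(ping)
        if port.read(64):
            # Anything coming back counts as "alive"; reset the miss counter.
            answered += 1
            misses = 0
        else:
            misses += 1
            if misses >= max_misses:
                break
        sleep(interval)
    # The machine looks hung: try to drop it into the debugger right now.
    port.write(BREAK_SEQUENCE)
    return answered
```

A pyserial Serial object opened with a read timeout should fit the port interface, though that is untested here. Requiring several consecutive misses keeps a single slow response from breaking a healthy machine into the debugger.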

If you can drop the machine into DDB (the kernel debugger) in the
hung state, then there are lots of commands you can use to find out
what is wrong. jhb actually gave a short talk on the topic that I
videoed and put on YouTube.

ps will show you what is actually running on which CPU, and you can see what locks all the other processes are waiting on.
Then you can examine those locks and see who owns them.
