OK, I have learned that ktrdump looks up the name of the process associated with a particular KSE at the time of the dump, so if it's changed since tracing stopped, it will blissfully blame the wrong process. I understand why that's the case, but it still sucks for troubleshooting. :(
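
For anyone following along, here is a toy Python sketch of that failure mode. It is not ktrdump's actual code: the event tuples, the KSE key and the process names are all made up for illustration, and the only point is that resolving names late can label an old event with a newer owner.

```python
# Toy model only: NOT ktrdump's real code or data structures.
# Each event is (timestamp, kse_id, description); all values are hypothetical.

def resolve_names(events, name_table):
    """Label each event with whatever name the table holds right now."""
    return [(ts, name_table.get(kse, "unknown"), what)
            for ts, kse, what in events]

# One traced event, recorded while hypothetical KSE 100 belonged to httpd.
events = [(1000, 100, "lock acquire")]

names_when_traced = {100: "httpd"}
names_when_dumped = {100: "other_proc"}  # association changed before the dump

at_trace_time = resolve_names(events, names_when_traced)
at_dump_time = resolve_names(events, names_when_dumped)

print(at_trace_time)
print(at_dump_time)
```

The second list blames other_proc for something httpd did, which is the kind of misattribution I mean.
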
This time, "pf task mtx" and "vnode_free_list" are the locks getting the blame. The processes fingered are an httpd (the root "parent" of the one doing the work, which does nothing but select() for 1s and wait to see if its children died) and vnlru. There is no correlation at all to the previous results, and this machine is now utterly quiescent except for the httpd process and the PHP exerciser. It's hard to imagine vnlru has 1s worth of running to do on a machine with 949 total vnodes in use.

A third run produced a 997ms "lock acquire" for "buffer daemon lock," a 497ms one for ip6qlock (no, there's no IPv6 in use on this machine), and an 8s (!!!) one on unp_mtx. bufdaemon had a 997s "running" bar, but according to the raw TSC values, that happened on the same CPU 1.999s *after* the 997ms buffer daemon lock acquire.

I really don't know where to go from here. There's so little consistency that I'm just not sure if the data is bad, the tool is bad, the operator is bad, or there's some problem so fundamentally horrible that all I'm seeing is random side effects.
_______________________________________________
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"