On Thu, Aug 27, 2009 at 5:29 PM, John Baldwin<j...@freebsd.org> wrote: > Ah, cool, what you want to do is use KTR with KTR_SCHED and then use > schedgraph.py (src/tools/sched) to get a visual picture of what the box does > during a hang. The timestamps in KTR are TSC cycle counts rather than an > actual wall time which is why they look off. If you have a lot of events you > may want to use a larger KTR_ENTRIES size btw (I use 1048576 (2 ^ 20) here at > work to get large (multiple-second) traces).
I'm still working on this. I enabled KTR and set it up to log KTR_SCHED events. Then, I wrote a script to exercise the HTTP server that actually ran on that machine, and set it to issue "sysctl debug.ktr.cpumask=0" and abort if a request took over 2 seconds. 28,613 requests later, it tripped over one that took 2007ms. (Just a refresher: this is a static file being served by an Apache process that has nothing else to do but serve this file on a relatively unloaded machine.) I don't have access to any machines that can run X, so I did the best I could to examine it from the shell. First, this machine has two CPU's so I split up the KTR results on a per-CPU basis so I could look at each individually. With KTR_ENTRIES set to 1048576, I got about 53 seconds of data with just KTR_SCHED enabled. Since I was interested in a 2.007 second period of time right at the end, I hacked it down to the last 3.795 seconds. In the 3.795 seconds captured in the trace period on CPU 0 that includes the entire 2.007 second stall, CPU 0 was idle for 3.175 seconds. In the same period, CPU 1 was idle for 3.2589 seconds. I did the best I could to manually page through all the scheduling activity on both CPUs during that 3.7 second time, and I didn't see anything really disruptive. Mainly idle, with jumps into the clock and ethernet kernel threads, as well as httpd. If I understand that correctly and have done everything right, that means that whatever happened, it wasn't related to CPU contention or scheduling issues of any sort. So, a couple of follow-up questions: First, what else should I be looking at? I built the kernel with kind of a lot of KTR flags (KTR_LOCK|KTR_SCHED|KTR_PROC|KTR_INTR|KTR_CALLOUT|KTR_UMA|KTR_SYSC) but enabling them all produces enough output that even 1048576 entries doesn't always go back two seconds; the volume of data is all but unmanageable. Second, is there any way to correlate the process address reported by the KTR scheduler entries back to a PID? It'd be nice to be able to view the scheduler activity just for the process I'm interested in, but I can't figure out which one it is. :) Thanks! _______________________________________________ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"