On Thursday 10 September 2009 9:34:30 pm Linda Messerschmidt wrote:
> Just to follow up, I've been doing some testing with masking for
> KTR_LOCK rather than KTR_SCHED.
>
> I'm having trouble with this because I have the KTR buffer size set to
> 1048576 entries, and with only KTR_LOCK enabled, this isn't enough for
> even a full second of tracing; the sample I'm working with now is just
> under 0.9s. It's an average of one entry every 2001 TSC ticks. That
> *seems* like a lot of locking activity, but some of the lock points
> are only a couple of lines apart, so maybe it's just incredibly
> verbose.
>
> Since it's so much data and I'm still working on a way to correlate it
> (lockgraph.py?), all I've got so far is a list of what trace points
> are coming up the most:
>
> 51927 src/sys/kern/kern_lock.c:215 (_lockmgr UNLOCK mtx_unlock() when
>       flags & LK_INTERLOCK)
> 48033 src/sys/kern/vfs_subr.c:2284 (vdropl UNLOCK)
> 41548 src/sys/kern/vfs_subr.c:2187 (vput VI_LOCK)
> 29359 src/sys/kern/vfs_subr.c:2067 (vget VI_LOCK)
> 29358 src/sys/kern/vfs_subr.c:2079 (vget VI_UNLOCK)
> 23799 src/sys/nfsclient/nfs_subs.c:755 (nfs_getattrcache mtx_lock)
> 23460 src/sys/nfsclient/nfs_vnops.c:645 (nfs_getattr mtx_unlock)
> 23460 src/sys/nfsclient/nfs_vnops.c:642 (nfs_getattr mtx_lock)
> 23460 src/sys/nfsclient/nfs_subs.c:815 (nfs_getattrcache mtx_unlock)
> 23138 src/sys/kern/vfs_cache.c:345 (cache_lookup CACHE_LOCK)
>
> Unfortunately, it kind of sounds like I'm on my way to answering "why
> is this system slow?" even though it really isn't slow. (And I rush
> to point out that the Apache process in question doesn't at any point
> in its life touch NFS, though some of the other ones on the machine
> do.)
>
> In order to be the cause of my Apache problem, all this goobering
> around with NFS would have to be relatively infrequent but so intense
> that it shoves everything else out of the way. I'm skeptical, but I'm
> sure one of you guys can offer a more informed opinion.
>
> The only other thing I can think of is maybe all this is running me
> out of something I need (vnodes?) so everybody else blocks until it
> finishes and lets go of whatever finite resource it's using up? But
> that doesn't make a ton of sense either, because why would a lack of
> vnodes cause stalls in accept() or select() in unrelated processes?
>
> Not sure if I'm going in the right direction here or not.
Try turning off KTR_LOCK for spin mutexes (just force LO_QUIET on in
mtx_init() if MTX_SPIN is set) and use the schedgraph.py from the latest
RELENG_7. It knows how to parse KTR_LOCK events and draw event "bars"
for locks, showing when they are held. A more recent schedgraph.py might
also fix the bugs you were seeing with the idle threads appearing to run
too long (esp. at the start and end of graphs).

--
John Baldwin
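
Roughly, that LO_QUIET change is a one-liner in the flag setup in
mtx_init() in sys/kern/kern_mutex.c. A sketch follows; the surrounding
code may differ a bit between branches, so treat it as an illustration
rather than a drop-in patch:

	/*
	 * Inside mtx_init() (sys/kern/kern_mutex.c), where the
	 * lock_object flags are assembled before lock_init() runs.
	 * "opts", "flags", "class", and "m" are mtx_init()'s locals
	 * and argument.
	 */
	if (opts & MTX_SPIN)
		class = &lock_class_mtx_spin;
	else
		class = &lock_class_mtx_sleep;
	flags = 0;
	if (opts & MTX_QUIET)
		flags |= LO_QUIET;
	/* Sketch of the suggested change: quiet every spin mutex. */
	if (opts & MTX_SPIN)
		flags |= LO_QUIET;
	/* ... remaining flag setup unchanged ... */
	lock_init(&m->lock_object, class, name, type, flags);

With the spin mutexes quieted (they fire constantly from the scheduler
and interrupt paths), the 1048576-entry buffer should cover a much
longer window than 0.9s.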