On Thursday 10 September 2009 9:34:30 pm Linda Messerschmidt wrote:
> Just to follow up, I've been doing some testing with masking for
> KTR_LOCK rather than KTR_SCHED.
>
> I'm having trouble with this because I have the KTR buffer size set to
> 1048576 entries, and with only KTR_LOCK enabled, this isn't enough for
> even a full second of tracing; the sample I'm working with now is just
> under 0.9s. It's an average of one entry every 2001 TSC ticks. That
> *seems* like a lot of locking activity, but some of the lock points
> are only a couple of lines apart, so maybe it's just incredibly
> verbose.
>
> Since it's so much data and I'm still working on a way to correlate it
> (lockgraph.py?), all I've got so far is a list of what trace points
> are coming up the most:
>
> 51927 src/sys/kern/kern_lock.c:215 (_lockmgr UNLOCK mtx_unlock() when
>       flags & LK_INTERLOCK)
> 48033 src/sys/kern/vfs_subr.c:2284 (vdropl UNLOCK)
> 41548 src/sys/kern/vfs_subr.c:2187 (vput VI_LOCK)
> 29359 src/sys/kern/vfs_subr.c:2067 (vget VI_LOCK)
> 29358 src/sys/kern/vfs_subr.c:2079 (vget VI_UNLOCK)
> 23799 src/sys/nfsclient/nfs_subs.c:755 (nfs_getattrcache mtx_lock)
> 23460 src/sys/nfsclient/nfs_vnops.c:645 (nfs_getattr mtx_unlock)
> 23460 src/sys/nfsclient/nfs_vnops.c:642 (nfs_getattr mtx_lock)
> 23460 src/sys/nfsclient/nfs_subs.c:815 (nfs_getattrcache mtx_unlock)
> 23138 src/sys/kern/vfs_cache.c:345 (cache_lookup CACHE_LOCK)
>
> Unfortunately, it kind of sounds like I'm on my way to answering "why
> is this system slow?" even though it really isn't slow. (And I rush
> to point out that the Apache process in question doesn't at any point
> in its life touch NFS, though some of the other ones on the machine
> do.)
>
> In order to be the cause of my Apache problem, all this goobering
> around with NFS would have to be relatively infrequent but so intense
> that it shoves everything else out of the way. I'm skeptical, but I'm
> sure one of you guys can offer a more informed opinion.
>
> The only other thing I can think of is maybe all this is running me
> out of something I need (vnodes?) so everybody else blocks until it
> finishes and lets go of whatever finite resource it's using up? But
> that doesn't make a ton of sense either, because why would a lack of
> vnodes cause stalls in accept() or select() in unrelated processes?
>
> Not sure if I'm going in the right direction here or not.
Try turning off KTR_LOCK for spin mutexes (just force LO_QUIET on in
mtx_init() if MTX_SPIN is set) and use the schedgraph.py from the latest
RELENG_7. It knows how to parse KTR_LOCK events and draw event "bars"
for locks, showing when they are held. A more recent schedgraph.py might
also fix the bugs you were seeing with the idle threads appearing to run
too long (esp. at the start and end of graphs).

--
John Baldwin
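
Roughly, that LO_QUIET change is a one-liner in the flag setup in
mtx_init() in sys/kern/kern_mutex.c. A sketch follows; the surrounding
code may differ a bit between branches, so treat it as an illustration
rather than a drop-in patch:

	/*
	 * Inside mtx_init() (sys/kern/kern_mutex.c), where the
	 * lock_object flags are assembled before lock_init() runs.
	 * "opts", "flags", "class", and "m" are mtx_init()'s locals
	 * and argument.
	 */
	if (opts & MTX_SPIN)
		class = &lock_class_mtx_spin;
	else
		class = &lock_class_mtx_sleep;
	flags = 0;
	if (opts & MTX_QUIET)
		flags |= LO_QUIET;
	/* Sketch of the suggested change: quiet every spin mutex. */
	if (opts & MTX_SPIN)
		flags |= LO_QUIET;
	/* ... remaining flag setup unchanged ... */
	lock_init(&m->lock_object, class, name, type, flags);

With the spin mutexes quieted (they fire constantly from the scheduler
and interrupt paths), the 1048576-entry buffer should cover a much
longer window than 0.9s.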