Another place to start might be with Brendan Gregg's DTrace tools: http://www.brendangregg.com/dtrace.html
His prustat, hotuser, hotkernel, and shortlived.d scripts might be
helpful in your situation.

-j

On Mon, May 21, 2007 at 01:50:27PM -0700, Eric Saxe wrote:
> Jeffrey Collyer wrote:
> >5 identical V440s, Solaris 10, storage on a Netapp via NFS, providing
> >access to a mailstore (so 80% read, 20% write).
> >
> >During the day, randomly a machine will start to climb its load from the
> >baseline of 2-3 up to 50-60. Under heavy loading, I've seen it go up to
> >300. All the time will be split almost 50/50 user and kernel, no idle,
> >nothing in I/O (according to top).
> >
> >I'm suspecting NFS problems, but the Netapp and switch traffic graphs
> >look clean and consistent. Nothing shows network errors, not nfsstat, not
> >the switch ports, not the Netapp.
> >
> >And like I mentioned, the problem moves. One day on machine 1, tomorrow
> >on 4, etc. No real pattern.
> >
> >How would I go about trying to discover what the kernel is doing when this
> >is happening? Some of the simple dtrace stuff I've tried has just shown
> >me a lot of lwp_parks (the main app is heavily multithreaded, so that
> >figures).
> >
> >Anyone got any key dtrace probes they look at for NFS or dnlc problems?
> >
> One fairly simple thing to try (to start) would be a "lockstat -I",
> which essentially does some simple kernel profiling.
> In a coarse sense that should give you an idea as to where (the kernel
> at least) is spending the bulk of its time. You'll want
> to kick that off during one of the load spikes...
>
> Thanks,
> -Eric
> _______________________________________________
> perf-discuss mailing list
> perf-discuss@opensolaris.org
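
For reference, the "lockstat -I" suggestion above might look something like the sketch below when run as root during a spike. The 30-second window, the top-20 cutoff, the RUN switch, and the function name spike_profile_cmds are all illustrative assumptions, not anything from the thread. The second command is a generic DTrace kernel-stack sampler in the same spirit as hotkernel, not a copy of that script. By default the script only prints the commands; set RUN=1 on the affected Solaris box to execute them.

```shell
#!/bin/sh
# Sketch only: kernel profiling during a load spike (Solaris 10, as root).
# DURATION and TOPN are illustrative defaults, not values from the thread.
DURATION=${DURATION:-30}
TOPN=${TOPN:-20}

# Print the two profiling commands, one per line.
spike_profile_cmds() {
    # lockstat: -I profiling interrupts, -k coalesce PCs within functions,
    # -W distinguish events by caller only, -D show only the top N entries.
    printf '%s\n' "lockstat -kIW -D $TOPN sleep $DURATION"
    # DTrace: sample at 997 Hz, keep samples taken in the kernel (arg0 != 0),
    # count by kernel stack, and exit after DURATION seconds.
    printf '%s\n' "dtrace -n 'profile-997 /arg0/ { @[stack()] = count(); } tick-${DURATION}s { exit(0); }'"
}

if [ "${RUN:-0}" = "1" ]; then
    # Execute each command in turn on the affected host.
    spike_profile_cmds | while IFS= read -r cmd; do
        eval "$cmd"
    done
else
    # Dry run: just show what would be executed.
    spike_profile_cmds
fi
```

If the top functions in the lockstat output or the hottest stacks from DTrace sit in the nfs or dnlc code paths, that would support the NFS theory; if they sit elsewhere, it points the investigation away from the Netapp.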