Another place to start might be Brendan Gregg's DTrace tools:

        http://www.brendangregg.com/dtrace.html

His prustat, hotuser, hotkernel, and shortlived.d scripts might be
helpful in your situation.
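
Since the spikes wander between machines, it may also help to leave a small
watcher running on each box so that Eric's "lockstat -I" suggestion (quoted
below) fires during a spike rather than after it. What follows is a rough
sketch in Python, not a tested tool. It assumes Python 3 is installed and that
it runs with enough privilege to use lockstat (e.g. as root). The threshold,
polling interval, sample length, and output directory are hypothetical values
to tune, not recommendations:

```python
#!/usr/bin/env python3
"""Hypothetical load-spike watcher: start a "lockstat -I" kernel profile
when the 1-minute load average crosses a threshold."""
import os
import subprocess
import time

# All of these values are assumptions; tune them for your machines.
LOAD_THRESHOLD = 20.0    # baseline is reportedly 2-3, spikes reach 50-60+
POLL_SECONDS = 15        # how often to check the load average
SAMPLE_SECONDS = 30      # how long lockstat profiles for
OUTPUT_DIR = "/var/tmp"  # hypothetical location for captured reports


def should_capture(load1, threshold=LOAD_THRESHOLD):
    """Return True when the 1-minute load average indicates a spike."""
    return load1 >= threshold


def build_command(seconds=SAMPLE_SECONDS):
    """lockstat collects data for as long as the given command runs,
    so 'sleep N' bounds the profiling window to N seconds."""
    return ["lockstat", "-I", "sleep", str(seconds)]


def capture(seconds=SAMPLE_SECONDS, outdir=OUTPUT_DIR):
    """Run lockstat once and save its report to a timestamped file."""
    path = os.path.join(outdir, time.strftime("lockstat-%Y%m%d-%H%M%S.txt"))
    with open(path, "w") as out:
        subprocess.run(build_command(seconds), stdout=out,
                       stderr=subprocess.STDOUT, check=False)
    return path


def main():
    while True:
        load1, _, _ = os.getloadavg()
        if should_capture(load1):
            print("load %.1f, profile saved to %s" % (load1, capture()))
        time.sleep(POLL_SECONDS)


if __name__ == "__main__":
    main()
```

If the load stays high, this captures again on every poll, so adding a
cooldown after each capture may be worthwhile to avoid piling up reports.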

-j

On Mon, May 21, 2007 at 01:50:27PM -0700, Eric Saxe wrote:
> Jeffrey Collyer wrote:
> >5 identical V440s, Solaris 10, storage on a Netapp via NFS, providing 
> >access to a mailstore (so 80% read, 20% write).
> >
> >During the day, a random machine will start climbing in load from its
> >baseline of 2-3 up to 50-60.  Under heavy loading, I've seen it go up to
> >300.  CPU time is split almost 50/50 between user and kernel, with no idle
> >and nothing in I/O (according to top).
> >
> >I suspect NFS problems, but the Netapp and switch traffic graphs look
> >clean and consistent.  Nothing shows network errors: not nfsstat, not
> >the switch ports, not the Netapp.
> >
> >And as I mentioned, the problem moves: one day it's machine 1, the next
> >day machine 4, and so on, with no real pattern.
> >
> >How would I go about discovering what the kernel is doing when this is
> >happening?  Some of the simple dtrace stuff I've tried has just shown me
> >a lot of lwp_park calls (the main app is heavily multithreaded, so that
> >figures).
> >
> >Does anyone have any key dtrace probes they look at for NFS or DNLC problems?
> >  
> One fairly simple thing to try first would be "lockstat -I", which
> essentially does some simple kernel profiling. In a coarse sense, that
> should give you an idea of where the kernel, at least, is spending the
> bulk of its time. You'll want to kick that off during one of the load
> spikes.
> 
> Thanks,
> -Eric
> _______________________________________________
> perf-discuss mailing list
> perf-discuss@opensolaris.org