Jeffrey Collyer wrote:
5 identical V440s, Solaris 10, storage on a Netapp via NFS, providing access to
a mailstore (so 80% read, 20% write).
During the day, randomly a machine will start to climb its load from the
baseline of 2-3 up to 50-60. Under heavy loading, I've seen it go up to 300.
All the time will be split almost 50/50 between user and kernel, no idle,
nothing in I/O (according to top).
I'm suspecting NFS problems, but the Netapp and switch traffic graphs look
clean and consistent. Nothing shows network errors, not nfsstat, not the
switch ports, not the netapp.
And like I mentioned, the problem moves. One day on machine 1, tomorrow on 4,
etc. No real pattern.
How would I go about trying to discover what the kernel is doing when this is
happening? Some of the simple dtrace stuff I've tried has just shown me a lot
of lwp_parks (the main app is heavily multithreaded, so that figures).
Anyone got any key dtrace probes they look at for NFS or dnlc problems?
One fairly simple thing to try (to start) would be "lockstat -I",
which essentially does some simple kernel profiling.
In a coarse sense, that should give you an idea of where the kernel,
at least, is spending the bulk of its time. You'll want
to kick that off during one of the load spikes...
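
As a rough sketch of what that could look like, something along these lines
samples the kernel for a short window and saves the report. The 30-second
window, the top-20 cutoff, and the output path are just illustrative choices,
not anything from your setup. The extra flags (-k to coalesce PCs within
functions, -W to group by caller, -D to limit the output, -o to write to a
file) are as I recall them from lockstat(1M), so check the man page on your
release. It needs to run as root, from ksh or bash rather than the old /bin/sh.

```shell
# Sketch only: profile the kernel for DURATION seconds during a load spike.
# DURATION, TOP and OUT are illustrative values, not taken from the thread.
DURATION=30
TOP=20
OUT="/var/tmp/lockstat.$$.out"

# -I  profiling interrupt events (where CPUs are spending time)
# -k  coalesce PCs within functions
# -W  distinguish events only by caller
# -D  show only the top N entries
# -o  write the report to a file
PROFILE_CMD="lockstat -kIW -D $TOP -o $OUT sleep $DURATION"

if command -v lockstat >/dev/null 2>&1; then
    # Run as root while the load is elevated.
    $PROFILE_CMD && echo "profile written to $OUT"
else
    # Not a Solaris box: just show what would have been run.
    echo "lockstat not found; would run: $PROFILE_CMD"
fi
```

The functions at the top of the report are where the kernel time is going.
If they turn out to be NFS or dnlc routines, that at least tells you which
subsystem to aim dtrace at next.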
Thanks,
-Eric
_______________________________________________
perf-discuss mailing list
perf-discuss@opensolaris.org