Thanks, Mike.

> Is it unresponsive only when dtrace is running or
> normally?  

It becomes unresponsive after starting DTrace.


> With recent releases of Solaris, I've found systems to be
> quite responsive
> with a load average that is many times higher than
> the number of CPU's
> (as seen by mpstat - 128 for the typical dual
> processor T5140/T5240).

The system is a T5440 with 256G of RAM.

> It seems highly unlikely that the problem is related
> to being short on
> CPU (again, only at about 12% CPU utilization).

vmstat reports more than 95% CPU idle. Core utilization is between 2% and 3%.

> 
> If it is unresponsive or sluggish before you start
> dtrace, I would
> guess that one of the following is the case:

No. It gets sluggish after DTrace is started.

> - The machine is short on RAM and is paging.  Use
> vmstat to diagnose.
> Look at the "b" column (blocked on I/O) and paging
> related columns
> such as sr (scan rate). You would see things as being
> extremely
> sluggish (e.g. when executing a command) because the
> disk reads needed
> to load the commands and related libraries are
> getting queue behind
> the IO requests for paging.
> 

There's plenty of RAM, ~240G.


> - The network is having troubles.  Look for a duplex
> mismatch or
> non-zero values in:
> 
> kstat -p e1000g | nawk '$NF != 0 && $0 ~ /(err|drop|fail)/'
> 
> - There is some other I/O problem.  Does iostat -En
> show hard errors
> on any disk?  Does "iostat -xzn  1" show svc_time +
> wsvc_time over
> 20ms?  How many I/Os are queued and active?
> 
> 
> Your question is performance - but you jumped to the
> conclusion that
> dtrace would tell you the answer.  It may, but there
> are likely other
> tools that will be helpful with a lot less effort and
> less system
> impact.  perf-discuss may be a better list to ask for
> more help.

We were checking application performance when we ran this script to find
the hot spots; we had to Ctrl-C dtrace because of its behavior.

Even now, on an idle server (the same system), here is what I see, although
it is not that unresponsive this time (very little load to start with):

Before DTrace:
Total: 162 processes, 2058 lwps, load averages: 0.79, 2.87, 2.26
Total: 162 processes, 2058 lwps, load averages: 1.03, 2.89, 2.27
After DTrace:
Total: 161 processes, 2057 lwps, load averages: 20.61, 6.88, 3.61
Total: 161 processes, 2057 lwps, load averages: 38.40, 10.76, 4.93
Total: 161 processes, 2057 lwps, load averages: 35.38, 10.59, 4.91
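
To put a number on the jump, here is a rough sketch that pulls the 1-minute load average out of the "Total:" lines above and compares the two sets. The helper names (one_minute_load, mean_one_minute_load) are made up for illustration, and averaging a handful of samples is only a crude summary, not a proper measurement.

```python
import re

# "Total:" summary lines copied verbatim from this message.
BEFORE = [
    "Total: 162 processes, 2058 lwps, load averages: 0.79, 2.87, 2.26",
    "Total: 162 processes, 2058 lwps, load averages: 1.03, 2.89, 2.27",
]
AFTER = [
    "Total: 161 processes, 2057 lwps, load averages: 20.61, 6.88, 3.61",
    "Total: 161 processes, 2057 lwps, load averages: 38.40, 10.76, 4.93",
    "Total: 161 processes, 2057 lwps, load averages: 35.38, 10.59, 4.91",
]

# Captures the 1-, 5- and 15-minute figures that follow "load averages:".
LOAD_RE = re.compile(r"load averages:\s*([\d.]+),\s*([\d.]+),\s*([\d.]+)")


def one_minute_load(line):
    """Return the 1-minute load average from one summary line (hypothetical helper)."""
    match = LOAD_RE.search(line)
    if match is None:
        raise ValueError(f"no load averages found in: {line!r}")
    return float(match.group(1))


def mean_one_minute_load(lines):
    """Average the 1-minute load over a list of summary lines (hypothetical helper)."""
    loads = [one_minute_load(line) for line in lines]
    return sum(loads) / len(loads)


before = mean_one_minute_load(BEFORE)
after = mean_one_minute_load(AFTER)
print(f"mean 1-min load before: {before:.2f}")
print(f"mean 1-min load after:  {after:.2f}")
print(f"increase: {after / before:.1f}x")
```

On these five samples the mean 1-minute load goes from about 0.9 to about 31, on a box that is otherwise idle.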

This time I was able to get some output from the script; under load I have
not seen any script output at all. You can imagine the state of the system
if the initial load had been 10-15.

Please let me know if you need more details.
-- 
This message posted from opensolaris.org
_______________________________________________
dtrace-discuss mailing list
dtrace-discuss@opensolaris.org
