Hi there,

On one of my clusters, I have the following issues.
This particular cluster has two V240s with 2 CPUs each.
1. CPU usage is high, often 100%, with both sys and usr time high (roughly 1:1).
2. Mutex spins are high, reaching 6000+ per CPU (!)
3. Cross-calls are high, reaching 7000 per CPU.
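Per-CPU mutex spin and cross-call rates like these are what Solaris `mpstat` reports in its `smtx` and `xcal` columns. The Python sketch here, assuming the standard mpstat header line (`CPU minf mjf xcal ... smtx ...`) and all-integer columns, parses that output and lists the CPU samples that exceed some limits. The function names and the default limits of 1000 are hypothetical, chosen only for illustration.

```python
def parse_mpstat(text):
    """Turn Solaris mpstat text output into one dict per CPU sample.

    Column names come from the most recent "CPU ..." header line, so the
    header that mpstat repeats every interval is handled naturally.
    Assumes every data column is an integer.
    """
    header = None
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "CPU":
            header = fields
        elif header is not None and len(fields) == len(header):
            rows.append(dict(zip(header, (int(f) for f in fields))))
    return rows


def hot_cpus(rows, smtx_limit=1000, xcal_limit=1000):
    """Return (cpu, smtx, xcal) for samples above the hypothetical limits."""
    return [
        (r["CPU"], r["smtx"], r["xcal"])
        for r in rows
        if r["smtx"] > smtx_limit or r["xcal"] > xcal_limit
    ]
```

On a cluster node this could be fed the text output of, for example, `mpstat 5 3`; keying on the header rather than fixed column positions keeps it working if the column layout differs between releases.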

The cluster is running the following:
1. Veritas Cluster Server 5.0MP1, with 20-odd service groups
2. MeasureWare from HP OVO

Before you say "upgrade the hardware!", there are a couple of problems with that:
1. Mutex contention is so high that adding more CPUs would only make it worse.
2. Cross-calls are high, and additional CPUs would have to share that burden too.

So here are the things I did:
1. Shut down VCS, leaving everything else running:
User CPU usage dropped by about 20-35%.
Kernel CPU usage dropped by about 30%.
Mutex spins dropped from thousands to tens, with brief spikes into the hundreds.

2. Shut down MeasureWare:
The CPUs became 92-99% idle!
Mutex spins dropped further, but they are insignificant at this point.

3. With MeasureWare still down, started VCS again:
Mutex spins no longer climb much.
sys CPU time rose slightly,
but usr CPU time rose considerably, and idle time often drops to 0%.

In short, MeasureWare and VCS together are killing the box.

lockstat shows that /proc filesystem locking (adaptive mutex spins!) is high:
[EMAIL PROTECTED]:/tmp#more lockstat.out

Adaptive mutex spin: 68526 events in 5.048 seconds (13575 events/sec)

Count indv cuml rcnt     spin Lock                   Caller
-------------------------------------------------------------------------------
67056  98%  98% 0.00        3 pidlock                pr_readdir_procdir+0x74
  272   0%  98% 0.00        3 pidlock                pr_lookup_procdir+0xa0
  216   0%  99% 0.00        4 0x6000abd0200          cv_wait_sig+0x13c
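As a quick check on how lopsided this profile is, the short Python sketch here recomputes the event rate and the per-caller shares from the three rows shown. The counts, total and interval are copied from the lockstat output; nothing else is assumed.

```python
# Rows copied from the lockstat output: (count, lock, caller).
ROWS = [
    (67056, "pidlock", "pr_readdir_procdir+0x74"),
    (272, "pidlock", "pr_lookup_procdir+0xa0"),
    (216, "0x6000abd0200", "cv_wait_sig+0x13c"),
]
TOTAL_EVENTS = 68526  # from the "Adaptive mutex spin" header line
INTERVAL_S = 5.048    # sampling interval in seconds, same header line


def summarize(rows, total, interval):
    """Return the event rate and (caller, lock, indv, cuml) per row."""
    rate = total / interval
    cuml = 0
    table = []
    for count, lock, caller in rows:
        cuml += count
        table.append((caller, lock, count / total, cuml / total))
    return rate, table


rate, table = summarize(ROWS, TOTAL_EVENTS, INTERVAL_S)
print(f"{rate:.0f} events/sec")
for caller, lock, indv, cuml in table:
    print(f"{caller:26s} {lock:15s} indv={indv:7.2%} cuml={cuml:7.2%}")
```

pr_readdir_procdir alone accounts for about 97.9% of the spin events, and the two pidlock callers together for about 98.3%, so practically all of the spinning is on pidlock from /proc directory operations.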

This is not much of a surprise for VCS: many of its monitors simply run some
version of "ps" to check whether a process is up and running.
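To make the contrast concrete, here is a minimal Python sketch of a ps-style check, which enumerates the whole /proc directory, next to probing one already-known PID with signal 0. The function names are hypothetical and this is not how VCS agents are actually written; it only assumes a /proc directory with one numeric entry per process, as on Solaris and Linux. Going by its name, the directory enumeration is the path behind the pr_readdir_procdir caller in the lockstat output.

```python
import os


def pids_via_proc_scan(proc="/proc"):
    """ps-style: enumerate every process by reading the /proc directory."""
    return [int(name) for name in os.listdir(proc) if name.isdigit()]


def is_running(pid):
    """Check one known PID without enumerating /proc.

    Signal 0 is never delivered; kill() only checks that the process
    exists and that we may signal it.
    """
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True
```

The second form replaces a walk over every process with a single lookup. Whether that lookup is also cheaper in lock terms on this kernel is an assumption that would need confirming with lockstat.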

I could simply go to Veritas and tell them their product is a piece of sh*t, but I
wonder:
1. Is Sun Cluster any better on this front?
2. Since most of this monitoring is just checking whether processes exist, would it be
possible to provide some kind of non-locking /proc operations for this type of application to use?
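Short of a kernel change, one user-space mitigation a monitor could apply is to do the expensive full scan only once and then re-check the cached PID on later polls. The Python sketch here illustrates that idea; the `ProcessWatcher` class and its `find_pid` callback are hypothetical and do not correspond to any VCS or MeasureWare interface.

```python
import os


class ProcessWatcher:
    """Hypothetical monitor that caches the PID found by a full scan."""

    def __init__(self, find_pid):
        # find_pid: expensive ps-style scan returning a PID or None.
        self._find_pid = find_pid
        self._pid = None
        self.scans = 0  # how many full scans have been needed so far

    @staticmethod
    def _alive(pid):
        try:
            os.kill(pid, 0)  # existence check only, no signal is sent
        except ProcessLookupError:
            return False
        except PermissionError:
            return True
        return True

    def check(self):
        """Cheap path: probe the cached PID. Fall back to a full scan."""
        if self._pid is not None and self._pid > 0 and self._alive(self._pid):
            return True
        self.scans += 1
        self._pid = self._find_pid()
        return self._pid is not None
```

The trade-off is PID reuse: if the watched process dies and an unrelated process later gets the same PID, the cheap path reports a false positive, so a real monitor would still need an occasional full verification.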
 
 
This message posted from opensolaris.org
_______________________________________________
perf-discuss mailing list
perf-discuss@opensolaris.org
