This is due to various issues within /proc and with other kernel process-management locks. The only advice I can give at this time is to avoid running numerous prstat/top commands at once. A couple will not hurt you, but many prstat/top commands running at the same time can cause severe kernel lock contention on a couple of different fronts (running them in separate zones makes no difference).
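One rough way to follow this advice is to check how many monitors are already running before starting another. The sketch below is a minimal Python illustration, not a Solaris tool: it assumes the text printed by `ps -eo comm` is passed in, and the limit of 2 (`MAX_MONITORS`) is a hypothetical reading of "a couple will not hurt you", not a measured threshold. It is also a check-then-start race, so it only lowers the chance of a pile-up rather than preventing one.

```python
from collections import Counter

# Hypothetical limit: the advice is only that "a couple will not hurt you",
# so 2 is an assumption, not a measured threshold.
MAX_MONITORS = 2
MONITOR_NAMES = ("prstat", "top")


def count_monitors(ps_comm_output: str) -> int:
    """Count prstat/top processes in the text printed by `ps -eo comm`.

    The first line is assumed to be the COMMAND header. Entries may be
    full paths (e.g. /usr/bin/prstat), so only the basename is compared.
    """
    lines = ps_comm_output.strip().splitlines()[1:]  # drop the header
    names = Counter(line.strip().rsplit("/", 1)[-1] for line in lines)
    return sum(names[name] for name in MONITOR_NAMES)


def ok_to_start(ps_comm_output: str, limit: int = MAX_MONITORS) -> bool:
    """True if starting one more monitor keeps the count within `limit`."""
    return count_monitors(ps_comm_output) < limit


sample = "COMMAND\nsched\n/usr/bin/prstat\nprstat\noracle\ntop\n"
print(count_monitors(sample), ok_to_start(sample))  # prints 3 False
```

On a live system the input could come from `subprocess.run(["ps", "-eo", "comm"], capture_output=True, text=True).stdout`.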

This problem can also occur if there is significant process-management activity (process creation, deletion, etc.)
going on, as it requires the same lock as prstat/top.

I am looking at how to fix this, but it is a tangled rat's nest that has evolved over the past 15 to 20 years, and it is taking some time to understand all the issues and come up with a working prototype.

Hope this helps.

Dave Valin

 On 06/17/09 11:03, Mikael Kjerrman wrote:
Hi,

By accident I observed that various CPUs on a rather heavily loaded Oracle
server sometimes consumed 100% sys:

CPU minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl
  0  898   3 4207   624   88 2396  509  754  393    0  8190   64  14   0  23
  1  712   0 2148   360   21 1123  325  406  126    0 10699   70  16   0  13
  2 1341   4 2854   584    6 2389  556  778  406    0  8493   58  17   0  25
  3 1724   1 6668   355    6 1549  325  483  146    0  8317   65  16   0  19
  4 1431   9 4498   517    7 2507  480  822  443    0  8969   57  16   0  27
  5  898   2 4243   591    7 3172  544  867  313    0  8047   52  15   0  33
  6 1458  14 5020   613    6 2752  571  874  382    0 13157   54  18   0  28
  7    0   0 2385  4387 4270    0    0    0   87    0     0    0 100   0   0
 16  857   2 2496  1558  989 1658  565  610  218    0  6099   68  20   0  12
 17  796   2 4002  1387  962 1876  410  603  264    2  7909   50  22   0  28
 18  934   2 3066  6150  200 2501  611  815  312    0  7442   57  13   0  29
 19  829   1 2232 19714    5 1891  429  615  171    0  8566   58  13   0  29
 20  801  16 2783   554    5 2258  512  771  320    0 10670   58  15   0  27
 21 1012   6 3282   631   22 2532  580  794  237    0  8313   54  15   0  31
 22  709   7 2980   455    5 2342  423  719  331    0 11104   52  16   0  32
 23 1190   0 3396   503    6 2872  464  838  387    0  6682   54  16   0  31

and apparently this was caused by prstat:

   PID USERNAME USR SYS TRP TFL DFL LCK SLP LAT VCX ICX SCL SIG PROCESS/LWPID
  2889 root     0.0 100 0.0 0.0 0.0 0.0 0.0 0.0   0   0   0   0 prstat/1
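To spot this pattern quickly in longer mpstat captures, a small parser can flag CPUs whose `sys` column is pinned. This is a hypothetical helper (`saturated_sys_cpus` is not part of any Solaris tool); it assumes the column layout of the mpstat header quoted above and looks columns up by name rather than by position. The 90% default threshold is an arbitrary assumption.

```python
from __future__ import annotations


def saturated_sys_cpus(mpstat_output: str, threshold: int = 90) -> list[int]:
    """Return the ids of CPUs whose mpstat `sys` value is >= threshold percent.

    Assumes Solaris-style mpstat text: a header line whose first field is
    "CPU", then one whitespace-separated row per CPU. Columns are located
    by name, so the helper does not depend on exact spacing.
    """
    rows = [line.split() for line in mpstat_output.splitlines() if line.strip()]
    header = rows[0]
    cpu_col = header.index("CPU")
    sys_col = header.index("sys")
    hot = []
    for row in rows[1:]:
        # Interval runs of mpstat repeat the header; skip those lines
        # and any truncated row that lacks a sys value.
        if row[0] == "CPU" or len(row) <= sys_col:
            continue
        if int(row[sys_col]) >= threshold:
            hot.append(int(row[cpu_col]))
    return hot


SAMPLE = """\
CPU minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl
  6 1458  14 5020   613    6 2752  571  874  382    0 13157   54  18   0  28
  7    0   0 2385  4387 4270    0    0    0   87    0     0    0 100   0   0
 16  857   2 2496  1558  989 1658  565  610  218    0  6099   68  20   0  12
"""

print(saturated_sys_cpus(SAMPLE))  # prints [7]
```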

My question is whether this is something to worry about, or whether it is working as expected.
I would like to know because, when there is a problem, it is very common for people
to start multiple prstat/top etc. instances to try to see what's going on, and that
behaviour worries me more than a single instance of prstat running.


//Mike

_______________________________________________
perf-discuss mailing list
perf-discuss@opensolaris.org
