This is due to various issues within /proc and other kernel process
management locks. The only thing I can say at this time is to avoid
running numerous prstat/top commands all at once. A couple will not
hurt you, but many prstat/top commands running at the same time can
result in severe kernel lock contention on a couple of different fronts
(it makes no difference even with multiple zones).
This problem can also occur if there is significant process management
activity (create, delete, etc.) going on, as it requires the same lock
as prstat/top.
I am looking at how to fix this, but it is a tangled rat's nest that has
evolved over the past 15 to 20 years, and it is taking some time to
understand all the issues and come up with a working prototype.
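Until there is a fix, one way to act on the "only a couple at a time"
advice is to wrap prstat so that a second copy refuses to start. What
follows is only a minimal sketch, not anything shipped with the OS: the
lock path and the helper name run_exclusive are made up for
illustration, and a crashed wrapper would leave a stale lock file that
has to be removed by hand.

```python
import os
import subprocess

# Hypothetical lock path (an assumption); any location writable by the
# people who run the wrapper would do.
LOCK_PATH = "/tmp/prstat.lock"


def run_exclusive(cmd, lock_path=LOCK_PATH):
    """Run cmd only if no other guarded instance holds the lock file.

    Returns the command's exit status, or None if the lock was already
    held and the command was therefore not started.
    """
    try:
        # O_EXCL makes the create atomic: it fails if the file exists.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
    except FileExistsError:
        return None
    try:
        # Record our pid so a stale lock can be traced to its owner.
        os.write(fd, str(os.getpid()).encode())
        return subprocess.call(cmd)
    finally:
        os.close(fd)
        os.unlink(lock_path)
```

Calling run_exclusive(["prstat", "-mL", "5", "1"]) takes one sample and
returns prstat's exit status; while that call is still running, a second
call returns None without starting another prstat.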
Hope this helps
Dave Valin
On 06/17/09 11:03, Mikael Kjerrman wrote:
Hi,
By accident I observed that various CPUs on a rather heavily loaded
Oracle server sometimes consumed 100% sys:
CPU minf mjf xcal  intr ithr  csw icsw migr smtx srw syscl usr sys wt idl
  0  898   3 4207   624   88 2396  509  754  393   0  8190  64  14  0  23
  1  712   0 2148   360   21 1123  325  406  126   0 10699  70  16  0  13
  2 1341   4 2854   584    6 2389  556  778  406   0  8493  58  17  0  25
  3 1724   1 6668   355    6 1549  325  483  146   0  8317  65  16  0  19
  4 1431   9 4498   517    7 2507  480  822  443   0  8969  57  16  0  27
  5  898   2 4243   591    7 3172  544  867  313   0  8047  52  15  0  33
  6 1458  14 5020   613    6 2752  571  874  382   0 13157  54  18  0  28
  7    0   0 2385  4387 4270    0    0    0   87   0     0   0 100  0   0
 16  857   2 2496  1558  989 1658  565  610  218   0  6099  68  20  0  12
 17  796   2 4002  1387  962 1876  410  603  264   2  7909  50  22  0  28
 18  934   2 3066  6150  200 2501  611  815  312   0  7442  57  13  0  29
 19  829   1 2232 19714    5 1891  429  615  171   0  8566  58  13  0  29
 20  801  16 2783   554    5 2258  512  771  320   0 10670  58  15  0  27
 21 1012   6 3282   631   22 2532  580  794  237   0  8313  54  15  0  31
 22  709   7 2980   455    5 2342  423  719  331   0 11104  52  16  0  32
 23 1190   0 3396   503    6 2872  464  838  387   0  6682  54  16  0  31
and apparently this was caused by prstat.
   PID USERNAME USR SYS TRP TFL DFL LCK SLP LAT VCX ICX SCL SIG PROCESS/LWPID
  2889 root     0.0 100 0.0 0.0 0.0 0.0 0.0 0.0   0   0   0   0 prstat/1
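Spotting such a CPU by eye in a long mpstat listing is easy to get
wrong, so here is a rough sketch of how the check could be scripted. It
assumes the standard mpstat column header shown above; the function name
saturated_cpus and the 90% default threshold are arbitrary choices for
illustration.

```python
def saturated_cpus(mpstat_text, sys_threshold=90):
    """Return the ids of CPUs whose 'sys' column is >= sys_threshold."""
    header = None
    hits = []
    for line in mpstat_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "CPU":
            # Column names: CPU minf mjf ... usr sys wt idl
            header = fields
            continue
        if header is None or len(fields) != len(header):
            # Skip anything that is not a complete data row.
            continue
        row = dict(zip(header, fields))
        if int(row["sys"]) >= sys_threshold:
            hits.append(int(row["CPU"]))
    return hits
```

Of the sixteen CPUs in the mpstat sample, only CPU 7 reaches the default
threshold. If the input covers several intervals, a CPU is listed once
for every interval in which it was saturated.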
My question is whether this is something to worry about or whether it
is expected behaviour. I would like to know because when there is a
problem it is very common for people to start multiple prstat/top etc.
to try to see what's going on, and that behaviour worries me more than a
single instance of prstat running.
//Mike
_______________________________________________
perf-discuss mailing list
perf-discuss@opensolaris.org