Re: High load average for no obvious reason.

Oren Held Sun, 09 Jul 2006 12:53:17 -0700

Try to look for processes which are in zombie (defunct) state.

If I'm not mistaken, for some reason they tend to be counted when the kernel calculates the load average.
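
One way to check this is to count processes by state. As far as I know, the Linux load average counts tasks that are runnable (R) or in uninterruptible sleep (D), so D-state tasks are worth counting alongside zombies (Z). The following is only a minimal sketch, assuming a Linux /proc filesystem; the function names are made up for illustration, and it looks at processes only, not at individual threads under /proc/<pid>/task.

```python
import glob
from collections import Counter


def state_from_stat(stat_line):
    """Return the one-letter state from the contents of /proc/<pid>/stat."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')', so split after the LAST closing parenthesis.
    return stat_line.rsplit(")", 1)[1].split()[0]


def count_states(stat_lines):
    """Count how many processes are in each state (R, S, D, Z, ...)."""
    return Counter(state_from_stat(line) for line in stat_lines)


def read_all_stat_lines():
    """Read /proc/<pid>/stat for every process that is currently visible."""
    lines = []
    for path in glob.glob("/proc/[0-9]*/stat"):
        try:
            with open(path) as f:
                lines.append(f.read())
        except OSError:
            pass  # the process exited while we were scanning
    return lines


if __name__ == "__main__":
    counts = count_states(read_all_stat_lines())
    print("zombie (Z):         ", counts.get("Z", 0))
    print("uninterruptible (D):", counts.get("D", 0))
    print("running (R):        ", counts.get("R", 0))
```

If the D count stays near the load average while the CPUs are idle, the tasks stuck in that state would be the first thing to look at.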


Michael Green wrote:

I have 18 identical Sun Fire X4100 systems here, all configured identically:
4-way Opteron, 4G RAM, 70G SAS HDD, RHEL AS 4U3, Sun Grid Engine
agents (SGE) v6u7, NIS.
Periodically some of the systems exhibit a high load average while idling
for no obvious reason. Rebooting solves the problem, but after some
time the symptom returns. Typically the load average reaches 3 and
does not go beyond that. How would you approach such a problem?

One such system shows:
[EMAIL PROTECTED] ~]# w
10:00:55 up 31 days, 17:47,  1 user,  load average: 3.00, 3.00, 3.00
USER     TTY      FROM              LOGIN@   IDLE   JCPU   PCPU WHAT
root     pts/1    192.168.1.100    09:31    0.00s  0.02s  0.00s w

Typical top on this system:
top - 10:01:58 up 31 days, 17:49, 1 user, load average: 3.00, 3.00, 3.00
Tasks:  80 total,   1 running,  79 sleeping,   0 stopped,   0 zombie
Cpu(s): 0.1% us, 0.0% sy, 0.0% ni, 99.9% id, 0.0% wa, 0.0% hi, 0.0% si
Mem:   4051196k total,   891428k used,  3159768k free,    67620k buffers
Swap:  8160912k total,     4776k used,  8156136k free,   667488k cached

PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
  1 root      16   0  4752  444  412 S  0.0  0.0   0:00.62 init
  2 root      RT   0     0    0    0 S  0.0  0.0   0:00.25 migration/0
  3 root      34  19     0    0    0 S  0.0  0.0   0:00.14 ksoftirqd/0
  4 root      RT   0     0    0    0 S  0.0  0.0   0:00.19 migration/1
  5 root      34  19     0    0    0 S  0.0  0.0   0:13.35 ksoftirqd/1
  6 root      RT   0     0    0    0 S  0.0  0.0   0:00.19 migration/2

vmstat 2
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b   swpd    free   buff  cache   si   so    bi    bo   in    cs us sy  id wa
 0  0   4776 3159960  67620 667488    0    0     0     6 2007    28  0  1  99  0
 0  0   4776 3159960  67620 667488    0    0     0     0 2007    25  0  0 100  0
 0  0   4776 3159960  67620 667488    0    0     0     0 2005    23  0  0 100  0
 0  0   4776 3159960  67620 667488    0    0     0     0 2018    28  0  1  99  0
 0  0   4776 3159960  67620 667488    0    0     0     0 2023    23  0  0 100  0
 0  0   4776 3159960  67620 667488    0    0     0     0 2008    25  0  0 100  0
 0  0   4776 3159960  67620 667488    0    0     0     0 2009    22  0  0 100  0
 0  0   4776 3159960  67620 667488    0    0     0     0 2006    22  0  1  99  0
 0  0   4776 3159960  67620 667488    0    0     0     0 2008    26  0  0 100  0

What I've noticed here is that the interrupt rate is relatively
high: approximately 2000 per second.
On this particular system the interrupt rate after a reboot is
approximately 1000 per second:
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b   swpd    free   buff  cache   si   so    bi    bo   in    cs us sy  id wa
 0  0      0 3840536  12464 128976    0    0   339    42  276   172  2  2  90  7
 0  0      0 3840536  12464 128976    0    0     0     0 1081   122  0  0 100  0
 2  0      0 3840536  12464 128976    0    0     0     0 1083   112  0  1  99  0
 0  0      0 3840672  12472 129036    0    0     0    16 1064   119  0  0 100  0
 0  0      0 3840672  12472 129036    0    0     0     0 1065   112  0  0 100  0
 0  0      0 3840672  12472 129036    0    0     0     0 1064   116  0  0 100  0
 0  0      0 3840672  12472 129036    0    0     0     0 1066   116  0  0 100  0
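
To compare the interrupt rate across the 18 machines without eyeballing each run, the "in" column of vmstat can be averaged. This is just a rough sketch, assuming the vmstat output is captured as text with one sample per line and whitespace-separated columns; mean_column is a hypothetical helper, not part of any tool. It drops the first sample by default because the first vmstat report is an average since boot rather than a per-interval reading.

```python
def mean_column(vmstat_text, column="in", skip_first=True):
    """Average one numeric column of captured `vmstat <interval>` output."""
    rows = [line.split() for line in vmstat_text.splitlines() if line.strip()]
    # The column-name row is the one that starts with "r" (run queue).
    header = next(r for r in rows if r[0] == "r" and column in r)
    idx = header.index(column)
    # Data rows have the same number of fields as the header, all numeric.
    data = [r for r in rows
            if len(r) == len(header) and all(f.isdigit() for f in r)]
    if skip_first:
        data = data[1:]  # first report covers the time since boot
    if not data:
        raise ValueError("no vmstat samples found")
    return sum(int(r[idx]) for r in data) / len(data)


if __name__ == "__main__":
    # First three samples of the after-reboot run quoted above.
    sample = """\
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b   swpd    free   buff  cache   si   so    bi    bo   in    cs us sy  id wa
 0  0      0 3840536  12464 128976    0    0   339    42  276   172  2  2  90  7
 0  0      0 3840536  12464 128976    0    0     0     0 1081   122  0  0 100  0
 2  0      0 3840536  12464 128976    0    0     0     0 1083   112  0  1  99  0
"""
    print(mean_column(sample))  # prints 1082.0
```

Running the same helper on output collected from an affected and an unaffected node would show whether the doubled interrupt rate tracks the load average of 3.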


