Try looking for processes in the zombie (defunct) state.
If I'm not mistaken, they tend to be counted when the kernel calculates the load average.
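
A minimal sketch of that check, assuming a Linux /proc filesystem and a Python 3 interpreter. The script and its function names (parse_stat, find_by_state) are illustrative and do not come from this thread. Linux also counts tasks in uninterruptible sleep (state D) toward the load average, so the sketch lists those alongside zombies (state Z):

```python
#!/usr/bin/env python3
"""List processes in zombie (Z) or uninterruptible-sleep (D) state via /proc."""
import os


def parse_stat(stat_line):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat.

    The command name is wrapped in parentheses and may itself contain
    spaces or parentheses, so split on the last ')'.
    """
    head, _, tail = stat_line.rpartition(")")
    pid_text, _, comm = head.partition("(")
    return int(pid_text), comm, tail.split()[0]


def find_by_state(states=("D", "Z"), proc_root="/proc"):
    """Collect (pid, comm, state) for processes whose state is in `states`.

    Only the numeric /proc/<pid> entries are inspected, so individual
    threads of a multi-threaded process are not listed separately.
    """
    found = []
    if not os.path.isdir(proc_root):
        return found  # not a Linux-style /proc; nothing to report
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as handle:
                pid, comm, state = parse_stat(handle.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited (or was unreadable) during the scan
        if state in states:
            found.append((pid, comm, state))
    return sorted(found)


if __name__ == "__main__":
    for pid, comm, state in find_by_state():
        print(f"{pid:>7} {state} {comm}")
```

If the load average sits at exactly 3.00 on an idle box, three entries in state D would be the thing to look for in the output.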

Michael Green wrote:
I have 18 identical Sun Fire X4100 systems here all configured identically:
4-way Opteron, 4G RAM, 70G SAS HDD, RHEL AS 4U3, Sun Grid Engine
agents (SGE) v6u7, NIS.
Periodically some of the systems exhibit a high load average while idling
for no obvious reason. Rebooting solves the problem, but after some
time the symptom returns. Typically the load average reaches 3 and
does not go beyond that. How would you approach such a problem?

One such system shows:
[EMAIL PROTECTED] ~]# w
10:00:55 up 31 days, 17:47,  1 user,  load average: 3.00, 3.00, 3.00
USER     TTY      FROM              LOGIN@   IDLE   JCPU   PCPU WHAT
root     pts/1    192.168.1.100    09:31    0.00s  0.02s  0.00s w

Typical top on this system:
top - 10:01:58 up 31 days, 17:49, 1 user, load average: 3.00, 3.00, 3.00
Tasks:  80 total,   1 running,  79 sleeping,   0 stopped,   0 zombie
Cpu(s): 0.1% us, 0.0% sy, 0.0% ni, 99.9% id, 0.0% wa, 0.0% hi, 0.0% si
Mem:   4051196k total,   891428k used,  3159768k free,    67620k buffers
Swap:  8160912k total,     4776k used,  8156136k free,   667488k cached

PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
  1 root      16   0  4752  444  412 S  0.0  0.0   0:00.62 init
  2 root      RT   0     0    0    0 S  0.0  0.0   0:00.25 migration/0
  3 root      34  19     0    0    0 S  0.0  0.0   0:00.14 ksoftirqd/0
  4 root      RT   0     0    0    0 S  0.0  0.0   0:00.19 migration/1
  5 root      34  19     0    0    0 S  0.0  0.0   0:13.35 ksoftirqd/1
  6 root      RT   0     0    0    0 S  0.0  0.0   0:00.19 migration/2

vmstat 2
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b   swpd    free   buff  cache   si   so    bi    bo   in    cs us sy  id wa
 0  0   4776 3159960  67620 667488    0    0     0     6 2007    28  0  1  99  0
 0  0   4776 3159960  67620 667488    0    0     0     0 2007    25  0  0 100  0
 0  0   4776 3159960  67620 667488    0    0     0     0 2005    23  0  0 100  0
 0  0   4776 3159960  67620 667488    0    0     0     0 2018    28  0  1  99  0
 0  0   4776 3159960  67620 667488    0    0     0     0 2023    23  0  0 100  0
 0  0   4776 3159960  67620 667488    0    0     0     0 2008    25  0  0 100  0
 0  0   4776 3159960  67620 667488    0    0     0     0 2009    22  0  0 100  0
 0  0   4776 3159960  67620 667488    0    0     0     0 2006    22  0  1  99  0
 0  0   4776 3159960  67620 667488    0    0     0     0 2008    26  0  0 100  0

What I've noticed here is that the rate of interrupts is relatively
high: approximately 2000 per second.
On this particular system the rate of interrupts after reboot is
approximately 1000 per second:
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b   swpd    free   buff  cache   si   so    bi    bo   in    cs us sy  id wa
 0  0      0 3840536  12464 128976    0    0   339    42  276   172  2  2  90  7
 0  0      0 3840536  12464 128976    0    0     0     0 1081   122  0  0 100  0
 2  0      0 3840536  12464 128976    0    0     0     0 1083   112  0  1  99  0
 0  0      0 3840672  12472 129036    0    0     0    16 1064   119  0  0 100  0
 0  0      0 3840672  12472 129036    0    0     0     0 1065   112  0  0 100  0
 0  0      0 3840672  12472 129036    0    0     0     0 1064   116  0  0 100  0
 0  0      0 3840672  12472 129036    0    0     0     0 1066 116  0  0 100  0
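
One way to see which source accounts for the extra interrupts would be to sample /proc/interrupts twice and compare the per-source counts. The following is only a sketch, assuming Python 3 and the usual Linux /proc/interrupts layout (a header row of CPU names, then one row per interrupt source). The function names parse_interrupts and rates are made up for illustration:

```python
#!/usr/bin/env python3
"""Estimate per-source interrupt rates from two /proc/interrupts samples."""
import os
import time


def parse_interrupts(text):
    """Map each interrupt label to (total count across CPUs, description)."""
    lines = text.splitlines()
    if not lines:
        return {}
    ncpu = len(lines[0].split())  # header row: CPU0 CPU1 ...
    totals = {}
    for line in lines[1:]:
        label, sep, rest = line.partition(":")
        if not sep:
            continue
        fields = rest.split()
        counts = []
        for field in fields[:ncpu]:
            if not field.isdigit():
                break  # rows such as ERR may carry fewer counters
            counts.append(int(field))
        description = " ".join(fields[len(counts):])
        totals[label.strip()] = (sum(counts), description)
    return totals


def rates(before, after, seconds):
    """Return (interrupts per second, label, description), busiest first."""
    result = []
    for label, (count, description) in after.items():
        previous = before.get(label, (0, ""))[0]
        result.append(((count - previous) / seconds, label, description))
    return sorted(result, reverse=True)


def read_interrupts(path="/proc/interrupts"):
    with open(path) as handle:
        return handle.read()


if __name__ == "__main__" and os.path.exists("/proc/interrupts"):
    interval = 2.0  # seconds between samples, matching "vmstat 2"
    first = parse_interrupts(read_interrupts())
    time.sleep(interval)
    second = parse_interrupts(read_interrupts())
    for rate, label, description in rates(first, second, interval)[:5]:
        print(f"{rate:10.1f}/s  {label:>4}  {description}")
```

Running this on an affected node and on a freshly rebooted one, then comparing the top few lines, might show which source went from about 1000 to about 2000 per second.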



