Re: High load average for no obvious reason.

Michael Green Mon, 30 Oct 2006 01:32:33 -0800

Reviving a long-forgotten thread...
It turns out that the ohci_hcd USB driver is generating an enormous
number of interrupts, and that is what is driving the load average up.


# grep ohci /proc/interrupts
169:  294912182  612411557  812332153   58723016   IO-APIC-level  ohci_hcd, ohci_hcd
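
The counters above are cumulative since boot, so a rate is more telling
than the raw totals. Below is a minimal Python sketch for turning two
snapshots of /proc/interrupts into per-IRQ interrupts per second. The
function names (parse_interrupts, interrupt_rates, sample_rates) are my
own and not part of any existing tool, and the parser assumes the usual
layout: an IRQ label, a colon, one count per CPU, then a description.

```python
import time


def parse_interrupts(text):
    """Return {irq_label: (total_count_across_cpus, description)}."""
    totals = {}
    for line in text.splitlines():
        label, sep, rest = line.partition(":")
        if not sep:
            continue  # the CPU0 CPU1 ... header line has no colon
        fields = rest.split()
        counts = []
        for field in fields:
            if not field.isdigit():
                break  # first non-numeric field starts the description
            counts.append(int(field))
        description = " ".join(fields[len(counts):])
        totals[label.strip()] = (sum(counts), description)
    return totals


def interrupt_rates(before, after, seconds):
    """Interrupts per second for every IRQ present in both snapshots."""
    return {
        irq: (after[irq][0] - before[irq][0]) / seconds
        for irq in after
        if irq in before
    }


def sample_rates(path="/proc/interrupts", interval=5.0):
    """Read the file twice, `interval` seconds apart, and return rates."""
    with open(path) as f:
        before = parse_interrupts(f.read())
    time.sleep(interval)
    with open(path) as f:
        after = parse_interrupts(f.read())
    return interrupt_rates(before, after, interval)
```

Calling sample_rates() on an affected node and sorting the result by
value should put the offending IRQ (169 in the output above) at the top,
assuming the storm is still in progress.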

The obvious temporary fix was 'rmmod ohci_hcd' (we don't need USB anyway).
I'm currently working with [EMAIL PROTECTED] to understand whether this is
a hardware issue or a software (RHEL AS4) issue.

If you happen to know anything about this problem, please drop me or the
list a line!



On 7/10/06, Henry Ficher <[EMAIL PROTECTED]> wrote:

There could be disk and/or RAID problems affecting disk I/O, which could
lead to higher than normal load averages.


Henry


Michael Green wrote:

> I have 18 identical Sun Fire X4100 systems here all configured
> identically:
> 4-way Opteron, 4G RAM, 70G SAS HDD, RHEL AS 4U3, Sun Grid Engine
> agents (SGE) v6u7, NIS.
> Periodically some of the systems exhibit a high load average while idle,
> for no obvious reason. Rebooting solves the problem, but after some
> time the symptom returns. Typically the load average reaches 3 and
> does not go beyond that. How would you approach such a problem?
>
> One such system shows:
> [EMAIL PROTECTED] ~]# w
> 10:00:55 up 31 days, 17:47,  1 user,  load average: 3.00, 3.00, 3.00
> USER     TTY      FROM              LOGIN@   IDLE   JCPU   PCPU WHAT
> root     pts/1    192.168.1.100    09:31    0.00s  0.02s  0.00s w
>
> Typical top on this system:
> top - 10:01:58 up 31 days, 17:49,  1 user,  load average: 3.00, 3.00, 3.00
> Tasks:  80 total,   1 running,  79 sleeping,   0 stopped,   0 zombie
> Cpu(s):  0.1% us,  0.0% sy,  0.0% ni, 99.9% id,  0.0% wa,  0.0% hi,  0.0% si
> Mem:   4051196k total,   891428k used,  3159768k free,    67620k buffers
> Swap:  8160912k total,     4776k used,  8156136k free,   667488k cached
>
> PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>   1 root      16   0  4752  444  412 S  0.0  0.0   0:00.62 init
>   2 root      RT   0     0    0    0 S  0.0  0.0   0:00.25 migration/0
>   3 root      34  19     0    0    0 S  0.0  0.0   0:00.14 ksoftirqd/0
>   4 root      RT   0     0    0    0 S  0.0  0.0   0:00.19 migration/1
>   5 root      34  19     0    0    0 S  0.0  0.0   0:13.35 ksoftirqd/1
>   6 root      RT   0     0    0    0 S  0.0  0.0   0:00.19 migration/2
>
> vmstat 2
> procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
> r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy  id wa
> 0  0   4776 3159960  67620 667488    0    0     0     6 2007    28  0  1  99  0
> 0  0   4776 3159960  67620 667488    0    0     0     0 2007    25  0  0 100  0
> 0  0   4776 3159960  67620 667488    0    0     0     0 2005    23  0  0 100  0
> 0  0   4776 3159960  67620 667488    0    0     0     0 2018    28  0  1  99  0
> 0  0   4776 3159960  67620 667488    0    0     0     0 2023    23  0  0 100  0
> 0  0   4776 3159960  67620 667488    0    0     0     0 2008    25  0  0 100  0
> 0  0   4776 3159960  67620 667488    0    0     0     0 2009    22  0  0 100  0
> 0  0   4776 3159960  67620 667488    0    0     0     0 2006    22  0  1  99  0
> 0  0   4776 3159960  67620 667488    0    0     0     0 2008    26  0  0 100  0
>
> What I've noticed here is that the interrupt rate is relatively
> high: approximately 2000 per second.
> On this particular system the interrupt rate after a reboot is
> approximately 1000 per second:
> procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
> r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy  id wa
> 0  0      0 3840536  12464 128976    0    0   339    42  276   172  2  2  90  7
> 0  0      0 3840536  12464 128976    0    0     0     0 1081   122  0  0 100  0
> 2  0      0 3840536  12464 128976    0    0     0     0 1083   112  0  1  99  0
> 0  0      0 3840672  12472 129036    0    0     0    16 1064   119  0  0 100  0
> 0  0      0 3840672  12472 129036    0    0     0     0 1065   112  0  0 100  0
> 0  0      0 3840672  12472 129036    0    0     0     0 1064   116  0  0 100  0
> 0  0      0 3840672  12472 129036    0    0     0     0 1066   116  0  0 100  0
>
>
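
The before/after comparison in the quoted vmstat output can be scripted
when there are many nodes to check. The following is only a sketch, with
a hypothetical helper name (mean_interrupt_rate), that averages the "in"
column of captured vmstat text. It assumes each sample sits on a single
line, and it skips the first sample by default because vmstat's first
report is an average since boot rather than a current reading.

```python
def mean_interrupt_rate(vmstat_text, skip_first=True):
    """Average the 'in' (interrupts per second) column of vmstat output."""
    header = None
    samples = []
    for line in vmstat_text.splitlines():
        fields = line.split()
        if "in" in fields and "cs" in fields:
            header = fields  # column-name line: r b swpd ... in cs ...
            continue
        if header is None or len(fields) != len(header):
            continue  # banner line, blank line, or a wrapped fragment
        if not all(field.isdigit() for field in fields):
            continue
        samples.append(int(fields[header.index("in")]))
    if skip_first and len(samples) > 1:
        samples = samples[1:]  # drop the since-boot average
    if not samples:
        raise ValueError("no vmstat samples found")
    return sum(samples) / len(samples)
```

Fed the output of something like 'vmstat 2 10' collected from each node,
this gives one number per host that can be compared against a freshly
rebooted machine.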



--
Warm regards,
Michael Green

