Long-forgotten thread... It turns out that the ohci_hcd USB driver is generating an enormous number of interrupts, and that is what drives the load average up.
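For anyone who wants to check for the same thing: the counters in /proc/interrupts are cumulative since boot, so a single reading does not show the current rate. A rough way to get a rate is to take two snapshots and compare them. The sketch below is only an illustration and not something from this thread; the function names (parse_interrupts, interrupt_rates, top_irqs) and the 5-second default interval are my own choices.

```python
import time


def parse_interrupts(text):
    """Parse /proc/interrupts text into {irq: (total_count, description)}."""
    lines = text.strip().splitlines()
    ncpus = len(lines[0].split())  # header row: CPU0 CPU1 ...
    snapshot = {}
    for line in lines[1:]:
        label, _, rest = line.partition(":")
        fields = rest.split()
        per_cpu = []
        for field in fields[:ncpus]:
            if not field.isdigit():
                break  # description starts early on rows with fewer counters
            per_cpu.append(int(field))
        description = " ".join(fields[len(per_cpu):])
        # Sum the per-CPU counters so each IRQ has one total.
        snapshot[label.strip()] = (sum(per_cpu), description)
    return snapshot


def interrupt_rates(before, after, seconds):
    """Interrupts per second for every IRQ present in both snapshots."""
    return {
        irq: (after[irq][0] - before[irq][0]) / seconds
        for irq in after
        if irq in before
    }


def top_irqs(interval=5.0, path="/proc/interrupts", limit=5):
    """Sample the file twice and return the busiest IRQs as
    (irq, rate_per_second, description) tuples."""
    with open(path) as f:
        before = parse_interrupts(f.read())
    time.sleep(interval)
    with open(path) as f:
        after = parse_interrupts(f.read())
    rates = interrupt_rates(before, after, interval)
    busiest = sorted(rates, key=rates.get, reverse=True)[:limit]
    return [(irq, rates[irq], after[irq][1]) for irq in busiest]
```

Calling top_irqs() on an affected box should put the offending IRQ line at the top of the list; on a healthy idle box the timer interrupt would normally dominate.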

# grep ohci /proc/interrupts
169:  294912182  612411557  812332153   58723016   IO-APIC-level  ohci_hcd, ohci_hcd

The temporary fix, obviously, was 'rmmod ohci_hcd' (we don't need USB anyway).
I'm currently working with [EMAIL PROTECTED] trying to understand whether this is a hardware or a software (RHEL AS4) issue.
If you happen to know anything about this problem, please drop me or the list a line!

On 7/10/06, Henry Ficher <[EMAIL PROTECTED]> wrote:
There could be disk and/or RAID problems affecting disk I/O, which could lead to higher than normal load averages.

Henry

Michael Green wrote:
> I have 18 identical Sun Fire X4100 systems here, all configured
> identically:
> 4-way Opteron, 4G RAM, 70G SAS HDD, RHEL AS 4U3, Sun Grid Engine
> agents (SGE) v6u7, NIS.
> Periodically some of the systems exhibit a high load average while idling,
> for no obvious reason. Rebooting solves the problem, but after some
> time the symptom returns. Typically the load average reaches 3 and
> does not go beyond that. How would you approach such a problem?
>
> One such system shows:
> [EMAIL PROTECTED] ~]# w
>  10:00:55 up 31 days, 17:47,  1 user,  load average: 3.00, 3.00, 3.00
> USER     TTY      FROM              LOGIN@   IDLE   JCPU   PCPU WHAT
> root     pts/1    192.168.1.100    09:31    0.00s  0.02s  0.00s w
>
> Typical top on this system:
> top - 10:01:58 up 31 days, 17:49,  1 user,  load average: 3.00, 3.00, 3.00
> Tasks:  80 total,   1 running,  79 sleeping,   0 stopped,   0 zombie
> Cpu(s):  0.1% us,  0.0% sy,  0.0% ni, 99.9% id,  0.0% wa,  0.0% hi,  0.0% si
> Mem:   4051196k total,   891428k used,  3159768k free,    67620k buffers
> Swap:  8160912k total,     4776k used,  8156136k free,   667488k cached
>
>   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>     1 root      16   0  4752  444  412 S  0.0  0.0   0:00.62 init
>     2 root      RT   0     0    0    0 S  0.0  0.0   0:00.25 migration/0
>     3 root      34  19     0    0    0 S  0.0  0.0   0:00.14 ksoftirqd/0
>     4 root      RT   0     0    0    0 S  0.0  0.0   0:00.19 migration/1
>     5 root      34  19     0    0    0 S  0.0  0.0   0:13.35 ksoftirqd/1
>     6 root      RT   0     0    0    0 S  0.0  0.0   0:00.19 migration/2
>
> vmstat 2
> procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
>  r  b   swpd    free   buff  cache   si   so    bi    bo   in    cs us sy  id wa
>  0  0   4776 3159960  67620 667488    0    0     0     6 2007    28  0  1  99  0
>  0  0   4776 3159960  67620 667488    0    0     0     0 2007    25  0  0 100  0
>  0  0   4776 3159960  67620 667488    0    0     0     0 2005    23  0  0 100  0
>  0  0   4776 3159960  67620 667488    0    0     0     0 2018    28  0  1  99  0
>  0  0   4776 3159960  67620 667488    0    0     0     0 2023    23  0  0 100  0
>  0  0   4776 3159960  67620 667488    0    0     0     0 2008    25  0  0 100  0
>  0  0   4776 3159960  67620 667488    0    0     0     0 2009    22  0  0 100  0
>  0  0   4776 3159960  67620 667488    0    0     0     0 2006    22  0  1  99  0
>  0  0   4776 3159960  67620 667488    0    0     0     0 2008    26  0  0 100  0
>
> What I've noticed here is that the rate of interrupts is relatively
> high: approximately 2000.
> On this particular system the rate of interrupts after reboot is
> approximately 1000:
> procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
>  r  b   swpd    free   buff  cache   si   so    bi    bo   in    cs us sy  id wa
>  0  0      0 3840536  12464 128976    0    0   339    42  276   172  2  2  90  7
>  0  0      0 3840536  12464 128976    0    0     0     0 1081   122  0  0 100  0
>  2  0      0 3840536  12464 128976    0    0     0     0 1083   112  0  1  99  0
>  0  0      0 3840672  12472 129036    0    0     0    16 1064   119  0  0 100  0
>  0  0      0 3840672  12472 129036    0    0     0     0 1065   112  0  0 100  0
>  0  0      0 3840672  12472 129036    0    0     0     0 1064   116  0  0 100  0
>  0  0      0 3840672  12472 129036    0    0     0     0 1066   116  0  0 100  0
>
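
To put numbers on the "approximately 2000" versus "approximately 1000" comparison in the quoted vmstat samples, the "in" column can be averaged. This is a minimal sketch, not a tool used in the thread; the function name mean_interrupt_rate is made up, and it assumes the pasted text starts with vmstat's first report, which shows averages since boot and is therefore skipped by default.

```python
def mean_interrupt_rate(vmstat_output, skip_first=True):
    """Average the 'in' (interrupts per second) column of `vmstat N` output."""
    rates = []
    in_col = None
    for line in vmstat_output.splitlines():
        fields = line.split()
        if "in" in fields and "cs" in fields:
            # Column-name row: remember where the 'in' column sits.
            in_col = fields.index("in")
        elif in_col is not None and fields and all(f.isdigit() for f in fields):
            rates.append(int(fields[in_col]))
    if skip_first and len(rates) > 1:
        # The first report is an average since boot, not a live sample.
        rates = rates[1:]
    if not rates:
        raise ValueError("no vmstat data rows found")
    return sum(rates) / len(rates)
```

Fed the two samples quoted above (with the "> " prefixes stripped), this works out to about 2010 interrupts per second on the affected system and about 1070 after the reboot.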
--
Warm regards,
Michael Green