A Conundrum: weird load average

Oded Arbel Sun, 06 Nov 2005 03:19:42 -0800

Hi list.

I have a problem with a P4 (hyper-threaded) powered server. It 
constantly has a load average of 2.something, yet when I look with top I 
don't see any process actually using that much CPU.


Here's a snippet of /proc/cpuinfo:
vendor_id       : GenuineIntel
cpu family      : 15
model           : 3
model name      : Intel(R) Pentium(R) 4 CPU 3.00GHz
stepping        : 4
cpu MHz         : 2993.807
cache size      : 1024 KB
(I have two of those listed, of course)

And here's a sample of top's output (sorted by CPU usage):
top - 10:36:47 up 84 days,  1:27,  3 users,  load average: 2.01, 2.02, 1.97
Tasks:  73 total,   1 running,  65 sleeping,   0 stopped,   7 zombie
Cpu(s): 100.0% us,  0.0% sy,  0.0% ni,  0.0% id,  0.0% wa,  0.0% hi,  0.0% si
Mem:   1032328k total,  1010516k used,    21812k free,   146464k buffers
Swap:  1036152k total,    14944k used,  1021208k free,   673464k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
  445 root      16   0     0    0    0 S  0.3  0.0   1:17.11 kjournald
  964 root      16   0  2172 1044 1964 R  0.3  0.1   0:00.19 top
    1 root      16   0  1580  508 1424 S  0.0  0.0   0:02.47 init
    2 root      RT   0     0    0    0 S  0.0  0.0   0:00.00 migration/0
    3 root      34  19     0    0    0 S  0.0  0.0   0:00.10 ksoftirqd/0
    4 root      RT   0     0    0    0 S  0.0  0.0   0:00.00 migration/1
    5 root      34  19     0    0    0 S  0.0  0.0   0:00.05 ksoftirqd/1
    6 root       5 -10     0    0    0 S  0.0  0.0   0:00.00 events/0
    7 root       5 -10     0    0    0 S  0.0  0.0   0:00.00 events/1
   [lots of other kernel processes and then some user processes]

As you can see, the userspace load is 100%, which I assume means 100% of 
all processor resources in the system (which accounts for the 2.0 load 
average, as we have virtually 2 of those), but no process listed by top 
actually takes any significant amount of CPU time. Sorting by time, I 
get kjournald with almost 6 hours of CPU time (over 84 days - had a 
power failure about 3 months back), then sshd (the server is headless), 
and everyone else has less than 10 minutes.
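
For reference, on Linux the load average counts tasks that are runnable 
(state R) or in uninterruptible sleep (state D), so it can help to list 
exactly those tasks. Below is a minimal sketch of that check, assuming a 
Linux /proc filesystem and a Python interpreter on the box; the function 
names are made up for illustration. It only takes a single snapshot, so 
short-lived processes can still slip through.

```python
import glob


def parse_stat(text):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so locate the first '(' and the last ')' instead of splitting on spaces.
    lparen = text.index("(")
    rparen = text.rindex(")")
    pid = int(text[:lparen])
    comm = text[lparen + 1:rparen]
    state = text[rparen + 1:].split()[0]
    return pid, comm, state


def load_contributors(stat_texts):
    """Keep only tasks in state R (runnable) or D (uninterruptible sleep)."""
    tasks = [parse_stat(t) for t in stat_texts]
    return [t for t in tasks if t[2] in ("R", "D")]


def read_all_stats():
    """Read every /proc/<pid>/stat that is still there when we open it."""
    texts = []
    for path in glob.glob("/proc/[0-9]*/stat"):
        try:
            with open(path) as f:
                texts.append(f.read())
        except OSError:
            continue  # the process exited between listing and reading
    return texts


if __name__ == "__main__":
    for pid, comm, state in load_contributors(read_all_stats()):
        print(pid, state, comm)
```

Running it a few times in a row and comparing the output should give a 
rough idea of whether the same tasks keep showing up in R or D state.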

The server is mostly used to run the Nagios monitor and some Java daemons. 
Tomcat is running and takes about 1.2GB of virtual memory, which is about 
60% of all memory, but it sees absolutely no usage and uses less than 5% 
of real memory. Two other Java services and MySQL together grab another 
600MB of virtual, and everything else is mostly scripts that use 
negligible amounts of VIRT, RES and CPU.

Another weird thing is that a quick calculation would have the VIRT 
usage of the system very close to the total memory available (1GB 
physical + 1GB swap), yet the top output above shows more than half of 
memory to be available(!).
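
The quick calculation can be redone more carefully. VIRT is reserved 
address space, not memory actually in use, so its sum across processes 
need not fit within RAM + swap; RES is the part currently resident. Below 
is a minimal sketch that totals both, assuming a Linux /proc filesystem 
where /proc/<pid>/status carries VmSize and VmRSS lines; the function 
names are made up for illustration.

```python
import glob
import re

# Matches lines such as "VmSize:   1200000 kB" in /proc/<pid>/status.
_FIELD = re.compile(r"^(VmSize|VmRSS):\s+(\d+)\s+kB", re.MULTILINE)


def vm_kb(status_text):
    """Return {'VmSize': kB, 'VmRSS': kB} parsed from a status file's text."""
    sizes = {"VmSize": 0, "VmRSS": 0}  # kernel threads have no Vm* lines
    for name, value in _FIELD.findall(status_text):
        sizes[name] = int(value)
    return sizes


def totals(status_texts):
    """Sum VIRT and RES over all processes, in kB.

    The RES total counts shared pages once per process, so it is an
    upper bound rather than an exact figure.
    """
    parsed = [vm_kb(t) for t in status_texts]
    virt = sum(p["VmSize"] for p in parsed)
    res = sum(p["VmRSS"] for p in parsed)
    return virt, res


def read_all_status():
    """Read every /proc/<pid>/status that is still there when we open it."""
    texts = []
    for path in glob.glob("/proc/[0-9]*/status"):
        try:
            with open(path) as f:
                texts.append(f.read())
        except OSError:
            continue  # the process exited between listing and reading
    return texts


if __name__ == "__main__":
    virt, res = totals(read_all_status())
    print("total VIRT: %d kB, total RES: %d kB" % (virt, res))
```

On the Mem: line in the top output above, "used" also includes buffers and 
cache: 1010516k used - 146464k buffers - 673464k cached leaves roughly 
190588k held by processes.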

At this point I'm clueless and would appreciate it if anyone has any idea 
that might explain these figures, or stuff that I can try on the server 
(it's a production server, so don't try to be funny :-).

My only other option currently is to init 1; init 3, which, all things 
considered, I'm loath to do, although the Nagios warnings are getting 
quite annoying.

-- 
Oded

::..
The most wasted of all days is one without laughter.
        -- e.e. cummings
