On Sun, Nov 06, 2005 at 01:05:39PM +0200, Oded Arbel wrote:
> 
> Hi list.
> 
> I have a problem with a P4 (hyper-threaded) powered server. It 
> constantly has a load average of 2.something, while looking with top I 
> don't see any process actually taking all that CPU resource.

I don't have a very good idea, but I'll throw out a few guesses since
no one else has replied yet :-(

> The server is mostly used to run the Nagios monitor and some Java
> daemons. Tomcat is running and takes about 1.2GB of virtual memory,
> which is about 60% of all memory, but it sees absolutely no usage and
> uses less than 5% real memory. Two other Java services and MySQL
> together grab another 600MB of virtual memory, and everything else is
> mostly scripts that use negligible amounts of VIRT, RES and CPU.

Maybe one of the scripts/daemons is looping with very short delays
between iterations? Testing this isn't very easy - you can either
strace some of the suspects or try something like syscalltrack.

> 
> Another weird thing is that a quick calculation would have the VIRT 
> usage of the system very close to the total memory available (1GB 
> physical + 1GB swap), yet the top output above shows more than half of 
> memory to be available(!).

That's because Linux overcommits memory by default: VIRT counts address
space a process has reserved, not memory it has actually touched. For an
explanation see e.g.
http://www.novell.com/coolsolutions/qna/11511.html
(found by googling for 'overcommit linux', this is not the first result
but seems the most relevant among the first page).

> 
> At this point I'm clueless and would appreciate if anyone has any idea 
> that might explain these figures, or stuff that I can try on the server 
> (it's a production server, so don't try to be funny :-).

A simple way to see what is going on with the CPU is to run a short
tight loop, e.g.
for (i=0; i<1000000000; i++);
and measure how long one run takes. Then run two in parallel. If the two
running together take almost exactly twice as long as the single run,
the CPU is otherwise idle. If the machine really is running two other
"endless" loops, each of your loops gets 1/4 of the CPU when you run two
of them, compared to 1/3 when you run one, so the pair takes only about
1.333 times as long as the single run instead of twice.

> 
> My only other option currently is to init 1; init 3, which all things 
> considered I'm loath to do although the Nagios warnings are getting 
> quite annoying.

I suggest that you wait a bit longer before doing that; otherwise you
might never find out what the problem was.
-- 
Didi


