On Tuesday, 8 November 2005 13:06, Oded Arbel wrote:
> On Sunday, 6 November 2005 22:13, Yedidyah Bar-David wrote:
> > Maybe one of the scripts/daemons has a loop of quite short delays?
> > Testing this isn't very easy - you can either strace some of the
> > suspects or try something like syscalltrack.

> Then I started removing processes until I got to the culprit - the
> Java program that implements the services provided by the server. I
> of course did the testing in the off-peak hours so there would be no
> disturbance of service to our client. At that time there was
> absolutely no activity whatsoever on any of the services, so the only
> thing the Java program was supposed to do was call wait() (a Java
> thread synchronization call) every second, which was indeed verified
> by stracing the Java process, and here is the output:
>
> futex(0x4d907b60, FUTEX_WAIT, 233, {0, 265545000}) = -1 ETIMEDOUT (Connection timed out)
> futex(0x805d33c, FUTEX_WAKE, 1)         = 0
> gettimeofday({1131445296, 417683}, NULL) = 0
> clock_gettime(0, {1131445296, 417799000}) = 0

I found the problem - the Java process, which was supposed to be sitting 
only in wait() (which I assume is what all the futex calls were about), 
had a thread that was busy looping. It was actually going very quickly 
back and forth through a memory barrier (synchronization), which might 
have explained the futex calls, had I not expected those to happen much 
more often than about once a second.

I fixed the code and now the machine is down to a more reasonable usage 
- 0.30 under normal load conditions.
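
For anyone hitting something similar, here is a rough sketch of the two 
kinds of idle loop involved. This is not the actual server code: the 
class, field and method names are all made up for illustration, and the 
real fix may well have looked different. It only shows the general 
difference between a thread that keeps re-entering a synchronized block 
and one that sleeps in wait() with a timeout.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical illustration only; none of these names come from the real server.
public class WaitVersusSpin {
    private final Object lock = new Object();
    private boolean workAvailable = false;          // guarded by lock
    private final AtomicLong passes = new AtomicLong();

    // Busy loop: takes and releases the monitor as fast as it can and
    // never sleeps, so it burns CPU even when there is nothing to do.
    void spinUntilWork() {
        while (true) {
            passes.incrementAndGet();
            synchronized (lock) {
                if (workAvailable) {
                    return;
                }
            }
        }
    }

    // Blocking loop: sleeps inside wait() until notified or until the
    // timeout expires, then re-checks the condition.
    void waitUntilWork(long timeoutMillis) throws InterruptedException {
        synchronized (lock) {
            while (!workAvailable) {
                passes.incrementAndGet();
                lock.wait(timeoutMillis);
            }
        }
    }

    // Publishes work and wakes up any thread sleeping in wait().
    void postWork() {
        synchronized (lock) {
            workAvailable = true;
            lock.notifyAll();
        }
    }

    // Lets one thread idle for idleMillis using one of the two loops and
    // returns how many passes through the loop it made in that time.
    static long idlePasses(boolean spin, long idleMillis) throws InterruptedException {
        WaitVersusSpin w = new WaitVersusSpin();
        Thread t = new Thread(() -> {
            try {
                if (spin) {
                    w.spinUntilWork();
                } else {
                    w.waitUntilWork(1000);          // one-second timeout per pass
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        t.start();
        Thread.sleep(idleMillis);
        w.postWork();
        t.join();
        return w.passes.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("spin passes: " + idlePasses(true, 300));
        System.out.println("wait passes: " + idlePasses(false, 300));
    }
}
```

The spinning variant makes a very large number of passes while idle, 
whereas the wait() variant makes roughly one pass per timeout or 
notification.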

I still don't understand why the Java process wasn't showing on the 
ps/top list - it didn't even have a lot of 'total cpu time' allocated 
to it.

-- 
Oded

::..
Proofread carefully to see if you any words out.
