On Tuesday, 8 November 2005 13:06, Oded Arbel wrote:
> On Sunday, 6 November 2005 22:13, Yedidyah Bar-David wrote:
> > Maybe one of the scripts/daemons has a loop of quite short delays?
> > Testing this isn't very easy - you can either strace some of the
> > suspects or try something like syscalltrack.
> Then I started removing processes until I got to the culprit - the
> Java program that implements the services provided by the server. I
> of course did the testing in the off-peak hours so there will be no
> disturbance of service to our client. At that time there was
> absolutely no activity whatsoever on any of the services, so the only
> thing the Java program was supposed to do was call wait() (a Java
> thread synchronization call) every second, which was indeed verified
> by stracing the Java process, and here is the output:
>
> futex(0x4d907b60, FUTEX_WAIT, 233, {0, 265545000}) = -1 ETIMEDOUT (Connection timed out)
> futex(0x805d33c, FUTEX_WAKE, 1) = 0
> gettimeofday({1131445296, 417683}, NULL) = 0
> clock_gettime(0, {1131445296, 417799000}) = 0

I found the problem - the Java process, which was supposed to be only in
the wait() state (which I assume is what all the futex calls were about),
had a thread that was busy looping. It was actually going very quickly
back and forth through a memory barrier (synchronization), which might
have explained the futex calls, had I not expected them to show up far
more often than about once a second.

I fixed the code and now the machine is down to a more reasonable usage -
0.30 under normal load conditions.

I still don't understand why the Java process wasn't showing on the
ps/top list - it didn't even have a lot of 'total cpu time' allocated to
it.

-- 
Oded
::..
Proofread carefully to see if you any words out.
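
P.S. For anyone hitting the same thing: the actual server code is not shown in this thread, so the following is only a minimal sketch of the kind of bug described above, with a made-up class and made-up method names (BusyLoopVsWait, spinPasses, waitPasses). It contrasts a thread that polls a flag through a synchronized block as fast as it can with one that blocks in wait() with a timeout, and counts how many passes each makes in the same time window.

```java
public class BusyLoopVsWait {
    private static final Object LOCK = new Object();
    // Stays false in this sketch, mimicking a server with no activity at all.
    private static boolean workAvailable = false;

    // Hypothetical buggy pattern: re-check the flag as fast as possible,
    // entering and leaving the monitor on every pass without ever blocking.
    static long spinPasses(long windowMillis) {
        long deadline = System.nanoTime() + windowMillis * 1000000L;
        long passes = 0;
        while (System.nanoTime() < deadline) {
            synchronized (LOCK) {
                if (workAvailable) {
                    break;
                }
            }
            passes++;
        }
        return passes;
    }

    // Blocking variant: the thread sleeps inside wait() until it is notified
    // or the timeout expires, so it uses almost no CPU while idle.
    static long waitPasses(long windowMillis, long timeoutMillis) {
        long deadline = System.nanoTime() + windowMillis * 1000000L;
        long passes = 0;
        while (System.nanoTime() < deadline) {
            synchronized (LOCK) {
                if (workAvailable) {
                    break;
                }
                try {
                    LOCK.wait(timeoutMillis);
                } catch (InterruptedException e) {
                    // Restore the interrupt flag and stop waiting.
                    Thread.currentThread().interrupt();
                    break;
                }
            }
            passes++;
        }
        return passes;
    }

    public static void main(String[] args) {
        System.out.println("busy-loop passes in 300 ms: " + spinPasses(300));
        System.out.println("wait() passes in 300 ms:    " + waitPasses(300, 100));
    }
}
```

On most machines I would expect the first count to be several orders of magnitude larger than the second. An uncontended synchronized block usually does not need a system call at all, which may be why strace showed so little even while the thread was spinning, but that part is a guess on my side.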