Re: Mac OSX slave timeouts

Sami Tikka Mon, 27 Aug 2012 15:04:37 -0700

The 3.5 sec clock drift is not yet alarming but it does cause me to raise an 
eyebrow. You are not synchronizing clocks between your Jenkins master, slave 
and SCM server?


Try to get the clocks in sync. Install an NTP daemon where it is missing and 
configure each one to sync to your organization's NTP server or pool.ntp.org.

If that does not help, go to http://YOURJENKINS/threadDump, save the output to 
a pastebin or gist and ask someone on the list to take a look.
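
If saving the page by hand is awkward, a small script can fetch it. This is a
minimal sketch in Python, assuming the master allows anonymous read access to
/threadDump (a secured instance would need credentials, which are not handled
here); the function names and the output file name are placeholders.

```python
import urllib.request


def thread_dump_url(jenkins_base):
    """Build the thread dump URL from the Jenkins base URL."""
    return jenkins_base.rstrip("/") + "/threadDump"


def save_thread_dump(jenkins_base, out_path="threadDump.html", timeout=30):
    """Download the thread dump page and write it to out_path unchanged."""
    with urllib.request.urlopen(thread_dump_url(jenkins_base), timeout=timeout) as resp:
        body = resp.read()
    with open(out_path, "wb") as fh:
        fh.write(body)
    return out_path
```

Running save_thread_dump("http://YOURJENKINS") while the polling threads are
stuck should capture them in the saved file, which can then be pasted.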

-- Sami

Chuck Doucette <cdouce...@everyscape.com> wrote on 27.8.2012 at 18.36:

> I rebooted the machine and reran every job destined to be run on that Mac 
> slave node.
> They all completed quickly and successfully.
> Now, as I monitor the slave node, here's what I see for it:
> 
> Clock Difference:
> 3.5 sec ahead
> 
> Response Time
> 3410 ms
> 
> When I login to the machine (as myself, not the Jenkins user), and I start 
> Activity Monitor, and I review All Processes, I see:
> a) highest CPU usage is <10% - averaging around 2% for ScreenSharingAgent and 
> 1% for Activity Monitor
> b) free memory: 1.8GB
> c) used memory: 1.2GB
> d) swap used: 0 bytes
> 
> So, I believe I have definitively answered your question about plenty of free 
> memory and not swapping.
> 
> When I run jconsole to review the Jenkins slave process, I see:
> a) 6MB of heap memory usage
> b) 15 live threads
> c) 2834 classes loaded
> d) 0.1% CPU usage
> 
> As I continue to review memory usage of the Jenkins slave process, I see a 
> steady climb after an initial drop from ~20MB to ~4MB. It's already back 
> up to 15MB.
> 
> Chuck
> 
> On Aug 27, 2012, at 10:22 AM, Chuck Doucette <cdouce...@everyscape.com> wrote:
> 
>> Here is more information, I just saw this message on the Manage Jenkins 
>> screen (from the master node, about the mac slave with problems):
>> 
>> There are more SCM polling activities scheduled than handled, so the threads 
>> are not keeping up with the demands. Check if your polling is hanging, 
>> and/or increase the number of threads if necessary 
>> <http://cruisecontrol.office.everyscape.com:8080/descriptor/hudson.triggers.SCMTrigger/>.
>> 
>> When I clicked on the link, I saw this:
>> 
>> Current SCM Polling Activities
>> There are more SCM polling activities scheduled than handled, so the threads 
>> are not keeping up with the demands. Check if your polling is hanging, 
>> and/or increase the number of threads if necessary.
>> 
>> The following polling activities are currently in progress:
>> 
>> Project           Running for
>> ESSDK             2 days 21 hr  <http://cruisecontrol.office.everyscape.com:8080/job/ESSDK/scmPollLog/>
>> Uscapeit-Android  2 days 21 hr  <http://cruisecontrol.office.everyscape.com:8080/job/Uscapeit-Android/scmPollLog/>
>> ScapeFolio        2 days 21 hr  <http://cruisecontrol.office.everyscape.com:8080/job/ScapeFolio/scmPollLog/>
>> 
>> These are all projects that only run on the Mac slave node.
>> 
>> I'm not sure how to kill these SCM polling jobs.
>> I do know how to kill regular build jobs.
>> Perhaps I can try SCM notification instead (notify Jenkins to rebuild upon 
>> check-in).
>> 
>> Chuck
>> 
>> On Aug 27, 2012, at 10:11 AM, Chuck Doucette 
>> <cdouce...@everyscape.com> wrote:
>> 
>> Yes, I believe the Mac hardware is in good general health.
>> The machine has 3GB of physical memory, so I believe it has plenty of free 
>> memory.
>> I don't believe it is swapping - but I'm not sure how to tell.
>> I have tried running Activity Monitor and JConsole.
>> As far as I can tell, there is no other software running.
>> There is no Time Machine backup set up, nor has any antivirus software been 
>> installed.
>> 
>> As I said below, I had to wipe the disk and reinstall everything from 
>> scratch.
>> So, it has: Mountain Lion, Java, Xcode.
>> That's about it.
>> Nobody else is logged on except the jenkins user over ssh.
>> 
>> Now builds that should take a few minutes are taking multiple hours, and I 
>> see that time synchronization is off by a few minutes. I will try to fix the 
>> latter right now.
>> 
>> Chuck
>> 
>> On Aug 24, 2012, at 4:54 PM, Sami Tikka 
>> <sjti...@gmail.com> wrote:
>> 
>> Just to rule out the obvious culprits:
>> 
>> - The Mac hardware is in good general health?
>> 
>> - There is plenty of free memory? The system is not swapping?
>> 
>> - There isn't some process running and taking a lot of cpu? Spotlight 
>> indexing, Time Machine backup, some anti-virus real-time scanner?
>> 
>> Even though Macs are great machines, even they can get messed up and become 
>> slow.
>> 
>> -- Sami
>> 
>> Chuck Doucette <cdouce...@everyscape.com> wrote on 24.8.2012 at 20.19:
>> 
>> We are running Jenkins 1.478.
>> The master node is running on Windows 2003 (xp).
>> It has 3 slaves - 2 other Windows machines and 1 Mac.
>> The Mac machine was working fine - then when I attempted to upgrade the O/S 
>> (from Snow Leopard to Lion) it failed due to disk errors.
>> I've since reconstituted the machine from scratch - so all of the hardware 
>> is the same but all of the software (and configurations) are brand new 
>> (Mountain Lion).
>> 
>> Something appears to be causing one of our slave nodes (on Mac OSX) to take 
>> longer and longer to respond.
>> It's currently at ~1000ms response time.
>> It has gotten up to 3000ms response time.
>> 
>> I have added two things to the slave's launch JVM options to help in 
>> diagnosing and resolving the problem:
>> 1) -Dcom.sun.management.jmxremote (so I can monitor the performance of the 
>> slave process via jconsole)
>> 2) -Xmx2048m (to use 2GB of the 3GB of physical memory available on the 
>> machine)
>> 
>> The timeouts have apparently caused jobs to fail with errors about channel 
>> closing:
>> Started by upstream project "ScapeFolio" build number 83
>> 
>> [EnvInject] - Loading node environment variables.
>> [EnvInject] - [ERROR] - SEVERE ERROR occurs: 
>> hudson.remoting.RequestAbortedException: java.io.IOException: Unexpected 
>> termination of the channel
>> Archiving artifacts
>> ERROR: Publisher hudson.tasks.Mailer aborted due to exception
>> 
>> hudson.remoting.ChannelClosedException: channel is already closed
>>     at hudson.remoting.Channel.send(Channel.java:492)
>> 
>> Does anyone have any recommendations on how to diagnose and resolve these 
>> problems?
>> 
>> Thanks,
>> Chuck
