On 9 May 2012 15:59, Wright, Clark <cwri...@litle.com> wrote:
> I have a set of 12 hour builds that run across 45 nodes on 3 machines (4 if
> you count the master).
>
>
>
> All the machines are Red Hat Enterprise.
>
> All the communication is via ssh (both job launch and node startup).
>
>
>
> Here is the problem I am trying to track down:
>
>
>
> Sometimes, the job finishes, and the node immediately (within a few minutes)
> updates its status with the master and is ready for the next job.
>
>
>
> Sometimes, however, it will take the node hours to realize the job is
> finished and update.  Of my 45 nodes, 10 are currently in this state.
>
>
>
> The job itself is a parameterized job; the actual build is this shell
> fragment:
>
>
>
> #!/bin/sh
> source ~/.bashrc
> echo "Build Starting..."
> $CVSHOME/build/scripts/armada/galleons/allIntegration.sh
> echo "Build Finished"
> exit 0
>
>
>
> There are no post-build actions.
>
>
>
>
>
> So the questions I have are:
>
> 1.       What is the polling cycle on the node monitoring the job and is it
> configurable?

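That is not how the remoting works. The node does not poll for job
status; the master and the node talk over a persistent channel.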

>
> 2.       Is there a way to get more information out of the node than just
> pinging systeminfo on the main Jenkins?

Yes, via the Groovy script console.

>
> 3.       Where in the Jenkins code base is the node management code?
>

It is scattered all over. You will want to look into the remoting module
and at the Slave and Computer classes.

But in reality you probably want to look at how the queue works and
not node management.

You might want to investigate the GC CPU time on the slaves and the master.
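
As a rough illustration of that GC check, here is a minimal sketch that
reads the accumulated collector time from the standard JVM management
beans and compares it with JVM uptime. The class name GcTimeReport and
the helper names are made up for this example, and it reports on the JVM
it runs in, so it would have to be run inside (or adapted for) the slave
or master process you care about. The same MXBean calls are available to
any JVM code, so they could presumably be adapted for the script console,
though that is an assumption and untested here.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcTimeReport {

    /** Sums the accumulated collection time (ms) over all collectors in this JVM. */
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 when the collector does not report it
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    /** GC time as a percentage of uptime; 0 when uptime is not positive. */
    static double gcPercent(long gcMillis, long uptimeMillis) {
        return uptimeMillis <= 0 ? 0.0 : 100.0 * gcMillis / uptimeMillis;
    }

    public static void main(String[] args) {
        // Per-collector breakdown: name, number of collections, accumulated time.
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName()
                    + ": count=" + gc.getCollectionCount()
                    + " timeMs=" + gc.getCollectionTime());
        }
        long uptime = ManagementFactory.getRuntimeMXBean().getUptime();
        long gcTime = totalGcMillis();
        System.out.println("uptimeMs=" + uptime
                + " gcMs=" + gcTime
                + " gcPercent=" + gcPercent(gcTime, uptime));
    }
}
```

A high percentage on a node that is slow to report back would point at
memory pressure rather than at the queue, but treat the number as a hint
only; it is wall-clock collector time, not a precise CPU measurement.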

>
>
>
>
> This is the thread dump for one of them (http://jenkins/node1/systeminfo )
>
> Thread Dump
>
> Channel reader thread: channel
>
> "Channel reader thread: channel" Id=9 Group=main RUNNABLE (in native)
>     at java.io.FileInputStream.readBytes(Native Method)
>     at java.io.FileInputStream.read(FileInputStream.java:199)
>     at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
>     at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
>     -  locked java.io.BufferedInputStream@2486ae
>     at java.io.ObjectInputStream$PeekInputStream.peek(ObjectInputStream.java:2249)
>     at java.io.ObjectInputStream$BlockDataInputStream.peek(ObjectInputStream.java:2542)
>     at java.io.ObjectInputStream$BlockDataInputStream.peekByte(ObjectInputStream.java:2552)
>     at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1297)
>     at java.io.ObjectInputStream.readObject(ObjectInputStream.java:351)
>     at hudson.remoting.Channel$ReaderThread.run(Channel.java:1030)
>
> main
>
> "main" Id=1 Group=main WAITING on hudson.remoting.Channel@a17083
>     at java.lang.Object.wait(Native Method)
>     -  waiting on hudson.remoting.Channel@a17083
>     at java.lang.Object.wait(Object.java:485)
>     at hudson.remoting.Channel.join(Channel.java:766)
>     at hudson.remoting.Launcher.main(Launcher.java:420)
>     at hudson.remoting.Launcher.runWithStdinStdout(Launcher.java:366)
>     at hudson.remoting.Launcher.run(Launcher.java:206)
>     at hudson.remoting.Launcher.main(Launcher.java:168)
>
> Ping thread for channel hudson.remoting.Channel@a17083:channel
>
> "Ping thread for channel hudson.remoting.Channel@a17083:channel" Id=10 Group=main TIMED_WAITING
>     at java.lang.Thread.sleep(Native Method)
>     at hudson.remoting.PingThread.run(PingThread.java:86)
>
> pool-1-thread-666
>
> "pool-1-thread-666" Id=719 Group=main RUNNABLE
>     at sun.management.ThreadImpl.dumpThreads0(Native Method)
>     at sun.management.ThreadImpl.dumpAllThreads(ThreadImpl.java:374)
>     at hudson.Functions.getThreadInfos(Functions.java:872)
>     at hudson.util.RemotingDiagnostics$GetThreadDump.call(RemotingDiagnostics.java:93)
>     at hudson.util.RemotingDiagnostics$GetThreadDump.call(RemotingDiagnostics.java:89)
>     at hudson.remoting.UserRequest.perform(UserRequest.java:118)
>     at hudson.remoting.UserRequest.perform(UserRequest.java:48)
>     at hudson.remoting.Request$2.run(Request.java:287)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
>     at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>     at java.lang.Thread.run(Thread.java:619)
>
>     Number of locked synchronizers = 1
>     - java.util.concurrent.locks.ReentrantLock$NonfairSync@1630de2
>
> Finalizer
>
> "Finalizer" Id=3 Group=system WAITING on java.lang.ref.ReferenceQueue$Lock@64514
>     at java.lang.Object.wait(Native Method)
>     -  waiting on java.lang.ref.ReferenceQueue$Lock@64514
>     at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:116)
>     at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:132)
>     at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:159)
>
> Reference Handler
>
> "Reference Handler" Id=2 Group=system WAITING on java.lang.ref.Reference$Lock@1a12930
>     at java.lang.Object.wait(Native Method)
>     -  waiting on java.lang.ref.Reference$Lock@1a12930
>     at java.lang.Object.wait(Object.java:485)
>     at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:116)
>
> Signal Dispatcher
>
> "Signal Dispatcher" Id=4 Group=system RUNNABLE
>
> Thank you,
>
>
>
> -Clark.
>
