Another thing you could look into is forked child processes that still
have their stdout/stderr captured.

The process will not be seen as finished until all of its stdout/stderr
has been captured (i.e. until those streams are closed), so if your build
leaves a non-daemon process hanging around that still holds them open,
that could be the root cause.
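To see why a lingering process matters, here is a minimal sketch using plain `java.lang.ProcessBuilder`. It drains the child's merged output on a separate thread, roughly the way a CI server might capture a build log (an assumption about the general approach, not Jenkins' actual launcher code). It records when the direct child exits and when the output pipe finally reaches EOF. The script `sleep 3 & sleep 1` is a made-up stand-in for a build that backgrounds a helper process. The sketch also assumes a Unix-like machine with `sh` on the PATH.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class LingeringChildDemo {

    /** Milliseconds from launch until the direct child exited, and until its output hit EOF. */
    static final class Timing {
        final int exitCode;
        final long exitMillis;
        final long eofMillis;

        Timing(int exitCode, long exitMillis, long eofMillis) {
            this.exitCode = exitCode;
            this.exitMillis = exitMillis;
            this.eofMillis = eofMillis;
        }
    }

    /** Runs the script with sh and drains its merged stdout/stderr on a separate thread. */
    static Timing run(String script) throws IOException, InterruptedException {
        long start = System.nanoTime();
        Process p = new ProcessBuilder("sh", "-c", script)
                .redirectErrorStream(true) // stdout and stderr share one pipe
                .start();

        // The copier only finishes at EOF, i.e. once every process holding the
        // write end of the pipe (including inherited copies) has closed it.
        Thread copier = new Thread(() -> {
            try (InputStream in = p.getInputStream()) {
                in.transferTo(OutputStream.nullOutputStream());
            } catch (IOException e) {
                e.printStackTrace(); // a real log copier would report this properly
            }
        }, "output-copier");
        copier.start();

        int exitCode = p.waitFor(); // returns when the direct child exits
        long exited = System.nanoTime();
        copier.join();              // returns only when the pipe reaches EOF
        long eof = System.nanoTime();
        return new Timing(exitCode, (exited - start) / 1_000_000, (eof - start) / 1_000_000);
    }

    public static void main(String[] args) throws Exception {
        // The shell exits after ~1 s, but the backgrounded "sleep 3" inherited
        // the output pipe and keeps it open for ~3 s.
        Timing t = run("sleep 3 & sleep 1");
        System.out.printf("exit code %d: child exited after %d ms, output EOF after %d ms%n",
                t.exitCode, t.exitMillis, t.eofMillis);
    }
}
```

On a Unix-like machine, `main` should report the child exiting after about 1 s but EOF arriving only after about 3 s, because the backgrounded `sleep` inherited the pipe. Redirecting the helper's own output (`sleep 3 >/dev/null 2>&1 &`) should make the two times roughly coincide. That is one way to keep such a process from holding a build open.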

On 9 May 2012 16:53, Stephen Connolly <stephen.alan.conno...@gmail.com> wrote:
> On 9 May 2012 16:31, Wright, Clark <cwri...@litle.com> wrote:
>> Thank you.
>>
>> So how does remoting work with respect to end of job notification?
>>
>> My initial assumption was that it was simply waiting for the forked process
>> to finish, grabbing the resulting return code, and updating the master.
>>
>
> Well, you could look at it like that, but in actuality the better way to
> look at it is as more of a distributed JVM. The master sends a closure
> to the slave, the closure forks the child process, and when the child
> process completes the closure should return the result to the master.
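The closure model can be illustrated without Jenkins at all. What follows is an in-process simulation, not the real remoting protocol. A hypothetical `ForkChild` closure implements `java.util.concurrent.Callable` and `Serializable`. It is written with `ObjectOutputStream` as the master would send it, then read back on the simulated agent side with `ObjectInputStream`, which is the same call the "Channel reader thread" in the dump below is blocked in. The closure then runs there, and its return value is serialized back. In Jenkins itself this role is played by `hudson.remoting.Callable` sent over a `hudson.remoting.Channel`. All class and method names in the sketch are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.concurrent.Callable;

public class ClosureShippingDemo {

    /** Hypothetical closure type: runnable on the far side and serializable for the wire. */
    interface RemoteClosure<V extends Serializable> extends Callable<V>, Serializable {}

    /** Hypothetical closure that forks a child process on the agent and returns its exit code. */
    static final class ForkChild implements RemoteClosure<Integer> {
        private static final long serialVersionUID = 1L;
        private final String[] command;

        ForkChild(String... command) {
            this.command = command;
        }

        @Override
        public Integer call() throws Exception {
            // inheritIO() avoids capturing the child's output through a pipe in this sketch
            Process child = new ProcessBuilder(command).inheritIO().start();
            return child.waitFor(); // the result is what travels back to the master
        }
    }

    static byte[] serialize(Object o) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(o);
        }
        return bytes.toByteArray();
    }

    static Object deserialize(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        }
    }

    /** Simulated agent side: read a closure off the "wire", run it, serialize the result. */
    static byte[] agentExecute(byte[] request) throws Exception {
        Callable<?> closure = (Callable<?>) deserialize(request);
        return serialize(closure.call());
    }

    public static void main(String[] args) throws Exception {
        byte[] request = serialize(new ForkChild("sh", "-c", "exit 3")); // master -> agent
        byte[] reply = agentExecute(request);                             // runs "on the agent"
        Integer exitCode = (Integer) deserialize(reply);                  // agent -> master
        System.out.println("exit code reported to master: " + exitCode);
    }
}
```

With `sh` available, `main` prints `exit code reported to master: 3`. The point of the model is that "job finished" means "the closure returned". Anything that keeps `call()` from returning keeps the master waiting.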
>
>> Also, any pointers/suggestions as to what information I need/want to get out
>> of the Groovy script console?
>>
>> Will certainly look into the queue management code.  However, the queue 
>> itself is empty (we have more executors than needed at the moment).  Jenkins 
>> just believes that jobs that actually finished 5 hours ago are still running.
>
> Smells like a GC issue but I could be wrong.
>
>>
>> - Clark.
>>
>>> So the questions I have are:
>>>
>>> 1.       What is the polling cycle on the node monitoring the job and
>>> is it configurable?
>>
>> That's not how the remoting works.
>>
>>>
>>> 2.       Is there a way to get more information out of the node than
>>> just pinging systeminfo on the main Jenkins?
>>
>> Yes, via the Groovy script console.
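One kind of information worth pulling out is a compact summary of every thread's name, its state, and the lock it is waiting on. The sketch below does this with only the standard `java.lang.management` API. This is the same `dumpAllThreads` call that appears in the `pool-1-thread-666` frames of the dump below. The class name `ThreadStateSummary` is made up. The body sticks to plain Java syntax that Groovy generally accepts, so it may be adaptable for the script console, but that is an assumption to verify.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public class ThreadStateSummary {

    /** One line per live thread: name, state, the lock it waits on, and that lock's owner. */
    static String summarize() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        StringBuilder sb = new StringBuilder();
        // false, false: skip the (more expensive) locked-monitor and synchronizer details
        for (ThreadInfo info : mx.dumpAllThreads(false, false)) {
            if (info == null) {
                continue; // defensive: skip any thread that vanished mid-dump
            }
            sb.append('"').append(info.getThreadName()).append("\" ").append(info.getThreadState());
            if (info.getLockName() != null) {
                sb.append(" on ").append(info.getLockName());
            }
            if (info.getLockOwnerName() != null) {
                sb.append(" owned by \"").append(info.getLockOwnerName()).append('"');
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(summarize());
    }
}
```

A lock owner shown against a WAITING or BLOCKED thread is usually the first thing to chase when a job looks stuck.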
>>
>>>
>>> 3.       Where in the Jenkins code base is the node management code?
>>>
>>
>> It's scattered all over. You will want to look into the remoting module, and
>> at the Slave and Computer classes.
>>
>> But in reality you probably want to look at how the queue works and not node 
>> management.
>>
>> You might want to investigate the GC CPU time on the slaves and the master.
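A rough first check is the standard `GarbageCollectorMXBean` API. The sketch below sums each collector's `getCollectionTime()` and compares the total with JVM uptime. The class name `GcTimeReport` is made up. Note that `getCollectionTime()` reports approximate accumulated *elapsed* collection time, not CPU time, so treat the result as an indicator rather than an exact measure. The code has to run inside the JVM being examined (master or slave).

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcTimeReport {

    /** Accumulated GC time in ms across all collectors; collectors reporting -1 (unsupported) are skipped. */
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime();
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    /** Approximate fraction of JVM uptime spent collecting garbage. */
    static double gcFraction() {
        long uptime = ManagementFactory.getRuntimeMXBean().getUptime();
        return uptime > 0 ? (double) totalGcMillis() / uptime : 0.0;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.printf("%s: %d collections, %d ms%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
        System.out.printf("GC share of uptime: %.1f%%%n", 100 * gcFraction());
    }
}
```

A share that keeps climbing while jobs appear stuck would lend some support to the GC theory. A flat, small share would point back toward remoting or process handling.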
>>
>>>
>>> This is the thread dump for one of them
>>> (http://jenkins/node1/systeminfo):
>>>
>>> Thread Dump
>>>
>>> "Channel reader thread: channel" Id=9 Group=main RUNNABLE (in native)
>>>         at java.io.FileInputStream.readBytes(Native Method)
>>>         at java.io.FileInputStream.read(FileInputStream.java:199)
>>>         at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
>>>         at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
>>>         -  locked java.io.BufferedInputStream@2486ae
>>>         at java.io.ObjectInputStream$PeekInputStream.peek(ObjectInputStream.java:2249)
>>>         at java.io.ObjectInputStream$BlockDataInputStream.peek(ObjectInputStream.java:2542)
>>>         at java.io.ObjectInputStream$BlockDataInputStream.peekByte(ObjectInputStream.java:2552)
>>>         at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1297)
>>>         at java.io.ObjectInputStream.readObject(ObjectInputStream.java:351)
>>>         at hudson.remoting.Channel$ReaderThread.run(Channel.java:1030)
>>>
>>> "main" Id=1 Group=main WAITING on hudson.remoting.Channel@a17083
>>>         at java.lang.Object.wait(Native Method)
>>>         -  waiting on hudson.remoting.Channel@a17083
>>>         at java.lang.Object.wait(Object.java:485)
>>>         at hudson.remoting.Channel.join(Channel.java:766)
>>>         at hudson.remoting.Launcher.main(Launcher.java:420)
>>>         at hudson.remoting.Launcher.runWithStdinStdout(Launcher.java:366)
>>>         at hudson.remoting.Launcher.run(Launcher.java:206)
>>>         at hudson.remoting.Launcher.main(Launcher.java:168)
>>>
>>> "Ping thread for channel hudson.remoting.Channel@a17083:channel" Id=10 Group=main TIMED_WAITING
>>>         at java.lang.Thread.sleep(Native Method)
>>>         at hudson.remoting.PingThread.run(PingThread.java:86)
>>>
>>> "pool-1-thread-666" Id=719 Group=main RUNNABLE
>>>         at sun.management.ThreadImpl.dumpThreads0(Native Method)
>>>         at sun.management.ThreadImpl.dumpAllThreads(ThreadImpl.java:374)
>>>         at hudson.Functions.getThreadInfos(Functions.java:872)
>>>         at hudson.util.RemotingDiagnostics$GetThreadDump.call(RemotingDiagnostics.java:93)
>>>         at hudson.util.RemotingDiagnostics$GetThreadDump.call(RemotingDiagnostics.java:89)
>>>         at hudson.remoting.UserRequest.perform(UserRequest.java:118)
>>>         at hudson.remoting.UserRequest.perform(UserRequest.java:48)
>>>         at hudson.remoting.Request$2.run(Request.java:287)
>>>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
>>>         at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>>>         at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>>>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>>>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>>>         at java.lang.Thread.run(Thread.java:619)
>>>
>>>         Number of locked synchronizers = 1
>>>         - java.util.concurrent.locks.ReentrantLock$NonfairSync@1630de2
>>>
>>> "Finalizer" Id=3 Group=system WAITING on java.lang.ref.ReferenceQueue$Lock@64514
>>>         at java.lang.Object.wait(Native Method)
>>>         -  waiting on java.lang.ref.ReferenceQueue$Lock@64514
>>>         at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:116)
>>>         at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:132)
>>>         at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:159)
>>>
>>> "Reference Handler" Id=2 Group=system WAITING on java.lang.ref.Reference$Lock@1a12930
>>>         at java.lang.Object.wait(Native Method)
>>>         -  waiting on java.lang.ref.Reference$Lock@1a12930
>>>         at java.lang.Object.wait(Object.java:485)
>>>         at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:116)
>>>
>>> "Signal Dispatcher" Id=4 Group=system RUNNABLE
>>>
>>> Thank you,
>>>
>>> -Clark.
>>>
>>> The information in this message is for the intended recipient(s) only
>>> and may be the proprietary and/or confidential property of Litle &
>>> Co., LLC, and thus protected from disclosure. If you are not the
>>> intended recipient(s), or an employee or agent responsible for
>>> delivering this message to the intended recipient, you are hereby
>>> notified that any use, dissemination, distribution or copying of this
>>> communication is prohibited. If you have received this communication
>>> in error, please notify Litle & Co. immediately by replying to this
>>> message and then promptly deleting it and your reply permanently from your 
>>> computer.
>>
