Could you expand on how the slave handles (and what it expects of) stdout/stderr?

All the processes associated with the build have finished and exited; the only
process left running under that account is the JVM running the slave.

Now, parts of our test harness do muck with stdout/stderr, but to the best of 
our knowledge, those two are always reset correctly.

Also, forcing a GC on the slave (that is where you think the GC problem is,
right?) doesn't change anything.
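
(For the record, "forcing a GC" here means running something along these lines
from the master's script console; 'node1' is just a stand-in for the real node
name.)

    import jenkins.model.Jenkins
    import hudson.util.RemotingDiagnostics

    // look up the remoting channel for the slave in question
    def channel = Jenkins.instance.getNode('node1').toComputer().channel
    // executeGroovy runs the given source inside the slave JVM and returns its output
    println RemotingDiagnostics.executeGroovy('System.gc(); "gc requested"', channel)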

Thanks,
-clark.
-----Original Message-----
From: jenkinsci-users@googlegroups.com 
[mailto:jenkinsci-users@googlegroups.com] On Behalf Of Stephen Connolly
Sent: Wednesday, May 09, 2012 11:55 AM
To: jenkinsci-users@googlegroups.com
Subject: Re: Delay between job finished and node finished on unix

Another thing you could look into is forked child processes having captured
stdout/stderr.

The process will not be seen as finished until all of its stdout/stderr has
been read to EOF, so if your build leaves a non-daemon child process holding
those streams open, that could be the RCA.
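
A quick way to see the effect in isolation (just an illustration, nothing
specific to your build; the backgrounded sleep stands in for any daemonized
child):

    // the backgrounded 'sleep' inherits the stdout pipe, so reading only hits
    // EOF once it exits, long after 'sh' itself has returned
    def p = ['sh', '-c', 'echo started; sleep 300 &'].execute()
    p.inputStream.eachLine { println it }   // prints "started", then blocks ~5 minutes
    println "stream closed; sh exit code was ${p.waitFor()}"

The usual fix is to start such daemons with their output redirected (e.g. to
/dev/null or a log file) so they do not keep the build's streams open.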

On 9 May 2012 16:53, Stephen Connolly <stephen.alan.conno...@gmail.com> wrote:
> On 9 May 2012 16:31, Wright, Clark <cwri...@litle.com> wrote:
>> Thank you.
>>
>> So how does remoting work with respect to end of job notification?
>>
>> My initial assumption was that it simply waits for the forked process to
>> finish, grabs the resultant return code, and updates the master.
>>
>
> Well, you could look at it like that; in actuality the better way to
> look at it is as more of a distributed JVM. The master sends a closure
> to the slave, the closure forks the child process, and when the child
> process completes the closure returns the result to the master.
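>
> Roughly (a sketch only, not the actual Jenkins source; 'build.sh' and
> 'slaveChannel' stand in for whatever your job really runs and the node's
> channel):
>
>     import hudson.remoting.Callable
>
>     // the "closure" is a serializable Callable shipped over the channel
>     class RunBuild implements Callable<Integer, IOException> {
>         Integer call() throws IOException {
>             def proc = new ProcessBuilder('sh', '-c', './build.sh').start()
>             // drain the child's output (Jenkins streams this back to the build log)
>             proc.inputStream.eachLine { line -> println line }
>             return proc.waitFor()   // this value travels back to the master
>         }
>     }
>
>     // master side: int exit = slaveChannel.call(new RunBuild())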
>
>> Also, any pointers/suggestions as to what information I need/want to get out 
>> of the groovy script console?
>>
>> Will certainly look into the queue management code.  However, the queue 
>> itself is empty (we have more executors than needed at the moment).  Jenkins 
>> just believes that jobs that actually finished 5 hours ago are still running.
>
> Smells like a GC issue but I could be wrong.
>
>>
>> - Clark.
>>
>>> So the questions I have are:
>>>
>>> 1. What is the polling cycle on the node monitoring the job, and is
>>> it configurable?
>>
>> That's not how the remoting works; there is no polling cycle.
>>
>>>
>>> 2. Is there a way to get more information out of the node than just
>>> pinging systeminfo on the main Jenkins?
>>
>> Yes, via the Groovy script console.
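>>
>> e.g. something like this pulls a live thread dump straight off the slave
>> ('node1' is a placeholder for the node name):
>>
>>     import jenkins.model.Jenkins
>>     import hudson.util.RemotingDiagnostics
>>
>>     def ch = Jenkins.instance.getNode('node1').toComputer().channel
>>     // map of thread name -> stack trace, taken inside the slave JVM
>>     RemotingDiagnostics.getThreadDump(ch).each { name, trace ->
>>         println "${name}\n${trace}"
>>     }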
>>
>>>
>>> 3. Where in the Jenkins code base is the node management code?
>>>
>>
>> It's scattered all over; you will want to look into the remoting module and
>> at the Slave and Computer classes.
>>
>> But in reality you probably want to look at how the queue works and not node 
>> management.
>>
>> You might want to investigate the GC CPU time on the slaves and the master.
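>>
>> For the GC numbers, something along these lines from the script console
>> should do (again, 'node1' is a placeholder; paste the inner script on its
>> own in the console to see the master's figures):
>>
>>     import jenkins.model.Jenkins
>>     import hudson.util.RemotingDiagnostics
>>
>>     // cumulative collection count and time (ms) per collector, read inside the slave JVM
>>     def gcStats = '''
>>     java.lang.management.ManagementFactory.garbageCollectorMXBeans
>>         .collect { "${it.name}: ${it.collectionCount} collections, ${it.collectionTime} ms" }
>>         .join("; ")
>>     '''
>>     def ch = Jenkins.instance.getNode('node1').toComputer().channel
>>     println RemotingDiagnostics.executeGroovy(gcStats, ch)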
>>
>>>
>>>
>>>
>>>
>>> This is the thread dump for one of them (http://jenkins/node1/systeminfo):
>>>
>>> Thread Dump
>>>
>>> "Channel reader thread: channel" Id=9 Group=main RUNNABLE (in native)
>>>     at java.io.FileInputStream.readBytes(Native Method)
>>>     at java.io.FileInputStream.read(FileInputStream.java:199)
>>>     at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
>>>     at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
>>>     -  locked java.io.BufferedInputStream@2486ae
>>>     at java.io.ObjectInputStream$PeekInputStream.peek(ObjectInputStream.java:2249)
>>>     at java.io.ObjectInputStream$BlockDataInputStream.peek(ObjectInputStream.java:2542)
>>>     at java.io.ObjectInputStream$BlockDataInputStream.peekByte(ObjectInputStream.java:2552)
>>>     at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1297)
>>>     at java.io.ObjectInputStream.readObject(ObjectInputStream.java:351)
>>>     at hudson.remoting.Channel$ReaderThread.run(Channel.java:1030)
>>>
>>> "main" Id=1 Group=main WAITING on hudson.remoting.Channel@a17083
>>>     at java.lang.Object.wait(Native Method)
>>>     -  waiting on hudson.remoting.Channel@a17083
>>>     at java.lang.Object.wait(Object.java:485)
>>>     at hudson.remoting.Channel.join(Channel.java:766)
>>>     at hudson.remoting.Launcher.main(Launcher.java:420)
>>>     at hudson.remoting.Launcher.runWithStdinStdout(Launcher.java:366)
>>>     at hudson.remoting.Launcher.run(Launcher.java:206)
>>>     at hudson.remoting.Launcher.main(Launcher.java:168)
>>>
>>> "Ping thread for channel hudson.remoting.Channel@a17083:channel" Id=10 Group=main TIMED_WAITING
>>>     at java.lang.Thread.sleep(Native Method)
>>>     at hudson.remoting.PingThread.run(PingThread.java:86)
>>>
>>> "pool-1-thread-666" Id=719 Group=main RUNNABLE
>>>     at sun.management.ThreadImpl.dumpThreads0(Native Method)
>>>     at sun.management.ThreadImpl.dumpAllThreads(ThreadImpl.java:374)
>>>     at hudson.Functions.getThreadInfos(Functions.java:872)
>>>     at hudson.util.RemotingDiagnostics$GetThreadDump.call(RemotingDiagnostics.java:93)
>>>     at hudson.util.RemotingDiagnostics$GetThreadDump.call(RemotingDiagnostics.java:89)
>>>     at hudson.remoting.UserRequest.perform(UserRequest.java:118)
>>>     at hudson.remoting.UserRequest.perform(UserRequest.java:48)
>>>     at hudson.remoting.Request$2.run(Request.java:287)
>>>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
>>>     at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>>>     at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>>>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>>>     at java.lang.Thread.run(Thread.java:619)
>>>
>>>     Number of locked synchronizers = 1
>>>     - java.util.concurrent.locks.ReentrantLock$NonfairSync@1630de2
>>>
>>> "Finalizer" Id=3 Group=system WAITING on java.lang.ref.ReferenceQueue$Lock@64514
>>>     at java.lang.Object.wait(Native Method)
>>>     -  waiting on java.lang.ref.ReferenceQueue$Lock@64514
>>>     at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:116)
>>>     at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:132)
>>>     at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:159)
>>>
>>> "Reference Handler" Id=2 Group=system WAITING on java.lang.ref.Reference$Lock@1a12930
>>>     at java.lang.Object.wait(Native Method)
>>>     -  waiting on java.lang.ref.Reference$Lock@1a12930
>>>     at java.lang.Object.wait(Object.java:485)
>>>     at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:116)
>>>
>>> "Signal Dispatcher" Id=4 Group=system RUNNABLE
>>>
>>> Thank you,
>>>
>>>
>>>
>>> -Clark.
>>>
>>
