Another thing you could look into is forked child processes that still hold the captured stdout/stderr.
The process will not be seen as finished until all stdout/stderr has been captured, so if your build leaves a non-daemon process hanging around, that could be the root cause.

On 9 May 2012 16:53, Stephen Connolly <stephen.alan.conno...@gmail.com> wrote:
> On 9 May 2012 16:31, Wright, Clark <cwri...@litle.com> wrote:
>> Thank you.
>>
>> So how does remoting work with respect to end of job notification?
>>
>> My initial assumption was that it was simply waiting for the forked process
>> to finish, grab the resultant return code, and update the master.
>>
>
> well you could look at it like that, in actuality the better way to
> look at it is as more of a distributed JVM. The master sends a closure
> to the slave, the closure forks the child process and when the child
> process completes the closure should return the result to the master.
>
>> Also, any pointers/suggestions as to what information I need/want to get out
>> of the groovy script console?
>>
>> Will certainly look into the queue management code. However, the queue
>> itself is empty (we have more executors than needed at the moment). Jenkins
>> just believes that jobs that actually finished 5 hours ago are still running.
>
> Smells like a GC issue but I could be wrong.
>
>>
>> - Clark.
>>
>>> So the questions I have are:
>>>
>>> 1. What is the polling cycle on the node monitoring the job and
>>> is it configurable?
>>
>> Not how the remoting works
>>
>>>
>>> 2. Is there a way to get more information out of the node than
>>> just pinging systeminfo on the main Jenkins?
>>
>> Yes via the groovy script console
>>
>>>
>>> 3. Where in the Jenkins code base is the node management code?
>>>
>>
>> Scattered all over, you will want to look into the remoting module, and look
>> at the Slave and Computer classes.
>>
>> But in reality you probably want to look at how the queue works and not node
>> management.
>>
>> You might want to investigate the GC cpu time on the slaves and the master.
>>
>>> This is the thread dump for one of them
>>> (http://jenkins/node1/systeminfo)
>>>
>>> Thread Dump
>>>
>>> Channel reader thread: channel
>>>
>>> "Channel reader thread: channel" Id=9 Group=main RUNNABLE (in native)
>>>     at java.io.FileInputStream.readBytes(Native Method)
>>>     at java.io.FileInputStream.read(FileInputStream.java:199)
>>>     at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
>>>     at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
>>>     - locked java.io.BufferedInputStream@2486ae
>>>     at java.io.ObjectInputStream$PeekInputStream.peek(ObjectInputStream.java:2249)
>>>     at java.io.ObjectInputStream$BlockDataInputStream.peek(ObjectInputStream.java:2542)
>>>     at java.io.ObjectInputStream$BlockDataInputStream.peekByte(ObjectInputStream.java:2552)
>>>     at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1297)
>>>     at java.io.ObjectInputStream.readObject(ObjectInputStream.java:351)
>>>     at hudson.remoting.Channel$ReaderThread.run(Channel.java:1030)
>>>
>>> main
>>>
>>> "main" Id=1 Group=main WAITING on hudson.remoting.Channel@a17083
>>>     at java.lang.Object.wait(Native Method)
>>>     - waiting on hudson.remoting.Channel@a17083
>>>     at java.lang.Object.wait(Object.java:485)
>>>     at hudson.remoting.Channel.join(Channel.java:766)
>>>     at hudson.remoting.Launcher.main(Launcher.java:420)
>>>     at hudson.remoting.Launcher.runWithStdinStdout(Launcher.java:366)
>>>     at hudson.remoting.Launcher.run(Launcher.java:206)
>>>     at hudson.remoting.Launcher.main(Launcher.java:168)
>>>
>>> Ping thread for channel hudson.remoting.Channel@a17083:channel
>>>
>>> "Ping thread for channel hudson.remoting.Channel@a17083:channel" Id=10 Group=main TIMED_WAITING
>>>     at java.lang.Thread.sleep(Native Method)
>>>     at hudson.remoting.PingThread.run(PingThread.java:86)
>>>
>>> pool-1-thread-666
>>>
>>> "pool-1-thread-666" Id=719 Group=main RUNNABLE
>>>     at sun.management.ThreadImpl.dumpThreads0(Native Method)
>>>     at sun.management.ThreadImpl.dumpAllThreads(ThreadImpl.java:374)
>>>     at hudson.Functions.getThreadInfos(Functions.java:872)
>>>     at hudson.util.RemotingDiagnostics$GetThreadDump.call(RemotingDiagnostics.java:93)
>>>     at hudson.util.RemotingDiagnostics$GetThreadDump.call(RemotingDiagnostics.java:89)
>>>     at hudson.remoting.UserRequest.perform(UserRequest.java:118)
>>>     at hudson.remoting.UserRequest.perform(UserRequest.java:48)
>>>     at hudson.remoting.Request$2.run(Request.java:287)
>>>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
>>>     at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>>>     at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>>>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>>>     at java.lang.Thread.run(Thread.java:619)
>>>
>>>     Number of locked synchronizers = 1
>>>     - java.util.concurrent.locks.ReentrantLock$NonfairSync@1630de2
>>>
>>> Finalizer
>>>
>>> "Finalizer" Id=3 Group=system WAITING on java.lang.ref.ReferenceQueue$Lock@64514
>>>     at java.lang.Object.wait(Native Method)
>>>     - waiting on java.lang.ref.ReferenceQueue$Lock@64514
>>>     at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:116)
>>>     at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:132)
>>>     at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:159)
>>>
>>> Reference Handler
>>>
>>> "Reference Handler" Id=2 Group=system WAITING on java.lang.ref.Reference$Lock@1a12930
>>>     at java.lang.Object.wait(Native Method)
>>>     - waiting on java.lang.ref.Reference$Lock@1a12930
>>>     at java.lang.Object.wait(Object.java:485)
>>>     at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:116)
>>>
>>> Signal Dispatcher
>>>
>>> "Signal Dispatcher" Id=4 Group=system RUNNABLE
>>>
>>> Thank you,
>>>
>>> -Clark.
>>>
>>> The information in this message is for the intended recipient(s) only
>>> and may be the proprietary and/or confidential property of Litle &
>>> Co., LLC, and thus protected from disclosure. If you are not the
>>> intended recipient(s), or an employee or agent responsible for
>>> delivering this message to the intended recipient, you are hereby
>>> notified that any use, dissemination, distribution or copying of this
>>> communication is prohibited. If you have received this communication
>>> in error, please notify Litle & Co. immediately by replying to this
>>> message and then promptly deleting it and your reply permanently from your
>>> computer.
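P.S. To make the captured stdout/stderr point at the top of this message concrete, here is a minimal sketch in Python. It is not Jenkins code: the function name `time_exit_vs_eof`, the `sleep` stand-in for a leftover build process, and the "build finished" message are all made up for illustration, and it assumes a Unix-like machine with `sh` and `sleep` available. It only shows the general pipe behaviour: a background process inherits the parent's stdout, so whoever is reading that pipe does not see end-of-file until the background process has gone away too.

```python
import subprocess
import time


def time_exit_vs_eof(linger_seconds=2, redirect_background=False):
    """Return (seconds until the shell exited, seconds until stdout hit EOF, output).

    Hypothetical helper for illustration only; `sleep` stands in for a
    process that a build leaves running in the background.
    """
    # With redirect_background=True the leftover process gives up the pipe.
    redirect = " > /dev/null 2>&1" if redirect_background else ""
    command = f"sleep {linger_seconds}{redirect} & echo build finished"

    start = time.monotonic()
    child = subprocess.Popen(
        ["sh", "-c", command],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr into the same captured pipe
    )

    # The shell backgrounds `sleep`, prints one line, and exits at once.
    child.wait()
    exited_after = time.monotonic() - start

    # read() returns only at EOF, i.e. when every process that still has
    # the write end of the pipe open has closed it or exited.
    output = child.stdout.read()
    eof_after = time.monotonic() - start
    child.stdout.close()
    return exited_after, eof_after, output


if __name__ == "__main__":
    for redirected in (False, True):
        exited, eof, _ = time_exit_vs_eof(2, redirect_background=redirected)
        print(f"redirected={redirected}: exit after {exited:.2f}s, EOF after {eof:.2f}s")
```

In the first case I would expect the shell to exit almost immediately while EOF only arrives once the `sleep` ends; in the second case EOF should arrive right away, because the background process no longer holds the pipe. If that matches what your builds do, redirecting the output of whatever they leave running is one thing to try.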