[ https://issues.jenkins-ci.org/browse/JENKINS-13330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Randall Schulz updated JENKINS-13330:
-------------------------------------
    Attachment: jenkins-stall-threaddump.gz

A thread dump of the Jenkins master process when there were 8 stalled (slave) jobs. [Apparently I did not succeed last time.]

> Jenkins slave hangs in post build phase
> ---------------------------------------
>
>                 Key: JENKINS-13330
>                 URL: https://issues.jenkins-ci.org/browse/JENKINS-13330
>             Project: Jenkins
>          Issue Type: Bug
>          Components: master-slave, slave-status
>         Environment: RHEL 5, both master and all slaves.
>                      Jenkins is running inside of Tomcat
>            Reporter: Clark Wright
>            Priority: Critical
>         Attachments: jenkins-stall-threaddump.gz, jenkins-stall-threaddump.gz, Screenshot-galleon_allIntegration #1196 Console [Jenkins] - Mozilla Firefox.png
>
>
> We have an intermittent problem with slaves hanging AFTER the job itself is finished. In the post processing step (?) what we see is that the console log has this line:
> Description set: vap_current_iter-2012_03_29_19_01_03
> And then nothing.
> Usually, it will look like this:
> Description set: prod_pull-2012_03_28_19_01_03
> Notifying upstream build armada_Launch_prod_pull #13 of job completion
> Project armada_Launch_prod_pull still waiting for 1 builds to complete
> Notifying upstream projects of job completion
> Notifying upstream of completion: armada_Launch_prod_pull #13
> Finished: SUCCESS
> I set up a logger for hudson.model.Run, and it currently has this:
>     at java.lang.Thread.run(Thread.java:619)
> Mar 30, 2012 12:44:00 PM hudson.model.Run run
> INFO: galleon_allUnit #1134 main build action completed: SUCCESS
> Mar 30, 2012 12:44:00 PM hudson.model.Run setResult
> FINE: galleon_allUnit #1134 : result is set to SUCCESS
> java.lang.Exception
>     at hudson.model.Run.setResult(Run.java:352)
>     at hudson.model.Run.run(Run.java:1410)
>     at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:46)
>     at hudson.model.ResourceController.execute(ResourceController.java:88)
>     at hudson.model.Executor.run(Executor.java:238)
> Repeated for every hung slave.
> The main hudson log doesn't have any additional information.
> Disconnecting the slave has no effect.
> Trying to do an orderly shutdown of Jenkins has no effect (Jenkins actually appears to hang on shutdown).
> The only way we have found to recover is to kill -9 the Tomcat process.
> The thread dump for one of the slaves (they are all the same) is:
> Thread Dump
> Channel reader thread: channel
> "Channel reader thread: channel" Id=9 Group=main RUNNABLE (in native)
>     at java.io.FileInputStream.readBytes(Native Method)
>     at java.io.FileInputStream.read(FileInputStream.java:199)
>     at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
>     at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
>     - locked java.io.BufferedInputStream@1ae615a
>     at java.io.ObjectInputStream$PeekInputStream.peek(ObjectInputStream.java:2249)
>     at java.io.ObjectInputStream$BlockDataInputStream.peek(ObjectInputStream.java:2542)
>     at java.io.ObjectInputStream$BlockDataInputStream.peekByte(ObjectInputStream.java:2552)
>     at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1297)
>     at java.io.ObjectInputStream.readObject(ObjectInputStream.java:351)
>     at hudson.remoting.Channel$ReaderThread.run(Channel.java:1030)
> main
> "main" Id=1 Group=main WAITING on hudson.remoting.Channel@e1d5ea
>     at java.lang.Object.wait(Native Method)
>     - waiting on hudson.remoting.Channel@e1d5ea
>     at java.lang.Object.wait(Object.java:485)
>     at hudson.remoting.Channel.join(Channel.java:766)
>     at hudson.remoting.Launcher.main(Launcher.java:420)
>     at hudson.remoting.Launcher.runWithStdinStdout(Launcher.java:366)
>     at hudson.remoting.Launcher.run(Launcher.java:206)
>     at hudson.remoting.Launcher.main(Launcher.java:168)
> Ping thread for channel hudson.remoting.Channel@e1d5ea:channel
> "Ping thread for channel hudson.remoting.Channel@e1d5ea:channel" Id=10 Group=main TIMED_WAITING
>     at java.lang.Thread.sleep(Native Method)
>     at hudson.remoting.PingThread.run(PingThread.java:86)
> Pipe writer thread: channel
> "Pipe writer thread: channel" Id=12 Group=main WAITING on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@14263ed
>     at sun.misc.Unsafe.park(Native Method)
>     - waiting on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@14263ed
>     at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1925)
>     at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358)
>     at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:947)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
>     at java.lang.Thread.run(Thread.java:619)
> pool-1-thread-267
> "pool-1-thread-267" Id=285 Group=main RUNNABLE
>     at sun.management.ThreadImpl.dumpThreads0(Native Method)
>     at sun.management.ThreadImpl.dumpAllThreads(ThreadImpl.java:374)
>     at hudson.Functions.getThreadInfos(Functions.java:872)
>     at hudson.util.RemotingDiagnostics$GetThreadDump.call(RemotingDiagnostics.java:93)
>     at hudson.util.RemotingDiagnostics$GetThreadDump.call(RemotingDiagnostics.java:89)
>     at hudson.remoting.UserRequest.perform(UserRequest.java:118)
>     at hudson.remoting.UserRequest.perform(UserRequest.java:48)
>     at hudson.remoting.Request$2.run(Request.java:287)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
>     at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>     at java.lang.Thread.run(Thread.java:619)
>     Number of locked synchronizers = 1
>     - java.util.concurrent.locks.ReentrantLock$NonfairSync@1186f88
> Finalizer
> "Finalizer" Id=3 Group=system WAITING on java.lang.ref.ReferenceQueue$Lock@1798fdd
>     at java.lang.Object.wait(Native Method)
>     - waiting on java.lang.ref.ReferenceQueue$Lock@1798fdd
>     at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:116)
>     at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:132)
>     at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:159)
> Reference Handler
> "Reference Handler" Id=2 Group=system WAITING on java.lang.ref.Reference$Lock@1d40442
>     at java.lang.Object.wait(Native Method)
>     - waiting on java.lang.ref.Reference$Lock@1d40442
>     at java.lang.Object.wait(Object.java:485)
>     at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:116)
> Signal Dispatcher
> "Signal Dispatcher" Id=4 Group=system RUNNABLE
> Any ideas on how to better recover or prevent this would be greatly appreciated.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.jenkins-ci.org/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
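
For anyone comparing the slave dump quoted in the description against the attached master dump, a rough Python sketch for tallying thread states may save some scrolling. It assumes the dump uses the same header layout as the quoted text (`"name" Id=N Group=G STATE ...` followed by `at ...` frames). The names `parse_thread_dump`, `summarize_states` and `SAMPLE` are made up for illustration and are not part of Jenkins.

```python
import re
from collections import Counter

# Header line of one thread in the dump layout quoted in this issue, e.g.
#   "main" Id=1 Group=main WAITING on hudson.remoting.Channel@e1d5ea
THREAD_HEADER = re.compile(
    r'^"(?P<name>.+)" Id=(?P<id>\d+) Group=(?P<group>\S+) (?P<state>[A-Z_]+)'
)


def parse_thread_dump(text):
    """Return one dict per thread: name, id, group, state and stack frames."""
    threads = []
    current = None
    for raw in text.splitlines():
        # Drop e-mail quote markers ("> ") and surrounding whitespace.
        line = raw.lstrip("> \t").rstrip()
        header = THREAD_HEADER.match(line)
        if header:
            current = {
                "name": header.group("name"),
                "id": int(header.group("id")),
                "group": header.group("group"),
                "state": header.group("state"),
                "frames": [],
            }
            threads.append(current)
        elif current is not None and line.startswith("at "):
            # Keep only the frame itself; lock lines ("- waiting on ...") are skipped.
            current["frames"].append(line[3:])
    return threads


def summarize_states(threads):
    """Count threads per state (RUNNABLE, WAITING, TIMED_WAITING, ...)."""
    return Counter(t["state"] for t in threads)


# Two threads copied (shortened) from the dump in the issue description.
SAMPLE = '''\
> "Channel reader thread: channel" Id=9 Group=main RUNNABLE (in native)
>     at java.io.FileInputStream.readBytes(Native Method)
>     at hudson.remoting.Channel$ReaderThread.run(Channel.java:1030)
> "main" Id=1 Group=main WAITING on hudson.remoting.Channel@e1d5ea
>     at java.lang.Object.wait(Native Method)
>     - waiting on hudson.remoting.Channel@e1d5ea
>     at hudson.remoting.Channel.join(Channel.java:766)
'''

if __name__ == "__main__":
    for state, count in summarize_states(parse_thread_dump(SAMPLE)).items():
        print(state, count)
```

Listing the top frame of each WAITING thread (`t["frames"][0]`) is usually the quickest way to see what the stalled executors are blocked on. A dump taken with a different tool (for example jstack) uses a different header layout and would need a different pattern.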