I've been having problems where some of my builds appear to hang between finishing their build actions and running the publisher. Server and client are both running on Java 1.7 on Red Hat Linux. During these hangs, the "hung" slave node isn't running anything, nor does it have file descriptors open to jobs it launched (there is literally nothing but the slave node process running on that Linux account).
Per the thread dump, I get:

Executor #0 for build10_armada_5 : executing galleon_allIntegration #13982 : waiting for Check point hudson.plugins.templateproject.ProxyPublisher on galleon_allIntegration #13981

"Executor #0 for build10_armada_5 : executing galleon_allIntegration #13982 : waiting for Check point hudson.plugins.templateproject.ProxyPublisher on galleon_allIntegration #13981" Id=376 Group=main WAITING on hudson.model.Run$RunExecution$CheckpointSet@22d65fc7
    at java.lang.Object.wait(Native Method)
    -  waiting on hudson.model.Run$RunExecution$CheckpointSet@22d65fc7
    at java.lang.Object.wait(Object.java:503)
    at hudson.model.Run$RunExecution$CheckpointSet.waitForCheckPoint(Run.java:1363)
    at hudson.model.Run.waitForCheckpoint(Run.java:1321)
    at hudson.model.CheckPoint.block(CheckPoint.java:144)
    at hudson.tasks.BuildStepMonitor$2.perform(BuildStepMonitor.java:25)
    at hudson.model.AbstractBuild$AbstractBuildExecution.perform(AbstractBuild.java:717)
    at hudson.model.AbstractBuild$AbstractBuildExecution.performAllBuildSteps(AbstractBuild.java:692)
    at hudson.model.Build$BuildExecution.post2(Build.java:183)
    at hudson.model.AbstractBuild$AbstractBuildExecution.post(AbstractBuild.java:639)
    at hudson.model.Run.execute(Run.java:1527)
    at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:46)
    at hudson.model.ResourceController.execute(ResourceController.java:88)
    at hudson.model.Executor.run(Executor.java:236)

Basically, this galleon_allIntegration job is a multi-hour integration test suite that gets run some 60-70 times a night on different branches of the same source tree. When it gets hung like this, it's specifically after running all the build steps and before running a publisher taken from the template project plugin (these take a build action and a publisher from another project).

Per the CheckPoint class (which I don't understand due to my own ignorance), build #13982 is intentionally waiting for #13981 to reach some sort of checkpoint.
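For what it's worth, here is how I read the wait in that trace, written as a toy model and not as Jenkins's actual code. The frames show `CheckPoint.block` ending up in `Object.wait` on a `CheckpointSet` belonging to the previous build, so I am assuming a plain wait/notify handshake: the later build sleeps until the earlier one either reports the named checkpoint or finishes. The class `CheckpointSketch`, its nested `CheckpointSet`, and the methods `report`, `finish`, `waitFor`, and `runDemo` are all names I made up for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model only: every name here is hypothetical, not the Jenkins API.
class CheckpointSketch {

    /** Stand-in for the per-build record of checkpoints reached so far. */
    static final class CheckpointSet {
        private final Set<String> reached = new HashSet<String>();
        private boolean finished;

        /** Called by the owning build when it passes a checkpoint. */
        synchronized void report(String name) {
            reached.add(name);
            notifyAll();
        }

        /** Called when the owning build ends, so that waiters are released. */
        synchronized void finish() {
            finished = true;
            notifyAll();
        }

        /** Called by a later build; blocks until the checkpoint is reported or the build ends. */
        synchronized void waitFor(String name) throws InterruptedException {
            while (!finished && !reached.contains(name)) {
                wait();
            }
        }
    }

    /** Runs a "previous" and a "next" build and returns the order of publisher steps. */
    static List<String> runDemo() throws InterruptedException {
        final List<String> events = Collections.synchronizedList(new ArrayList<String>());
        final CheckpointSet previous = new CheckpointSet();
        final String checkpoint = "ProxyPublisher";

        Thread next = new Thread(new Runnable() {
            @Override
            public void run() {
                try {
                    previous.waitFor(checkpoint); // analogous to the WAITING frame in the dump
                    events.add("next build runs publisher");
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        });
        next.start();

        events.add("previous build runs publisher");
        previous.report(checkpoint); // releases the waiting build
        previous.finish();
        next.join();
        return events;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(runDemo());
    }
}
```

If this model is roughly right, then #13982 stays parked for as long as #13981 neither reaches its own publisher nor ends, which would match what I'm seeing when the earlier run is still hours from finishing.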
As far as I'm concerned, these jobs are on different source branches and should be independent, not waiting for each other. Is there any way to convince Jenkins that these different runs of galleon_allIntegration are independent so that they don't block on each other? Is launching 60-70 simultaneous multi-hour runs of the same project not particularly Jenkins-friendly?

Thanks in advance,
--Rob Mandeville
Litle & Co (part of the Vantiv family)