I've been having problems where some of my builds appear to hang between 
finishing their build actions and running the publisher.  Server and client are 
running on Java 1.7 on Red Hat Linux.  During these hangs, the "hung" slave 
node isn't running anything and has no file descriptors open to jobs it had 
launched (there is literally nothing but the slave node process running under 
that Linux account).

Per the thread dump, I get:

Executor #0 for build10_armada_5 : executing galleon_allIntegration #13982 : 
waiting for Check point hudson.plugins.templateproject.ProxyPublisher on 
galleon_allIntegration #13981
"Executor #0 for build10_armada_5 : executing galleon_allIntegration #13982 : waiting for Check point hudson.plugins.templateproject.ProxyPublisher on galleon_allIntegration #13981" Id=376 Group=main WAITING on hudson.model.Run$RunExecution$CheckpointSet@22d65fc7
        at java.lang.Object.wait(Native Method)
        -  waiting on hudson.model.Run$RunExecution$CheckpointSet@22d65fc7
        at java.lang.Object.wait(Object.java:503)
        at hudson.model.Run$RunExecution$CheckpointSet.waitForCheckPoint(Run.java:1363)
        at hudson.model.Run.waitForCheckpoint(Run.java:1321)
        at hudson.model.CheckPoint.block(CheckPoint.java:144)
        at hudson.tasks.BuildStepMonitor$2.perform(BuildStepMonitor.java:25)
        at hudson.model.AbstractBuild$AbstractBuildExecution.perform(AbstractBuild.java:717)
        at hudson.model.AbstractBuild$AbstractBuildExecution.performAllBuildSteps(AbstractBuild.java:692)
        at hudson.model.Build$BuildExecution.post2(Build.java:183)
        at hudson.model.AbstractBuild$AbstractBuildExecution.post(AbstractBuild.java:639)
        at hudson.model.Run.execute(Run.java:1527)
        at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:46)
        at hudson.model.ResourceController.execute(ResourceController.java:88)
        at hudson.model.Executor.run(Executor.java:236)



Basically, this galleon_allIntegration job is a multi-hour integration test 
suite that gets run some 60-70 times a night on different branches of the same 
source tree.  When it hangs like this, it's specifically after running all 
the build steps and before running a publisher taken from the template project 
plugin (the plugin lets a job reuse a build action and a publisher from another 
project).  Per the CheckPoint class (which I don't understand due to my own 
ignorance), build #13982 is intentionally waiting for #13981 to reach some sort 
of checkpoint.  As far as I'm concerned, these runs are on different source 
branches and should be independent, not waiting for each other.
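
To make the wait in the trace above more concrete, here is a minimal 
sketch of a checkpoint handshake between two runs of the same job.  It is 
a toy model and not the Jenkins source: the class CheckpointSketch, its 
nested CheckpointSet, and the methods report, markAllDone, waitFor and 
runDemo are all hypothetical names chosen for illustration.  The assumption 
is only what the stack trace suggests, namely that the later run sits in 
Object.wait() until the earlier run reports the named checkpoint.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class CheckpointSketch {

    /** Hypothetical stand-in for a per-build checkpoint set; not Jenkins code. */
    static final class CheckpointSet {
        private final Set<String> reported = new HashSet<String>();
        private boolean allDone = false;

        /** The owning build announces that it has passed a checkpoint. */
        synchronized void report(String checkpoint) {
            reported.add(checkpoint);
            notifyAll();
        }

        /** The owning build has finished; release every waiter (unused in the demo). */
        synchronized void markAllDone() {
            allDone = true;
            notifyAll();
        }

        /** Called by a later build: block until the checkpoint is reported. */
        synchronized void waitFor(String checkpoint) throws InterruptedException {
            while (!allDone && !reported.contains(checkpoint)) {
                wait();
            }
        }
    }

    /** Runs one "earlier" and one "later" build and returns the ordered event log. */
    static List<String> runDemo() {
        final List<String> events =
                Collections.synchronizedList(new ArrayList<String>());
        final CheckpointSet earlierBuild = new CheckpointSet(); // plays #13981

        Thread laterBuild = new Thread(new Runnable() {       // plays #13982
            @Override
            public void run() {
                try {
                    events.add("later: build steps done");
                    earlierBuild.waitFor("ProxyPublisher");    // the observed hang
                    events.add("later: publisher runs");
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        });
        laterBuild.start();

        try {
            // Wait until the later build is parked in wait(), as in the thread dump.
            while (laterBuild.getState() != Thread.State.WAITING) {
                Thread.sleep(10);
            }
            events.add("earlier: publisher finished");
            earlierBuild.report("ProxyPublisher");
            laterBuild.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return events;
    }

    public static void main(String[] args) {
        for (String event : runDemo()) {
            System.out.println(event);
        }
    }
}
```

In this model the later run cannot start its publisher until the earlier 
run reports the same checkpoint, so a slow or stuck earlier run holds up 
every run queued behind it, regardless of which branch each one builds.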

Is there any way to convince Jenkins that these different runs of 
galleon_allIntegration are independent so that they don't block on each other? 
Is launching 60-70 simultaneous multi-hour runs of the same project not 
particularly Jenkins-friendly?

Thanks in advance,

--Rob Mandeville
Litle & Co (part of the Vantiv family)
