This is starting to become a blocker. It is happening regularly (2-3 times a week) and forcing a restart of Jenkins. I have updated to Jenkins version 1.554.3 with no improvement.
I think I also managed to catch the problem at a slightly earlier stage this time. A number of hours after a restart of Jenkins, a job got stuck on one of the build slaves and was aborted. When I logged into the machine, there was no sign of the Jenkins slave Java process running. However, in the Jenkins GUI the slave appeared to be running, or at least starting. About 10 minutes before the stuck job, another job ran and failed with the following messages in the build log:
____________________________________________________________________________
Started by upstream project "admin-validate-slave-configs" build number 456
originally caused by:
Started by timer
[EnvInject] - Loading node environment variables.
[EnvInject] - [ERROR] - SEVERE ERROR occurs: java.lang.InterruptedException
Deleting project workspace...
Collecting metadata...
Metadata collection done.
Finished: FAILURE
____________________________________________________________________________
The thread stack dumps that I believe are relevant:
____________________________________________________________________________
"Channel reader thread: ma016213" Id=23008 Group=main WAITING on com.trilead.ssh2.channel.Channel@1d6e8511
at java.lang.Object.wait(Native Method)
at java.lang.Object.wait(Object.java:503)
at com.trilead.ssh2.channel.FifoBuffer.read(FifoBuffer.java:212)
at com.trilead.ssh2.channel.Channel$Output.read(Channel.java:127)
at com.trilead.ssh2.channel.ChannelManager.getChannelData(ChannelManager.java:946)
at com.trilead.ssh2.channel.ChannelInputStream.read(ChannelInputStream.java:58)
at com.trilead.ssh2.channel.ChannelInputStream.read(ChannelInputStream.java:79)
at hudson.remoting.FlightRecorderInputStream.read(FlightRecorderInputStream.java:77)
at java.io.ObjectInputStream$PeekInputStream.peek(ObjectInputStream.java:2293)
at java.io.ObjectInputStream$BlockDataInputStream.peek(ObjectInputStream.java:2586)
at java.io.ObjectInputStream$BlockDataInputStream.peekByte(ObjectInputStream.java:2596)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1318)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
at java.io.ObjectInputStream.readArray(ObjectInputStream.java:1706)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1344)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
at java.io.ObjectInputStream.defaultReadObject(ObjectInputStream.java:500)
at java.lang.Throwable.readObject(Throwable.java:914)
at sun.reflect.GeneratedMethodAccessor201.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1893)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
at hudson.remoting.Command.readFrom(Command.java:92)
at hudson.remoting.ClassicCommandTransport.read(ClassicCommandTransport.java:71)
at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:48)
"Executor #0 for ma016213 : executing admin-validate-slave-configs » ma016213 #456 / waiting for hudson.remoting.Channel@21fb7f0a:ma016213" Id=254 Group=main TIMED_WAITING on hudson.remoting.UserRequest@e493bde
at java.lang.Object.wait(Native Method)
at hudson.remoting.Request.call(Request.java:146)
at hudson.remoting.Channel.call(Channel.java:722)
at hudson.FilePath.act(FilePath.java:1003)
at org.jenkinsci.plugins.envinject.service.EnvironmentVariablesNodeLoader.gatherEnvironmentVariablesNode(EnvironmentVariablesNodeLoader.java:44)
at org.jenkinsci.plugins.envinject.EnvInjectListener.loadEnvironmentVariablesNode(EnvInjectListener.java:81)
at org.jenkinsci.plugins.envinject.EnvInjectListener.setUpEnvironment(EnvInjectListener.java:39)
at hudson.model.AbstractBuild$AbstractBuildExecution.createLauncher(AbstractBuild.java:637)
at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:543)
at hudson.model.Run.execute(Run.java:1684)
at hudson.matrix.MatrixRun.run(MatrixRun.java:146)
at hudson.model.ResourceController.execute(ResourceController.java:88)
at hudson.model.Executor.run(Executor.java:231)
"pool-48-thread-1 / waiting for hudson.remoting.Channel@21fb7f0a:ma016213" Id=22727 Group=main TIMED_WAITING on hudson.remoting.UserRequest@14c4e0d3
at java.lang.Object.wait(Native Method)
at hudson.remoting.Request.call(Request.java:146)
at hudson.remoting.Channel.call(Channel.java:722)
at org.jenkinsci.modules.slave_installer.impl.ComputerListenerImpl.onOnline(ComputerListenerImpl.java:32)
at hudson.slaves.SlaveComputer.setChannel(SlaveComputer.java:503)
at hudson.slaves.SlaveComputer.setChannel(SlaveComputer.java:345)
at hudson.plugins.sshslaves.SSHLauncher.startSlave(SSHLauncher.java:901)
at hudson.plugins.sshslaves.SSHLauncher.access$400(SSHLauncher.java:126)
at hudson.plugins.sshslaves.SSHLauncher$2.call(SSHLauncher.java:658)
at hudson.plugins.sshslaves.SSHLauncher$2.call(SSHLauncher.java:642)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
Number of locked synchronizers = 1
____________________________________________________________________________
What is interesting is that the job indicated in the stack dump for "Executor #0" is the one that failed earlier, before the job that actually got stuck was run.
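In case it helps anyone else triage a full dump, here is a rough sketch of how the threads tied to one slave could be picked out. It assumes the header format seen in the dump above ("name" Id=N Group=G STATE ...); the function name threads_mentioning is just something made up for this example and is not part of Jenkins.

```python
import re

# Header lines in the dump look like: "<thread name>" Id=<n> Group=<g> <STATE> ...
# This pattern is an assumption based on the dump in this comment only.
HEADER = re.compile(r'^"(?P<name>.*)" Id=(?P<id>\d+) Group=\S+ (?P<state>[A-Z_]+)')


def threads_mentioning(dump_text, needle):
    """Return (id, state, name) for each thread whose name contains `needle`."""
    hits = []
    for line in dump_text.splitlines():
        match = HEADER.match(line.strip())
        # Stack frame lines ("at ...") do not match the header pattern and are skipped.
        if match and needle in match.group("name"):
            hits.append((int(match.group("id")),
                         match.group("state"),
                         match.group("name")))
    return hits
```

Calling threads_mentioning(dump, "ma016213") on the dump above should list the channel reader thread, the executor thread and the pool thread, since all three carry the slave name in their thread name.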
For your info: admin-validate-slave-configs is a matrix job that simply runs a Python script on each slave every morning to check that all paths, tools and environment variables are set up correctly.
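To give an idea of what such a check involves, below is a minimal sketch of that kind of script. It is not the actual script used by the job: the function name validate and the example variables, tools and paths passed to it are assumptions made purely for illustration.

```python
import os
import shutil


def validate(env_vars, tools, paths, environ=None):
    """Return a list of problems found; an empty list means the slave looks OK.

    env_vars, tools and paths are hypothetical requirement lists, not the
    real job's configuration.
    """
    environ = os.environ if environ is None else environ
    problems = []
    for name in env_vars:
        # Treat unset and empty variables the same way.
        if not environ.get(name):
            problems.append("missing environment variable: " + name)
    for tool in tools:
        # shutil.which searches the given PATH for an executable.
        if shutil.which(tool, path=environ.get("PATH")) is None:
            problems.append("tool not found on PATH: " + tool)
    for path in paths:
        if not os.path.exists(path):
            problems.append("missing path: " + path)
    return problems
```

A job step could call something like validate(["HOME"], ["java"], ["/Users/buildacc/jenkinsBuild"]) and fail the build when the returned list is not empty. Note that a script like this only runs after the remoting channel to the slave is working, so it would not be involved in the hang itself.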
The following is a dump of the slave's log:
____________________________________________________________________________
[07/30/14 06:11:30] [SSH] Opening SSH connection to ma016213:22.
[07/30/14 06:11:31] [SSH] Authentication successful.
[07/30/14 06:11:32] [SSH] The remote users environment is:
BASH=/bin/bash
BASH_ARGC=()
BASH_ARGV=()
BASH_EXECUTION_STRING=set
BASH_LINENO=()
BASH_SOURCE=()
BASH_VERSINFO=([0]="3" [1]="2" [2]="51" [3]="1" [4]="release" [5]="x86_64-apple-darwin13")
BASH_VERSION='3.2.51(1)-release'
DIRSTACK=()
EUID=502
GROUPS=()
HOME=/Users/buildacc
HOSTNAME=ma016213
HOSTTYPE=x86_64
IFS=$' \t\n'
LOGNAME=buildacc
MACHTYPE=x86_64-apple-darwin13
MAIL=/var/mail/buildacc
OPTERR=1
OPTIND=1
OSTYPE=darwin13
PATH=/usr/bin:/bin:/usr/sbin:/sbin
PPID=30224
PS4='+ '
PWD=/Users/buildacc
SHELL=/bin/bash
SHELLOPTS=braceexpand:hashall:interactive-comments
SHLVL=1
SSH_CLIENT='**.*.*.** 50759 22'
SSH_CONNECTION='**.*.*.** 50759 **.*.*.** 22'
TERM=dumb
TMPDIR=/var/folders/07/53xr552x5yq4dsbjnlkh0l4m0000gp/T/
UID=502
USER=buildacc
_=bash
[07/30/14 06:11:32] [SSH] Checking java version of java
[07/30/14 06:11:34] [SSH] java -version returned 1.7.0_55.
[07/30/14 06:11:34] [SSH] Starting sftp client.
[07/30/14 06:11:34] [SSH] Copying latest slave.jar...
[07/30/14 06:11:34] [SSH] Copied 364,754 bytes.
Expanded the channel window size to 4MB
[07/30/14 06:11:34] [SSH] Starting slave process: cd "/Users/buildacc/jenkinsBuild" && java -Djava.awt.headless=true -jar slave.jar
<===[JENKINS REMOTING CAPACITY]===>@@^@channel started
Slave.jar version: 2.36
This is a Unix slave
Evacuated stdout
____________________________________________________________________________