![]() |
|
|
Issue Type:
|
Bug
|
Assignee:
|
Kohsuke Kawaguchi
|
Components:
|
core |
Created:
|
14/Aug/14 10:04 AM
|
Description:
|
I have noticed that whenever I restart my Jenkins master my jnlp slaves are not reconnecting and require a manual slave restart to bring them back online.
I've traced this back to the changes to fix JENKINS-19055. Specifically those changes cause the slave JVM to be restarted when the master disconnects. Prior to this change the remoting engine would wait for the server to restart before attempting to reconnect to the master. With the change it immediately tries to connect to the master and get a connection error because the master is restarting. This causes the slave to immediately terminate.
Jenkins 1.575 gives the following slave log output when restarting the master
Notice the "jenkins.slaves.restarter.JnlpSlaveRestarterInstaller" onDisconnect log message that performs a slave restart.
Prior to JENKINS-19055 being integrated the slave called waitForServerToBack() repeatedly until the master came back online. For example
25-Mar-2014 10:52:16 hudson.remoting.jnlp.Main$CuiListener status
INFO: Terminated
25-Mar-2014 10:52:26 hudson.remoting.Engine waitForServerToBack
INFO: Failed to connect to the master. Will retry again
java.net.ConnectException: Connection refused
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:327)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:193)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:180)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:385)
at java.net.Socket.connect(Socket.java:546)
at sun.net.NetworkClient.doConnect(NetworkClient.java:173)
at sun.net.www.http.HttpClient.openServer(HttpClient.java:409)
at sun.net.www.http.HttpClient.openServer(HttpClient.java:530)
at sun.net.www.http.HttpClient.<init>(HttpClient.java:240)
at sun.net.www.http.HttpClient.New(HttpClient.java:321)
at sun.net.www.http.HttpClient.New(HttpClient.java:338)
at sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:935)
at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:876)
at sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:801)
at hudson.remoting.Engine.waitForServerToBack(Engine.java:371)
at hudson.remoting.Engine.run(Engine.java:278)
...
25-Mar-2014 10:54:11 hudson.remoting.Engine waitForServerToBack
INFO: Master isn't ready to talk to us. Will retry again: response code=503
25-Mar-2014 10:54:21 hudson.remoting.Engine waitForServerToBack
INFO: Master isn't ready to talk to us. Will retry again: response code=503
25-Mar-2014 10:54:31 hudson.remoting.jnlp.Main$CuiListener status
INFO: Locating server among http://jenkins.example/
25-Mar-2014 10:54:31 hudson.remoting.jnlp.Main$CuiListener status
INFO: Connecting to jenkins.example:42715
25-Mar-2014 10:54:31 hudson.remoting.jnlp.Main$CuiListener status
INFO: Handshaking
25-Mar-2014 10:54:32 hudson.remoting.jnlp.Main$CuiListener status
INFO: Connected
The connection/retry logic is contained in remoting Engine.java
https://github.com/jenkinsci/remoting/blob/master/src/main/java/hudson/remoting/Engine.java
When connecting to the slave an error causes the connection to terminate (around line 232)
if(firstError!=null) {
events.error(firstError);
return;
}
prior to JENKINS-19055 hooking into onDisconnect() a re-connection would not be attempted until waitForServerToBack() had ensured that the master had recovered.
events.onDisconnect();
waitForServerToBack();
A quick and dirty fix would likely be to swap the onDisconnect and waitForServerToBack calls around.
|
Environment:
|
jenkins 1.559 and above
|
Project:
|
Jenkins
|
Priority:
|
Critical
|
Reporter:
|
Richard Mortimer
|
|
|
|
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira
|
--
You received this message because you are subscribed to the Google Groups "Jenkins Issues" group.
To unsubscribe from this group and stop receiving emails from it, send an email to
jenkinsci-issues+unsubscr...@googlegroups.com.
For more options, visit
https://groups.google.com/d/optout.