Change By: Daniel Beck (14/Aug/14 6:00 PM)
Description: I have noticed that whenever I restart my Jenkins master, my JNLP slaves do not reconnect and require a manual slave restart to bring them back online.

I've traced this back to the changes made to fix JENKINS-19055. Specifically, those changes cause the slave JVM to be restarted when the master disconnects. Prior to this change, the remoting engine would wait for the server to restart before attempting to reconnect to the master. With the change, it immediately tries to connect to the master and gets a connection error because the master is restarting. This causes the slave to terminate immediately.

Jenkins 1.575 gives the following slave log output when restarting the master:

{noformat}
Aug 12, 2014 3:55:15 PM hudson.remoting.jnlp.Main$CuiListener status
INFO: Terminated
Aug 12, 2014 3:55:15 PM jenkins.slaves.restarter.JnlpSlaveRestarterInstaller$2$1 onDisconnect
INFO: Restarting slave via jenkins.slaves.restarter.UnixSlaveRestarter@32a9f661
Aug 12, 2014 3:55:17 PM hudson.remoting.jnlp.Main createEngine
INFO: Setting up slave: bishop
Aug 12, 2014 3:55:17 PM hudson.remoting.jnlp.Main$CuiListener <init>
INFO: Jenkins agent is running in headless mode.
Aug 12, 2014 3:55:17 PM hudson.remoting.jnlp.Main$CuiListener status
INFO: Locating server among [http://jenkins.example/]
Aug 12, 2014 3:55:18 PM hudson.remoting.jnlp.Main$CuiListener error
SEVERE: http://jenkins.example/tcpSlaveAgentListener/ is invalid: 503 Service Temporarily Unavailable
java.lang.Exception: http://jenkins.example/tcpSlaveAgentListener/ is invalid: 503 Service Temporarily Unavailable
        at hudson.remoting.Engine.run(Engine.java:213)
{noformat}

Note the "jenkins.slaves.restarter.JnlpSlaveRestarterInstaller" onDisconnect log message, which is where the slave restart is performed.

Before JENKINS-19055 was integrated, the slave retried in waitForServerToBack() until the master came back online. For example:

{noformat}
25-Mar-2014 10:52:16 hudson.remoting.jnlp.Main$CuiListener status
INFO: Terminated
25-Mar-2014 10:52:26 hudson.remoting.Engine waitForServerToBack
INFO: Failed to connect to the master. Will retry again
java.net.ConnectException: Connection refused
        at java.net.PlainSocketImpl.socketConnect(Native Method)
        at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:327)
        at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:193)
        at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:180)
        at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:385)
        at java.net.Socket.connect(Socket.java:546)
        at sun.net.NetworkClient.doConnect(NetworkClient.java:173)
        at sun.net.www.http.HttpClient.openServer(HttpClient.java:409)
        at sun.net.www.http.HttpClient.openServer(HttpClient.java:530)
        at sun.net.www.http.HttpClient.<init>(HttpClient.java:240)
        at sun.net.www.http.HttpClient.New(HttpClient.java:321)
        at sun.net.www.http.HttpClient.New(HttpClient.java:338)
        at sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:935)
        at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:876)
        at sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:801)
        at hudson.remoting.Engine.waitForServerToBack(Engine.java:371)
        at hudson.remoting.Engine.run(Engine.java:278)
...
25-Mar-2014 10:54:11 hudson.remoting.Engine waitForServerToBack
INFO: Master isn't ready to talk to us. Will retry again: response code=503
25-Mar-2014 10:54:21 hudson.remoting.Engine waitForServerToBack
INFO: Master isn't ready to talk to us. Will retry again: response code=503
25-Mar-2014 10:54:31 hudson.remoting.jnlp.Main$CuiListener status
INFO: Locating server among [http://jenkins.example/]
25-Mar-2014 10:54:31 hudson.remoting.jnlp.Main$CuiListener status
INFO: Connecting to jenkins.example:42715
25-Mar-2014 10:54:31 hudson.remoting.jnlp.Main$CuiListener status
INFO: Handshaking
25-Mar-2014 10:54:32 hudson.remoting.jnlp.Main$CuiListener status
INFO: Connected
{noformat}
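
The retry behaviour visible in the log above can be illustrated with a minimal sketch. This is not the remoting source: the class name ServerWaitSketch, the method waitForServer, and the injected statusProbe (standing in for an HTTP request to the master) are all hypothetical, and the log messages only mirror the ones quoted above.

```java
import java.util.function.IntSupplier;

/** Hypothetical sketch of a wait-until-ready loop; not the actual Engine code. */
class ServerWaitSketch {

    /**
     * Polls until the probe reports HTTP 200, then returns the number of attempts.
     * statusProbe is an assumed stand-in for a request to the master; a probe that
     * throws RuntimeException is treated like a refused connection.
     */
    static int waitForServer(IntSupplier statusProbe, long delayMillis) {
        int attempts = 0;
        while (true) {
            try {
                // Pause before every attempt, as the 10 second gaps in the log suggest.
                Thread.sleep(delayMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("Interrupted while waiting for the master", e);
            }
            attempts++;
            try {
                int code = statusProbe.getAsInt();
                if (code == 200) {
                    return attempts; // master is ready, the caller may reconnect now
                }
                System.out.println("Master isn't ready to talk to us. Will retry again: response code=" + code);
            } catch (RuntimeException e) {
                System.out.println("Failed to connect to the master. Will retry again: " + e.getMessage());
            }
        }
    }
}
```

The point of the sketch is that neither a refused connection nor a 503 ends the loop; only a successful response lets the slave go on to reconnect.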

The connection/retry logic is contained in the remoting Engine.java:
https://github.com/jenkinsci/remoting/blob/master/src/main/java/hudson/remoting/Engine.java

When connecting to the master, an error causes the connection attempt to terminate (around line 232):

{code}
if(firstError!=null) {
  events.error(firstError);
  return;
}
{code}

Prior to JENKINS-19055 hooking into onDisconnect(), a reconnection would not be attempted until waitForServerToBack() had ensured that the master had recovered:

{code}
events.onDisconnect();
// try to connect back to the server every 10 secs.
waitForServerToBack();
{code}

A quick and dirty fix would likely be to swap the onDisconnect and waitForServerToBack calls around.
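
As a rough illustration of that reordering, here is a self-contained sketch. It is only an assumption about how the swap might look, not a patch against Engine.java: ReconnectOrderSketch, DisconnectListener, handleDisconnect, and the trace list are hypothetical names, and waitForServerToBack() here merely records that it ran instead of polling the master.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the proposed call order; not the actual Engine code. */
class ReconnectOrderSketch {

    /** Assumed stand-in for the listener that JENKINS-19055 uses to restart the slave. */
    interface DisconnectListener {
        void onDisconnect();
    }

    /** Records the order in which the two steps run. */
    final List<String> trace = new ArrayList<>();

    /** Stand-in for the real method: the real one would block until the master responds. */
    void waitForServerToBack() {
        trace.add("waitForServerToBack");
    }

    /**
     * Proposed ordering: wait for the master first, then notify listeners.
     * A listener that restarts the slave JVM would then only do so once
     * the master is reachable again.
     */
    void handleDisconnect(DisconnectListener listener) {
        waitForServerToBack();
        listener.onDisconnect();
    }
}
```

With this ordering, a restarted slave would not hit the 503 response shown in the first log, since the restart is deferred until the wait has finished. Whether listeners rely on being called immediately at disconnect time is not covered by this sketch.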