Re: remoting issue

2019-03-13 Thread kuisathaverat
Did you check whether a remoting.jar process is still running on the agents after the disconnection? You can try fixing the memory of the remoting process by setting the JVM options `-Xms512m -Xmx512m` in the node configuration; I am not sure whether it is crashing or not. On Tue, Mar 12, 2019 at 14
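The check suggested above can be scripted on an agent. What follows is only a minimal sketch, assuming a Linux agent where `ps -eo pid,args` is available and assuming the agent process has `remoting.jar` somewhere in its command line; the function names are hypothetical and not part of Jenkins.

```python
import subprocess


def find_remoting_processes(ps_output: str) -> list[str]:
    """Return the process lines whose command line mentions remoting.jar."""
    return [
        line.strip()
        for line in ps_output.splitlines()
        if "remoting.jar" in line
    ]


def list_processes() -> str:
    """Capture PID and full command line of every process via `ps`."""
    result = subprocess.run(
        ["ps", "-eo", "pid,args"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Calling `find_remoting_processes(list_processes())` right after a disconnection returns an empty list if the agent JVM is gone, which would point towards a crash rather than a dropped connection.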

Re: remoting issue

2019-03-12 Thread Glenn Burkhardt
We have another instance of WARNING: Failed to ack the stream java.io.IOException: Broken pipe and scatter/gather is disabled: /etc/network/if-up.d$ ethtool -k ens3 Features for ens3: Cannot get device udp-fragmentation-offload settings: Operation not permitted rx-checksumming: on [fixed] tx-ch
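To confirm on every agent that scatter/gather really is off, the `ethtool -k` output can be parsed instead of read by eye. This is a rough sketch, assuming the usual `name: on|off [fixed]` line layout shown in the output above and assuming the feature is reported under the name `scatter-gather`; both helper names are hypothetical.

```python
def parse_ethtool_features(output: str) -> dict[str, bool]:
    """Map each feature name in `ethtool -k` output to True (on) or False (off)."""
    features = {}
    for line in output.splitlines():
        name, sep, value = line.partition(":")
        if not sep:
            continue  # no "name: value" pair on this line
        tokens = value.split()
        # Skip the "Features for ...:" header and error lines such as
        # "Cannot get device ... settings: Operation not permitted".
        if tokens and tokens[0] in ("on", "off"):
            features[name.strip()] = tokens[0] == "on"
    return features


def scatter_gather_disabled(features: dict[str, bool]) -> bool:
    """True only if scatter-gather is present and reported as off."""
    return features.get("scatter-gather") is False
```

The text passed to `parse_ethtool_features` would be the captured output of `ethtool -k ens3` on the agent.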

Re: remoting issue

2019-02-18 Thread Iván Fernández Calvo
I don't know if this is your case, but things like vMotion moving the VM to a less loaded rack, energy saving, any kind of power on network, and in general everything that suspends the VM will break the connection. Jenkins checks on agents every 4 min; if the agent does not respond in time, Jenkins brea

Re: remoting issue

2019-02-18 Thread Glenn Burkhardt
I have to agree that the Java version is a red herring. Our VMs have been more stable in the last few days. The changes that I think might have had an effect are: a) Increasing the connection timeout to 6 b) Turning off scatter/gather But I also upgraded them all with the most recent Ubun

Re: remoting issue

2019-02-11 Thread kuisathaverat
I think you can rule out JVM version issues and connectivity issues, because the agent connects and copies the remoting.jar. How much time passes between the agent connecting and the channel breaking? Check that you do not have any configuration that kills idle connections; also try to assign memory to the r

Re: remoting issue

2019-02-11 Thread niristotle okram
Okay going by the logs now, here are some of the possible things that you can look into: 1. jdk versions consistent across the master and slaves 2. https://issues.jenkins-ci.org/plugins/servlet/mobile#issue/JENKINS-30561 3. https://devops.stackexchange.com/questions/1053/jenkins-cannot-reach-nodes

Re: remoting issue

2019-02-11 Thread Glenn Burkhardt
I've attached the two logs. I did make a change after seeing 'Failed to ack the stream' to turn off scatter/gather. But I think that's a bit of a "hail mary". On Monday, February 11, 2019 at 10:16:05 AM UTC-5, Ivan Fernandez Calvo wrote: > > It could be tons of things, Probably, if you go to

Re: remoting issue

2019-02-11 Thread niristotle okram
Libvirt 1.8.6 >> Slave guest: Ubuntu 18.04.1 >> >> Jenkins and the VMs are all running on the same machine, so network >> activity shouldn't be an issue. >> >> I've been looking at the wiki note here: >> https://wiki.jenkins.io/display/JENKI

Re: remoting issue

2019-02-11 Thread Ivan Fernandez Calvo
nd the VMs are all running on the same machine, so network > activity shouldn't be an issue. > > I've been looking at the wiki note here: > https://wiki.jenkins.io/display/JENKINS/Remoting+issue > > and the anomaly I've noticed is repeated in the slave.log file crea

Re: remoting issue

2019-02-11 Thread Glenn Burkhardt
I see now that the value for "kexTimeout" should be the 210 value you reference. BTW, we are using only 5 agents. I found reports of a similar problem with connection timeouts here: https://github.com/jenkinsci/ec2-fleet-plugin/issues/41 and as an experiment, followed the recommendation of

Re: remoting issue

2019-02-06 Thread Iván Fernández Calvo
This timeout applies only to the connection stage, and it covers all the reconnection retries (long story). The default value is 210 seconds; anything less than 30-60 seconds is not a good value, and then only if you have retries set to 0. I do not know how many agents you spin up at the same time; I would try to

Re: remoting issue

2019-02-06 Thread Glenn Burkhardt
My reading of the code indicates that the timeout value is set by "kexTimeout" in com\trilead\ssh2\Connection.java at line 693. That appears to be set in SSHLauncher.openConnection():1184. The value we're using for 'launchTimeoutMillis' should be 15000, assuming that it comes from "Startup Id

Re: remoting issue

2019-02-06 Thread Ivan Fernandez Calvo
> Jenkins and the VMs are all running on the same machine, so network activity shouldn't be an issue. Network is not an issue, but could response performance be? It is not a good idea to run the Jenkins master and agents on the same machine; if you use Docker containers and you do not limit t

remoting issue

2019-02-06 Thread Glenn Burkhardt
twork activity shouldn't be an issue. I've been looking at the wiki note here: https://wiki.jenkins.io/display/JENKINS/Remoting+issue and the anomaly I've noticed is repeated in the slave.log file created by Jenkins (SocketTimeoutException): Feb

Re: Remoting issue

2019-02-06 Thread Glenn Burkhardt
...forgot to mention that Jenkins and the VMs are all running on the same machine, so network activity shouldn't be an issue.

Remoting issue

2019-02-06 Thread Glenn Burkhardt
ote here: https://wiki.jenkins.io/display/JENKINS/Remoting+issue and the anomaly I've noticed is repeated in the slave.log file created by Jenkins (SocketTimeoutException): Feb 06, 2019 8:37:58 AM org.jenkinsci.remoting.engine.WorkDirManager initializeWorkDir INFO: Using /home/jenkins/remoti