I ran jstack on Jenkins, and many of the threads had state BLOCKED. However, even right after a restart most of the threads are BLOCKED, so I am not sure whether that is the issue here.
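To compare a dump taken right after a restart with one taken while the agents are disconnected, it may help to tally the thread states rather than eyeball them. Below is a minimal sketch in Python, assuming the dump uses the usual HotSpot `java.lang.Thread.State: ...` lines. The function name `count_thread_states` and the sample dump are made up for illustration; they are not taken from my actual Jenkins output.

```python
import re
from collections import Counter

# Matches the state line that jstack prints under each thread header,
# e.g. "   java.lang.Thread.State: BLOCKED (on object monitor)".
STATE_RE = re.compile(r"^\s*java\.lang\.Thread\.State:\s+(\w+)")


def count_thread_states(dump_text: str) -> Counter:
    """Tally the java.lang.Thread.State lines found in a jstack dump."""
    counts = Counter()
    for line in dump_text.splitlines():
        match = STATE_RE.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Hypothetical, shortened dump used only to demonstrate the function.
SAMPLE_DUMP = """
"Timer-1" #42 daemon prio=5 waiting for monitor entry
   java.lang.Thread.State: BLOCKED (on object monitor)
        at example.Foo.bar(Foo.java:10)

"main" #1 prio=5 runnable
   java.lang.Thread.State: RUNNABLE

"pool-1-thread-1" #7 prio=5 waiting on condition
   java.lang.Thread.State: TIMED_WAITING (sleeping)

"Timer-2" #43 daemon prio=5 waiting for monitor entry
   java.lang.Thread.State: BLOCKED (on object monitor)
"""

if __name__ == "__main__":
    for state, n in count_thread_states(SAMPLE_DUMP).most_common():
        print(f"{state}: {n}")
```

In practice the dump would come from a file, for example by redirecting the output of `jstack <pid>` to `dump.txt` and passing the file contents to `count_thread_states`. Comparing the tallies of the two dumps should show whether the share of BLOCKED threads really changes when the problem occurs.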
After a restart Jenkins starts with approximately 200 threads open. When I had the problem with disconnected agents, the thread count reached 500.

On Wednesday, 17 July 2019 at 12:40:14 UTC+2, Sverre Moe wrote:
> It seems to be the monitoring that gets the agents disconnected.
> I got this in my log file the last time they got disconnected.
>
> Jul 17, 2019 11:58:22 AM hudson.init.impl.InstallUncaughtExceptionHandler$DefaultUncaughtExceptionHandler uncaughtException
> SEVERE: A thread (Timer-3450/103166) died unexpectedly due to an uncaught exception, this may leave your Jenkins in a bad way and is usually indicative of a bug in the code.
> java.lang.OutOfMemoryError: unable to create new native thread
>     at java.lang.Thread.start0(Native Method)
>     at java.lang.Thread.start(Thread.java:717)
>     at java.util.Timer.<init>(Timer.java:160)
>     at java.util.Timer.<init>(Timer.java:132)
>     at org.jenkinsci.plugins.ssegateway.sse.EventDispatcher.scheduleRetryQueueProcessing(EventDispatcher.java:296)
>     at org.jenkinsci.plugins.ssegateway.sse.EventDispatcher.processRetries(EventDispatcher.java:437)
>     at org.jenkinsci.plugins.ssegateway.sse.EventDispatcher$1.run(EventDispatcher.java:299)
>     at java.util.TimerThread.mainLoop(Timer.java:555)
>     at java.util.TimerThread.run(Timer.java:505)
>
> Jul 17, 2019 11:58:31 AM hudson.init.impl.InstallUncaughtExceptionHandler$DefaultUncaughtExceptionHandler uncaughtException
> SEVERE: A thread (Thread-30062/98187) died unexpectedly due to an uncaught exception, this may leave your Jenkins in a bad way and is usually indicative of a bug in the code.
> java.lang.OutOfMemoryError: unable to create new native thread
>     at java.lang.Thread.start0(Native Method)
>     at java.lang.Thread.start(Thread.java:717)
>     at com.trilead.ssh2.transport.TransportManager.sendAsynchronousMessage(TransportManager.java:649)
>     at com.trilead.ssh2.channel.ChannelManager.msgChannelRequest(ChannelManager.java:1213)
>     at com.trilead.ssh2.channel.ChannelManager.handleMessage(ChannelManager.java:1466)
>     at com.trilead.ssh2.transport.TransportManager.receiveLoop(TransportManager.java:809)
>     at com.trilead.ssh2.transport.TransportManager$1.run(TransportManager.java:502)
>     at java.lang.Thread.run(Thread.java:748)
>
> Now I have a catastrophic failure: I cannot relaunch any agents any more.
>
> [07/17/19 12:04:10] [SSH] Opening SSH connection to jbssles120x64r12.spacetec.no:22.
> ERROR: Unexpected error in launching a agent. This is probably a bug in Jenkins.
> java.lang.OutOfMemoryError: unable to create new native thread
>     at java.lang.Thread.start0(Native Method)
>     at java.lang.Thread.start(Thread.java:717)
>     at com.trilead.ssh2.transport.TransportManager.initialize(TransportManager.java:545)
>     at com.trilead.ssh2.Connection.connect(Connection.java:774)
>     at hudson.plugins.sshslaves.SSHLauncher.openConnection(SSHLauncher.java:817)
>     at hudson.plugins.sshslaves.SSHLauncher$1.call(SSHLauncher.java:419)
>     at hudson.plugins.sshslaves.SSHLauncher$1.call(SSHLauncher.java:406)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748)
> [07/17/19 12:04:10] Launch failed - cleaning up connection
> [07/17/19 12:04:10] [SSH] Connection closed.
>
> My Jenkins server has over 500 threads open:
> Threads: 506 total, 0 running, 506 sleeping, 0 stopped, 0 zombie
>
> On Wednesday, 17 July 2019 at 10:24:12 UTC+2, Sverre Moe wrote:
>> We have had two blissful days of stable Jenkins. Today two nodes are
>> disconnected and they will not come back online.
>>
>> What is strange is that it is the same two or three nodes every time.
>> Running disconnect on them through the URL
>> http://jenkins.example.com/jenkins/computer/NODE_NAME/disconnect does
>> not work.
>> I have to open the configuration, click Save, then relaunch to get them
>> up and running.
>>
>> I tried setting the ulimit values as suggested in
>> https://support.cloudbees.com/hc/en-us/articles/222446987-Prepare-Jenkins-for-Support#bulimitsettingsjustforlinuxos
>>
>> I have also added additional JVM options as suggested in
>> https://support.cloudbees.com/hc/en-us/articles/222446987-Prepare-Jenkins-for-Support#ajavaparameters
>> https://go.cloudbees.com/docs/solutions/jvm-troubleshooting/
>>
>> The number of threads on the Jenkins server is currently 265. Yesterday,
>> when all was fine, it was up to 300.
>>
>> Maybe related or unrelated:
>> When this happens, some builds on other nodes stop working. They are
>> aborted, but still show as running. The only thing that works is
>> deleting the agent and creating it again, or restarting Jenkins.
>>
>> On Sunday, 14 July 2019 at 13:31:51 UTC+2, Sverre Moe wrote:
>>> I suspected it might be related, but was not sure.
>>>
>>> The odd thing is that this only started being a problem a week ago.
>>> Nothing, as far as I can see, has changed on the Jenkins server.
>>>
>>> On Saturday, 13 July 2019 at 13:04:44 UTC+2, Ivan Fernandez Calvo wrote:
>>>> I saw that you have another question related to OOM errors in Jenkins.
>>>> If it is the same instance, this is your real issue with the agents;
>>>> until you have a stable Jenkins instance, the agent disconnections
>>>> will be a side effect.