>> From your reply, I am even more concerned with the disproportionately high
>> number of blocked threads (120) compared to offline slaves (2 at the
>> time), as it sounds like it should be closer to 1:1?
>
> Yes, it sounds like there is a race condition between the post-disconnect
> tasks and the reconnect tasks:
> https://github.com/jenkinsci/ssh-slaves-plugin/blob/ssh-slaves-1.6/src/main/java/hudson/plugins/sshslaves/SSHLauncher.java#L1152
> is blocking until the slave is connected... but the slave cannot connect
> until the disconnect tasks are complete...
>
>> Do you have 'dead' slaves, and what's your logging configuration like?
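For what it's worth, the general shape of that race is easy to reproduce with
a toy example. This is nothing like the plugin's real code (every name below
is made up for illustration); it just shows the pattern: a reconnect-style
task occupies a scarce worker thread while it waits for state that only a
queued disconnect-cleanup task can produce, so neither makes progress.

    import java.util.concurrent.CountDownLatch;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.TimeUnit;

    public class ReconnectRace {
        public static void main(String[] args) throws Exception {
            // Stand-in for a heavily used pool like Computer.threadPoolForRemoting
            // at a moment when only one worker thread is free.
            ExecutorService pool = Executors.newFixedThreadPool(1);
            CountDownLatch disconnectDone = new CountDownLatch(1);

            // "Reconnect" task: grabs the only free worker, then parks until the
            // disconnect cleanup has finished (cf. blocking until the slave is
            // connected).
            pool.submit(() -> {
                try {
                    if (!disconnectDone.await(5, TimeUnit.SECONDS)) {
                        System.out.println("reconnect timed out: cleanup never ran");
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });

            // "Disconnect cleanup" task: queued behind the reconnect task, so it
            // cannot run until the reconnect task gives up its thread.
            pool.submit(disconnectDone::countDown);

            pool.shutdown();
            pool.awaitTermination(10, TimeUnit.SECONDS);
        }
    }

With one free thread the reconnect side always times out; give the pool a
second thread and the latch is released almost immediately, which is
presumably why this only bites when many slaves disconnect at once.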
I'm tracking down a similar problem, in that our Jenkins instance (which
isn't that large) slows to the point of the UI timing out.

Taking occasional stack dumps (this is an early guess, could be very wrong)
shows, basically, the UI waiting to get access to
java.util.logging.ConsoleHandler, e.g.:

    - waiting to lock <0x00000000804285c0> (a java.util.logging.ConsoleHandler)
    at java.util.logging.ConsoleHandler.publish(ConsoleHandler.java:105)
    at java.util.logging.Logger.log(Logger.java:565)
    at java.util.logging.Logger.doLog(Logger.java:586)
    at java.util.logging.Logger.logp(Logger.java:702)
    at org.apache.commons.logging.impl.Jdk14Logger.log(Jdk14Logger.java:87)
    at org.apache.commons.logging.impl.Jdk14Logger.trace(Jdk14Logger.java:239)
    at org.apache.commons.beanutils.BeanUtilsBean.copyProperty(BeanUtilsBean.java:372)
    ... etc. etc. down to the caller

Now, the interesting thing is that the trace seems to be going through Apache
Commons Logging and then JUL, yet I get nothing on the console. So either it's
throwing an exception because of a misconfiguration, or it's only checking
whether we actually wanted this output after acquiring the lock. Either way,
unsurprisingly, I don't care about trace logs from Apache BeanUtils! ;-) I
suspect someone may have adjusted our logging while trying to track something
down.

The second interesting thing is that, a lot of the time, the console lock is
held by Computer.threadPoolForRemoting, e.g.:

    ... etc. etc.
    at java.util.logging.StreamHandler.publish(StreamHandler.java:196)
    - locked <0x00000000804285c0> (a java.util.logging.ConsoleHandler)
    at java.util.logging.ConsoleHandler.publish(ConsoleHandler.java:105)
    at java.util.logging.Logger.log(Logger.java:565)
    at java.util.logging.Logger.doLog(Logger.java:586)
    at java.util.logging.Logger.log(Logger.java:675)
    at hudson.remoting.ProxyOutputStream$Chunk$1.run(ProxyOutputStream.java:285)
    at hudson.remoting.PipeWriter$1.run(PipeWriter.java:158)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
    at java.util.concurrent.FutureTask.run(FutureTask.java:166)
    at hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:111)
    at hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:72)
    at jenkins.util.ContextResettingExecutorService$2.call(ContextResettingExecutorService.java:46)

Again, it's one of those pesky warnings that never actually ends up on the
console, but what it's doing is:

    LOGGER.log(Level.WARNING, "Failed to ack the stream", e);

It seems to be running that a lot (which I suspected might be for non-working
slaves). I think it attempts to generate a stack trace, which is expensive
(and helpfully JUL does all of that whilst holding onto the console lock...
>:-S ), which may be why the responsiveness gets crushed.

Anyway, HTMH and back to digging...
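PS. A follow-up thought on the first trace: if the noise really is
commons-beanutils trace output (Jdk14Logger.trace() maps to JUL's FINEST), it
can be cut off before it ever reaches the synchronized ConsoleHandler by
putting that logger's level back to something sane. Logger.log() discards a
record that the logger's effective level filters out, so no handler lock is
ever taken for it. A minimal sketch, assuming the offending logger really is
org.apache.commons.beanutils:

    import java.util.logging.Level;
    import java.util.logging.Logger;

    public class QuietBeanUtils {
        public static void main(String[] args) {
            // Keep a strong reference somewhere long-lived: the LogManager only
            // holds loggers weakly, so a level set on an otherwise unreferenced
            // Logger can be lost once it is garbage-collected.
            Logger beanUtils = Logger.getLogger("org.apache.commons.beanutils");

            // FINEST/FINER records are now dropped inside Logger.log(), long
            // before ConsoleHandler.publish() and its lock are involved.
            beanUtils.setLevel(Level.INFO);
        }
    }

The same thing can be done declaratively with
org.apache.commons.beanutils.level = INFO in the JVM's logging.properties, or
interactively from the Jenkins script console, since it is all plain JUL.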
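PPS. On the second trace: the painful part is that StreamHandler.publish() is
synchronized and runs the Formatter (stack-trace rendering included) while it
holds that monitor, so every other thread logging through the same
ConsoleHandler queues up behind it. One possible mitigation, sketched here on
the assumption that losing routine WARNINGs from the console is acceptable, is
to front the ConsoleHandler with a java.util.logging.MemoryHandler: records
below the push level are only buffered, which is cheap, and formatting happens
only if something SEVERE later pushes the buffer.

    import java.util.logging.ConsoleHandler;
    import java.util.logging.Handler;
    import java.util.logging.Level;
    import java.util.logging.Logger;
    import java.util.logging.MemoryHandler;

    public class BufferedConsoleLogging {
        public static void main(String[] args) {
            Logger root = Logger.getLogger("");

            // Detach the existing console handler(s) so records are not written twice.
            for (Handler h : root.getHandlers()) {
                if (h instanceof ConsoleHandler) {
                    root.removeHandler(h);
                }
            }

            // Buffer up to 1000 records; format and emit them only when a SEVERE
            // record arrives (or MemoryHandler.push() is called explicitly).
            MemoryHandler buffered =
                    new MemoryHandler(new ConsoleHandler(), 1000, Level.SEVERE);
            root.addHandler(buffered);

            Logger demo = Logger.getLogger(BufferedConsoleLogging.class.getName());
            // Buffered only: not formatted, and the ConsoleHandler lock is not touched.
            demo.log(Level.WARNING, "Failed to ack the stream", new Exception("demo"));
            // SEVERE triggers the push, and both records appear on the console.
            demo.severe("something actually broke");
        }
    }

Whether hiding routine WARNINGs from the console is acceptable is a judgement
call; anything else attached to the loggers (a Jenkins log recorder, say)
still sees the records as usual.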