>
>
>>
>> From your reply, I am even more concerned by the disproportionately high
>> number of blocked threads (120) compared to offline slaves (2 at the
>> time), as it sounds like it should be closer to 1:1?
>>
>
> Yes, it sounds like there is a race condition between the post-disconnect
> tasks and the reconnect tasks:
> https://github.com/jenkinsci/ssh-slaves-plugin/blob/ssh-slaves-1.6/src/main/java/hudson/plugins/sshslaves/SSHLauncher.java#L1152
> is blocking until the slave is connected... but the slave cannot connect
> until the disconnect tasks are complete...
>
>
>>
>>
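
If that's what's happening, it's the classic circular wait. A hypothetical
sketch of that shape (not the plugin's actual code; the single-thread
executor here just stands in for whatever resource both tasks end up
competing for):

import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical illustration: the "reconnect" task blocks waiting for the
// post-disconnect cleanup, but that cleanup is queued behind it on the
// same single-lane executor, so neither ever finishes. Running this hangs
// forever, which is the point.
public class ReconnectDeadlockSketch {
    public static void main(String[] args) {
        ExecutorService lane = Executors.newSingleThreadExecutor();
        CountDownLatch cleanedUp = new CountDownLatch(1);

        lane.submit(() -> {
            try {
                cleanedUp.await();         // "reconnect": blocks the only worker
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        lane.submit(cleanedUp::countDown); // "cleanup": never gets a thread

        lane.shutdown();
    }
}
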
Do you have 'dead' slaves, and what's your logging configuration like?

I'm tracking down a similar problem, in that our Jenkins instance (which
isn't that large) slows down to the point where the UI times out.

Taking occasional stack-dumps (this is an early guess, could be very wrong)
shows, basically, the UI waiting to get access to
java.util.logging.ConsoleHandler.

e.g.:

- waiting to lock <0x00000000804285c0> (a java.util.logging.ConsoleHandler)
        at java.util.logging.ConsoleHandler.publish(ConsoleHandler.java:105)
        at java.util.logging.Logger.log(Logger.java:565)
        at java.util.logging.Logger.doLog(Logger.java:586)
        at java.util.logging.Logger.logp(Logger.java:702)
        at org.apache.commons.logging.impl.Jdk14Logger.log(Jdk14Logger.java:87)
        at org.apache.commons.logging.impl.Jdk14Logger.trace(Jdk14Logger.java:239)
        at org.apache.commons.beanutils.BeanUtilsBean.copyProperty(BeanUtilsBean.java:372)
        ... etc etc down to the caller



Now - the interesting thing is that the trace seems to be going through
Apache commons-logging, then JUL. But I get nothing on the console, so either
it's throwing an exception because of a misconfiguration, or it's only
checking whether we actually wanted this output after it has acquired the lock.

Either way, unsurprisingly I don't care about trace logs from apache
beanutils! ;-) I suspect someone may have adjusted our logging trying to
track something down.
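
If nobody actually wants those records, one cheap thing to try (a sketch -
the logger name is just commons-logging's convention of naming its JUL
loggers after the class, inferred from the trace above) is to raise the
level on that logger so FINEST/TRACE records get rejected before any
handler, and therefore the ConsoleHandler lock, is ever involved:

import java.util.logging.Level;
import java.util.logging.Logger;

public class QuietBeanUtilsLogging {
    // Keep a strong reference: JUL only holds loggers weakly, so a level
    // set on a logger that gets garbage-collected silently disappears.
    private static final Logger BEANUTILS =
            Logger.getLogger("org.apache.commons.beanutils");

    public static void apply() {
        // Anything below INFO is now rejected inside Logger.log(), before
        // the record is ever handed to the handlers.
        BEANUTILS.setLevel(Level.INFO);
    }
}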

The second interesting thing is that, a lot of the time, I see the console
lock being held by a Computer.threadPoolForRemoting thread. E.g.:


.... etc etc

at java.util.logging.StreamHandler.publish(StreamHandler.java:196)
        - locked <0x00000000804285c0> (a java.util.logging.ConsoleHandler)
        at java.util.logging.ConsoleHandler.publish(ConsoleHandler.java:105)
        at java.util.logging.Logger.log(Logger.java:565)
        at java.util.logging.Logger.doLog(Logger.java:586)
        at java.util.logging.Logger.log(Logger.java:675)
        at hudson.remoting.ProxyOutputStream$Chunk$1.run(ProxyOutputStream.java:285)
        at hudson.remoting.PipeWriter$1.run(PipeWriter.java:158)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
        at java.util.concurrent.FutureTask.run(FutureTask.java:166)
        at hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:111)
        at hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:72)
        at jenkins.util.ContextResettingExecutorService$2.call(ContextResettingExecutorService.java:46)


Again, it's one of those pesky warnings that never actually ends up on the
console, but what it's doing is:

LOGGER.log(Level.WARNING, "Failed to ack the stream", e);


It seems like it's running that a lot (which I suspected might be for
non-working slaves). I think it ends up formatting the exception's stack
trace, which is expensive (and helpfully JUL does all of that whilst holding
onto the console lock... >:-S ) - which may be why the responsiveness gets
crushed.

Anyway, HTMH and back to digging...
