How many slaves do you have?

It is rather easy to saturate a server with a small number of slaves
connected via the ssh-slaves plugin.

For example, on an AWS m3.large class machine, 10 ssh-slaves concurrently
building jobs as chatty as the mock-load-builder job type is the most you
can push.

If you use JNLP slaves, you can get close to 60 concurrent builds before
the system starts falling over.

The CloudBees NIO ssh-slaves plugin (part of the enterprise offering) has a
different performance characteristic. In my most recent tests I was able to
get up to 120 concurrent builds without affecting the Jenkins UI (I had only
set up that number of slaves... it can likely go further, though an m3.large
is not beefy enough). What was affected, though, were build times: the builds
were 2-3 times slower due to back-pressure effects causing the builds to
block on STDOUT.
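
To make the back-pressure effect concrete, here is a toy Java model. It is
not how the NIO plugin is actually implemented; the class name, buffer size
and per-line delay are all hypothetical. A "build" thread writes lines into a
bounded buffer that a slower "channel" thread drains. Once the buffer is full,
the writer blocks in put(), so the build can only run as fast as the channel
ships its output.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

// Toy model of back-pressure on build output. All names and numbers are
// hypothetical; this is not the NIO plugin's implementation.
class BackPressureDemo {

    /**
     * A "build" writes `lines` lines into a buffer of `bufferCapacity`;
     * a "channel" thread drains one line every `drainDelayMs` ms.
     * Returns how long the build spent producing its output, in milliseconds.
     */
    static long runBuild(int lines, int bufferCapacity, long drainDelayMs) {
        BlockingQueue<String> stdout = new ArrayBlockingQueue<>(bufferCapacity);

        Thread channel = new Thread(() -> {
            try {
                for (int i = 0; i < lines; i++) {
                    stdout.take();                             // ship one line to the master
                    TimeUnit.MILLISECONDS.sleep(drainDelayMs); // simulated transport cost
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        channel.start();

        try {
            long start = System.nanoTime();
            for (int i = 0; i < lines; i++) {
                stdout.put("line " + i);                       // blocks while the buffer is full
            }
            long elapsed = System.nanoTime() - start;
            channel.join();                                    // let the channel finish draining
            return TimeUnit.NANOSECONDS.toMillis(elapsed);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted", e);
        }
    }

    public static void main(String[] args) {
        // Small buffer: the writer is throttled to the drain rate.
        System.out.println("buffer 10:  " + runBuild(100, 10, 5) + " ms");
        // Buffer holds all of the output: the writer never blocks.
        System.out.println("buffer 100: " + runBuild(100, 100, 5) + " ms");
    }
}
```

A larger buffer only postpones the stall: as long as output is produced
faster than the channel can ship it, the build ends up waiting on STDOUT.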

If anyone else is interested, we will be releasing our scalability test
harness (actually I will be ripping the bottom out of the acceptance test
framework and putting the scalability harness in its place... But the
harness is also useful for scalability testing). We will also be publishing
our findings.

The other thing to watch is how your entropy pool is holding up. The
default random source in Linux (/dev/random) typically gets exhausted quite
quickly. That can cause your ssh slaves to fail ping tests and time out or
block.

I think the package you want to install is haveged.

That, or switch Java to /dev/urandom.
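
A quick way to check whether the entropy pool is the problem on a Linux
master, sketched in Java under the assumption that
/proc/sys/kernel/random/entropy_avail is readable (it is Linux-specific). The
class name and the 200-bit threshold are hypothetical. The sketch also prints
java.security.egd, the system property commonly used to point the JVM's seed
source at /dev/urandom (for example by starting Java with
-Djava.security.egd=file:/dev/./urandom); how the property is honoured varies
between JDK versions, so check your JDK's java.security file.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Minimal entropy check for a Linux host, plus a report of java.security.egd.
class EntropyCheck {
    // Linux procfs file holding the kernel's entropy estimate, in bits.
    static final Path ENTROPY_AVAIL = Paths.get("/proc/sys/kernel/random/entropy_avail");

    // Hypothetical threshold; tune it to what you observe on your machines.
    static final int LOW_WATERMARK_BITS = 200;

    /** Parses the single integer stored in entropy_avail. */
    static int parseEntropy(String contents) {
        return Integer.parseInt(contents.trim());
    }

    static boolean isLow(int bits) {
        return bits < LOW_WATERMARK_BITS;
    }

    public static void main(String[] args) throws IOException {
        if (Files.isReadable(ENTROPY_AVAIL)) {
            String raw = new String(Files.readAllBytes(ENTROPY_AVAIL), StandardCharsets.US_ASCII);
            int bits = parseEntropy(raw);
            System.out.println("entropy_avail = " + bits + (isLow(bits) ? " (LOW)" : ""));
        } else {
            System.out.println("entropy_avail not readable (not Linux?)");
        }
        // null unless the JVM was started with -Djava.security.egd=...
        System.out.println("java.security.egd = " + System.getProperty("java.security.egd"));
    }
}
```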

Note: I am currently not recommending any specific slave connector; there
are trade-offs with each type of connector. I will be writing up a blog
post in the near future discussing the various trade-offs.

Standard ssh-slaves degrades poorly... This is great if you want to know
when you have reached your limit.

NIO ssh-slaves degrades gracefully. I still need to determine where it
starts degrading relative to standard ssh-slaves, but if UI responsiveness
is more important than build times then this has advantages (though you
need to be a paying CloudBees customer).

JNLP scales the highest without affecting build times, but it degrades
fastest, is a poor fit for on-demand connection/retention strategies, and
does not offer the same transport encryption security as the ssh- versions.

Those are just the brief high-level measures.
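
For the symptom in the quoted report below (126 BLOCKED threads on the
master), a rough way to get per-frame counts without reading a full thread
dump is the standard java.lang.management ThreadMXBean API. This is a sketch;
the class name is hypothetical, and it only inspects the JVM it runs in, so
it would have to be adapted (for example to Groovy in the script console) to
look at the Jenkins master itself.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.Map;
import java.util.TreeMap;

// Groups this JVM's BLOCKED threads by their top stack frame, similar to the
// "// 118 instances" style counts in a hand-summarised thread dump.
class BlockedThreadSummary {

    static Map<String, Integer> blockedByTopFrame() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        Map<String, Integer> counts = new TreeMap<>();
        // dumpAllThreads(lockedMonitors, lockedSynchronizers); stack traces are included
        for (ThreadInfo info : mx.dumpAllThreads(false, false)) {
            if (info == null || info.getThreadState() != Thread.State.BLOCKED) {
                continue;
            }
            StackTraceElement[] stack = info.getStackTrace();
            String top = stack.length > 0 ? stack[0].toString() : "<no frames>";
            counts.merge(top, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        blockedByTopFrame().forEach((frame, n) ->
                System.out.println(n + " BLOCKED at " + frame));
    }
}
```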

On Monday, 5 May 2014, Charles Chan <charles.wh.c...@gmail.com> wrote:

> Hello,
>
> One of the issues we have recently been experiencing with Jenkins is that the
> slaves (nodes) would go offline for no apparent reason and would not reconnect
> automatically.
> When slaves appear as offline, we tried to launch/reconnect the slave
> manually, but it does not work either. However, we are able to SSH into the
> machine using PuTTY.
> The only workaround is to restart the Jenkins server, until the problem 
> surfaces again. (Typically in a week.)
>
> Instance Information
> --------------------
> Jenkins Server:            1.562
> SSH Credentials Plugin:    1.6.1
> SSH Slaves Plugin:         1.6
>
> Thread dump of slave node:
> {dump}
> "Channel reader thread: qa-linbuild-02" prio=5 WAITING
>       java.lang.Object.wait(Native Method)
>       java.lang.Object.wait(Object.java:485)
>       com.trilead.ssh2.channel.ChannelManager.waitUntilChannelOpen(ChannelManager.java:109)
>       com.trilead.ssh2.channel.ChannelManager.openSessionChannel(ChannelManager.java:583)
>       com.trilead.ssh2.Session.<init>(Session.java:41)
>       com.trilead.ssh2.Connection.openSession(Connection.java:1129)
>       com.trilead.ssh2.SFTPv3Client.<init>(SFTPv3Client.java:99)
>       com.trilead.ssh2.SFTPv3Client.<init>(SFTPv3Client.java:119)
>       hudson.plugins.sshslaves.SSHLauncher.afterDisconnect(SSHLauncher.java:1160)
>       hudson.slaves.SlaveComputer$2.onClosed(SlaveComputer.java:437)
>       hudson.remoting.Channel.terminate(Channel.java:819)
>       hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:76)
>
> "Channel reader thread: qa-linbuild-03" prio=5 WAITING
>       java.lang.Object.wait(Native Method)
>       java.lang.Object.wait(Object.java:485)
>       com.trilead.ssh2.channel.ChannelManager.waitUntilChannelOpen(ChannelManager.java:109)
>       com.trilead.ssh2.channel.ChannelManager.openSessionChannel(ChannelManager.java:583)
>       com.trilead.ssh2.Session.<init>(Session.java:41)
>       com.trilead.ssh2.Connection.openSession(Connection.java:1129)
>       com.trilead.ssh2.SFTPv3Client.<init>(SFTPv3Client.java:99)
>       com.trilead.ssh2.SFTPv3Client.<init>(SFTPv3Client.java:119)
>       hudson.plugins.sshslaves.SSHLauncher.afterDisconnect(SSHLauncher.java:1160)
>       hudson.slaves.SlaveComputer$2.onClosed(SlaveComputer.java:437)
>       hudson.remoting.Channel.terminate(Channel.java:819)
>       hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:76)
> {dump}
>
> Also concerning is the number of threads in the BLOCKED state (126!).
> This doesn't seem normal, as there are no BLOCKED threads after the server
> is restarted.
> {dump}
> // 118 instances
> "Computer.threadPoolForRemoting [#26]" daemon prio=5 BLOCKED
>       hudson.plugins.sshslaves.SSHLauncher.afterDisconnect(SSHLauncher.java:1152)
>       hudson.slaves.SlaveComputer$3.run(SlaveComputer.java:542)
>       jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
>       java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
>       java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>       java.util.concurrent.FutureTask.run(FutureTask.java:138)
>       java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>       java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>       java.lang.Thread.run(Thread.java:662)
>
> // 8 instances
> "Computer.threadPoolForRemoting [#2922]" daemon prio=5 BLOCKED
>       hudson.plugins.sshslaves.SSHLauncher.launch(SSHLauncher.java:639)
>       hudson.slaves.SlaveComputer$1.call(SlaveComputer.java:222)
>       jenkins.util.ContextResettingExecutorService$2.call(ContextResettingExecutorService.java:46)
>       java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>       java.util.concurrent.FutureTask.run(FutureTask.java:138)
>       java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>       java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>       java.lang.Thread.run(Thread.java:662)
> {dump}
>
> Looking forward to any ideas or suggestions.
>
> Thank you.
> Charles Chan
>
>  --
> You received this message because you are subscribed to the Google Groups
> "Jenkins Users" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to jenkinsci-users+unsubscr...@googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.
>


-- 
Sent from my phone
