Short version

I'm seeing slow slave startup when on-demand nodes are offline. The launch
method is "Launch slave via execution of command on the Master" (a bash
script), availability is "Take this slave on-line when in demand and
off-line when idle", and "In demand delay" is 0. When I start a new job
while the slaves are offline, they take about a minute to start launching.
If I launch the slaves manually, they come online immediately, so our bash
script is not what is slow. How do I make on-demand slaves start launching
as soon as they are needed?

Longer version

I want (some of) my Jenkins jobs to be executed on a build farm (Oracle
Grid Engine). The build farm load-balances between its servers and handles
things such as requests for a specific OS or architecture. I would like it
to work like this:
- A job is triggered by some means. For this example, assume that the job
is restricted to run on the build farm, on RHEL6.4.
- A new Jenkins node is launched immediately and connects to the build
farm. The build farm schedules a job to run on a RHEL6.4 server (it may
have to wait if no servers are available, or if the jenkins user has
already scheduled too many jobs).
- Preferably, the Jenkins node receives the name and build number of the
Jenkins job that caused it to be launched, so that this information can be
logged for later troubleshooting.
- When the Jenkins job has finished, the Jenkins node is disconnected.
- No "# of executors" setting should be needed, since the build farm
enforces its own limit. (?)
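The lifecycle described in the list above (one node per build, tagged with
the triggering job and build number, disconnected when that build ends) can
be modelled as a small state machine. This is only a sketch of the intended
behaviour: OneShotNode, its states and method names are hypothetical and are
not Jenkins or Grid Engine APIs.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical model of the desired "one node per build" lifecycle.
# None of these names are Jenkins or Grid Engine APIs.


@dataclass
class OneShotNode:
    job_name: str       # Jenkins job that caused the launch
    build_number: int   # build number of that job
    state: str = "launching"
    log: List[str] = field(default_factory=list)

    def _transition(self, expected: str, new: str, message: str) -> None:
        # Refuse out-of-order events, so a disconnected node is never reused.
        if self.state != expected:
            raise RuntimeError(f"cannot go from {self.state!r} to {new!r}")
        self.state = new
        # Tag every log line with job and build for later troubleshooting.
        self.log.append(f"[{self.job_name} #{self.build_number}] {message}")

    def submit_to_farm(self) -> None:
        # The farm may queue this request until a matching host is free.
        self._transition("launching", "waiting_for_host", "submitted to build farm")

    def host_assigned(self, host: str) -> None:
        self._transition("waiting_for_host", "online", f"agent online on {host}")

    def build_finished(self) -> None:
        # Disconnect as soon as the single build ends, not after an idle timeout.
        self._transition("online", "disconnected", "build finished, agent disconnected")


if __name__ == "__main__":
    node = OneShotNode("my-job", 42)
    node.submit_to_farm()
    node.host_assigned("rhel64-host-01")
    node.build_finished()
    print("\n".join(node.log))
```

Because there is no transition out of "disconnected", a second build can
never land on the same node, which is what would make per-build logging
reliable.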

The build farm is configured to disconnect automatically after 72 hours,
regardless of whether any jobs are still running, which means I cannot use
"Keep this slave on-line as much as possible" (doing so risks errors such
as DiagnosedStreamCorruptionException). This is not a build farm setting
that I can change.

Currently, our nodes are configured with
- Launch method "Launch slave via execution of command on the Master"
- Availability "Take this slave on-line when in demand and off-line when
idle"
- In demand delay 0
- Idle delay 1 (it cannot be set to 0)
  
The launch command runs a bash script on the master. The script sets up a
connection to the build farm and listens on it using netcat. When the build
farm assigns a server to the connection, slave.jar is launched and the
Jenkins job is executed. The slave is disconnected once it has been idle
for at least 1 minute, or after 72 hours at the latest.
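The listen-then-launch step of the script described above can be sketched
in Python for illustration. It assumes, hypothetically, that the farm writes
a line of the form "ASSIGNED <hostname>" once a host is chosen, and that the
agent is then started on that host with ssh; the real message format and
remote-execution mechanism depend on the existing script and the Grid Engine
setup. The one grounded detail is that with "Launch slave via execution of
command on the Master", Jenkins talks to the agent over the command's
standard input and output.

```python
import shlex
from typing import Iterable, List

# Hypothetical: the farm is assumed to send "ASSIGNED <hostname>" when a host
# has been chosen. Replace with whatever the existing script actually parses.
ASSIGNED_PREFIX = "ASSIGNED "


def wait_for_assignment(lines: Iterable[str]) -> str:
    """Consume farm messages until a host is assigned; return the host name."""
    for raw in lines:
        line = raw.strip()
        if line.startswith(ASSIGNED_PREFIX):
            host = line[len(ASSIGNED_PREFIX):].strip()
            if host:
                return host
    raise ConnectionError("build farm closed the connection before assigning a host")


def agent_command(host: str, slave_jar: str = "slave.jar") -> List[str]:
    # Jenkins talks to a command-launched agent over the command's stdin/stdout,
    # so the final process must run slave.jar with those streams connected.
    # Using ssh for the remote hop is an assumption about the farm setup.
    return ["ssh", host, "java", "-jar", shlex.quote(slave_jar)]


if __name__ == "__main__":
    import io

    farm_output = io.StringIO("QUEUED\nASSIGNED rhel64-node-07\n")
    host = wait_for_assignment(farm_output)
    print(agent_command(host))  # ['ssh', 'rhel64-node-07', 'java', '-jar', 'slave.jar']
```

In a real launcher the last step would replace the current process (for
example with os.execvp) so that Jenkins' stdin/stdout pass straight through
to slave.jar.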

If we click the "Launch slave agent" button on the node's page, the node is
launched immediately and is ready for Jenkins jobs. If instead we leave a
disconnected node to be launched on demand, it takes about a minute (and
sometimes up to two minutes) before the launch even starts.

Problems with this approach:
- hudson.slaves.ComputerRetentionWork only checks whether a slave is needed
once per minute, so there is normally a noticeable delay before a slave is
launched, that is, before the build farm can even start looking for a host
to execute on. Users of our Jenkins setup are very annoyed by this.
- More than one job can use the node while it is connected, and we do not
know which jobs did so, which makes it difficult to troubleshoot problems.
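The cost of the once-per-minute check can be put in numbers. The sketch
below assumes checks run at fixed 60-second intervals and that, with "In
demand delay" 0, the first check after a build is queued starts the launch.
It models only the polling interval, so it does not account for the
occasional two-minute delays.

```python
import math

POLL_INTERVAL_S = 60.0  # ComputerRetentionWork checks roughly once per minute


def launch_delay(enqueue_s: float, interval_s: float = POLL_INTERVAL_S) -> float:
    """Seconds from a build entering the queue until the next retention check.

    Assumes checks run at t = 0, interval_s, 2 * interval_s, ... and that a
    check at exactly the enqueue instant does not see the new build.
    """
    next_check = (math.floor(enqueue_s / interval_s) + 1) * interval_s
    return next_check - enqueue_s


if __name__ == "__main__":
    # Builds queued once per second over ten minutes.
    delays = [launch_delay(t) for t in range(600)]
    print(f"worst: {max(delays):.1f} s, mean: {sum(delays) / len(delays):.1f} s")
    # worst: 60.0 s, mean: 30.5 s
```

Under these assumptions, every on-demand build waits on average about half
an interval, and up to a full interval, before the farm even receives a
request.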

Any ideas on how to solve this? Can we use some Groovy hack to detect when
a specific Jenkins node is needed and launch it? Or do we need to write our
own plugin, and would that even be possible?
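Whichever route is taken, the core change is to trigger the launch when a
build is queued rather than at the next periodic retention check. The sketch
below models that idea in Python, assuming some hook (hypothetical here, not
shown) calls on_enqueue for every newly queued build. OnDemandLauncher,
QueuedBuild and the label name are invented for illustration, and label
matching is reduced to exact string comparison.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class QueuedBuild:
    job_name: str
    build_number: int
    label: str  # simplified: one exact label string, no label expressions


LaunchFn = Callable[[QueuedBuild], None]


class OnDemandLauncher:
    """Hypothetical event-driven launcher; not an existing Jenkins class."""

    def __init__(self) -> None:
        self._launchers: Dict[str, LaunchFn] = {}
        # Audit trail: which build caused which launch.
        self.launched: List[Tuple[str, int]] = []

    def register(self, label: str, launch: LaunchFn) -> None:
        self._launchers[label] = launch

    def on_enqueue(self, build: QueuedBuild) -> None:
        # Meant to be called from a queue event, so the launch starts at once
        # instead of waiting for the next once-per-minute retention check.
        launch = self._launchers.get(build.label)
        if launch is None:
            return  # not a build-farm label; leave it to normal scheduling
        self.launched.append((build.job_name, build.build_number))
        launch(build)  # the launch step receives job name and build number


if __name__ == "__main__":
    launcher = OnDemandLauncher()
    launcher.register(
        "buildfarm-rhel6.4",
        lambda b: print(f"launching agent for {b.job_name} #{b.build_number}"),
    )
    launcher.on_enqueue(QueuedBuild("my-job", 7, "buildfarm-rhel6.4"))
    launcher.on_enqueue(QueuedBuild("unit-tests", 3, "linux"))
```

Passing the queued build to the launch step is also one way to give the
node the job name and build number it was started for.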
