On Friday, October 28, 2016 at 9:36:49 PM UTC-7, John Calsbeek wrote:
>
>
> Shared storage is a potential option, yes, but the tasks in question are 
> currently not very fault-tolerant when it comes to network hitches.
>

Well, it would pay to make them more fault-tolerant :-) But even if you do 
not fix the process, you do not have to run it from the shared storage; 
just use it as storage. With a node-local mirror, you can rsync the data 
from shared storage, run the task, then rsync the results back. Assuming 
your data does not change much (which I understand is not always the case), 
you will soon have a relatively recent rsync copy on every node, reducing 
the amount of data moving around. It may or may not work in your case, but 
it is something to consider.
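
To make the mirror pattern concrete, here is a hypothetical sketch in 
scripted pipeline; the 'big-data' label and the /mnt/shared and /var/cache 
paths are invented for illustration, not taken from your setup:

```groovy
node('big-data') {
    // Pull down only what changed since this node's last sync.
    sh 'rsync -a --delete /mnt/shared/taskdata/ /var/cache/taskdata/'

    // Run the task against the local copy, insulated from network hitches.
    sh './run-task.sh /var/cache/taskdata'

    // Push the results back to shared storage.
    sh 'rsync -a /var/cache/taskdata/results/ /mnt/shared/taskdata/results/'
}
```

The first sync on a fresh node is expensive, but after that rsync only 
moves the deltas.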
 

>> But more to the point, if your main issue is that you are worried that a 
>> node may be unavailable, you may consider some automatic node allocation. I 
>> am not sure if there are other examples, but for example the AWS node 
>> allocation can automatically allocate a new node if no threads are 
>> available for a label. That may be a decent backup strategy. If you are not 
>> using AWS - you can probably look if there is another node provisioning 
>> plugin that fits or if not, look at how they do that and write your own 
>> plugin to do it
>>
>
> Assuming that we have a fixed amount of computing resources, does this 
> have any advantage over writing a LoadBalancer plugin?
>

If you are allocating your nodes instead of pre-creating them, you do not 
have to keep a big shared pool; specific nodes are allocated with the same 
label only as needed, and as old nodes that died are decommissioned, they 
re-join the pool of available resources. Of course, if you can drop the 
affinity requirement, just using them all as one pool is probably easier.

 
>
>> But maybe I am overthinking it. In the end, if your primary concern is 
>> that node may be down - remember that pipeline is groovy code - groovy code 
>> that has access to the Jenkins API/internals. You can write some code that 
>> will check the state of the slaves and select a label to use before you 
>> even get to the node() statement. Sure, that will not fix the issue of a 
>> node going down in a middle of a job, but may catch the job before it 
>> assigns a task to a dead node.
>>
>
> Ah, that's an interesting idea. Something that I forgot to mention in the 
> original post is that if there was a node() function that allocates with a 
> timeout, that would also be a building block that we could use to fix this 
> problem. (If attempting to allocate a specific node fails with a timeout, 
> then schedule on a fallback. timeout() doesn't work because that would 
> apply the timeout to the task as well, not merely to the attempt to 
> allocate the node.) We could indeed query the status of nodes directly. I 
> have a niggling doubt that it would be possible to do this without a race 
> condition (what if the node goes down between querying its status and 
> scheduling on it?), but it's definitely something worth investigating.
>
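
To sketch what that pre-check could look like (node and label names here 
are invented, and yes, the race you mention is still there — this only 
catches the already-dead case; it also needs script approval or a trusted 
shared library, since it reaches into the Jenkins API):

```groovy
import jenkins.model.Jenkins

// Return the preferred label if its agent is online, else the fallback.
@NonCPS
String pickLabel(String preferred, String fallback) {
    def computer = Jenkins.instance.getNode(preferred)?.toComputer()
    return computer?.isOnline() ? preferred : fallback
}

node(pickLabel('special-node', 'standby-pool')) {
    // The agent can still drop between the check and the allocation,
    // but a node that is already down never gets the task queued on it.
    sh './run-task.sh'
}
```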

I am wondering if you can do some weird combination of parallel + sleep + 
failFast + try/catch to emulate a timeout for a specific task.
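
Something along these lines, perhaps — a hypothetical sketch that races 
the allocation against a watchdog branch and falls back to another label 
if the watchdog wins (labels and timings are invented):

```groovy
def allocated = false
try {
    parallel(
        failFast: true,
        work: {
            node('special-node') {
                allocated = true
                sh './run-task.sh'
            }
        },
        watchdog: {
            // Poll so the watchdog exits quickly once the node is grabbed.
            for (int i = 0; i < 60 && !allocated; i++) {
                sleep time: 5, unit: 'SECONDS'
            }
            if (!allocated) {
                // failFast aborts the still-queued 'work' branch.
                error 'timed out waiting to allocate special-node'
            }
        }
    )
} catch (err) {
    if (allocated) {
        throw err  // the task itself failed; do not mask that
    }
    node('standby-pool') {
        sh './run-task.sh'  // fallback allocation
    }
}
```

Note the timeout here only covers the wait for the allocation, not the 
task itself, which I think is the distinction you were after.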

>  
>
>> Alternatively, you can simply write another job, in lieu of a plugin, 
>> that will scan all your tasks and nodes and if it detects a node down and a 
>> task waiting for it, assign the label to another node from the "standby" 
>> pool
>>
>
> This is an idea that we had considered, yeah, although I was considering 
> it as a first step in the pipeline before scheduling, which made me nervous 
> about race conditions. But if, as you suggest, it was a frequently run job 
> which is always attempting to set up node allocations… that could 
> definitely work. Good suggestion, thanks!
>  
>
Throw enough things against a wall and something will stick ;-) Glad to be 
of help.

Good luck.

 -M

-- 
You received this message because you are subscribed to the Google Groups 
"Jenkins Users" group.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/jenkinsci-users/997916ea-597c-4853-9cf0-8946c81e5c1c%40googlegroups.com.