On Friday, October 28, 2016 at 9:36:49 PM UTC-7, John Calsbeek wrote:

> Shared storage is a potential option, yes, but the tasks in question are
> currently not very fault-tolerant when it comes to network hitches.
Well, it would pay to make them more fault-tolerant :-) But even if you do
not fix the process, you do not have to run it from the shared storage; just
use it as storage. Using a node-local mirror, you can rsync the data from
shared storage, run the task, then rsync it back. Assuming your data does
not change much (which I understand is not always the case), you will soon
have a relatively recent rsync copy on every node, reducing the amount of
data moved. This may or may not work in your case, but it is something to
consider.

>> But more to the point, if your main issue is that you are worried that a
>> node may be unavailable, you may consider some automatic node allocation.
>> I am not sure if there are other examples, but the AWS node allocation,
>> for instance, can automatically allocate a new node if no executors are
>> available for a label. That may be a decent backup strategy. If you are
>> not using AWS, you can look for another node-provisioning plugin that
>> fits, or failing that, look at how they do it and write your own plugin.

> Assuming that we have a fixed amount of computing resources, does this
> have any advantage over writing a LoadBalancer plugin?

If you are allocating your nodes instead of pre-creating them, you do not
have to keep a big shared pool; instead, specific nodes are allocated with
the same label only as needed, and as old nodes that died are
decommissioned, they can re-join the pool of available resources. Of
course, if you drop the affinity requirement, just using them all as one
pool is probably easier.

>> But maybe I am overthinking it. In the end, if your primary concern is
>> that a node may be down, remember that a Pipeline is Groovy code, and
>> that Groovy code has access to the Jenkins API/internals. You can write
>> some code that will check the state of the slaves and select a label to
>> use before you even get to the node() statement.
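[As a concrete illustration of the check-before-node() suggestion, a
minimal scripted-Pipeline sketch might look like the following. The label
names are made up, the Jenkins API calls typically require script approval,
and, as discussed below, this does not close the race between checking a
node's status and allocating it.]

```groovy
// Hypothetical sketch: pick a label whose nodes are online before node().
import jenkins.model.Jenkins

@NonCPS
def labelIsOnline(String labelName) {
    def label = Jenkins.instance.getLabel(labelName)
    // A label is usable if at least one of its nodes has an online computer.
    return label?.nodes?.any { it.toComputer()?.isOnline() }
}

// Prefer the dedicated node; fall back to a generic pool if it is down.
def chosenLabel = labelIsOnline('dedicated-node') ? 'dedicated-node'
                                                  : 'fallback-pool'

node(chosenLabel) {
    echo "Running on ${env.NODE_NAME}"
    // ... the actual task ...
}
```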
>> Sure, that will not fix the issue of a node going down in the middle of
>> a job, but it may catch the job before it assigns a task to a dead node.

> Ah, that's an interesting idea. Something that I forgot to mention in the
> original post is that if there were a node() function that allocates with
> a timeout, that would also be a building block that we could use to fix
> this problem. (If attempting to allocate a specific node fails with a
> timeout, then schedule on a fallback. timeout() doesn't work because that
> would apply the timeout to the task as well, not merely to the attempt to
> allocate the node.) We could indeed query the status of nodes directly. I
> have a niggling doubt that it would be possible to do this without a race
> condition (what if the node goes down between querying its status and
> scheduling on it?), but it's definitely something worth investigating.

I am wondering if you can do some weird combination of parallel + sleep +
failFast + try/catch to emulate a timeout for a specific task.

>> Alternatively, you can simply write another job, in lieu of a plugin,
>> that will scan all your tasks and nodes and, if it detects a node down
>> and a task waiting for it, assign the label to another node from the
>> "standby" pool.

> This is an idea that we had considered, yeah, although I was considering
> it as a first step in the pipeline before scheduling, which made me
> nervous about race conditions. But if, as you suggest, it were a
> frequently run job which is always attempting to set up node
> allocations… that could definitely work. Good suggestion, thanks!

Throw enough things against a wall and something will stick ;-) Glad to be
of help. Good luck.

-M

--
You received this message because you are subscribed to the Google Groups
"Jenkins Users" group. To unsubscribe from this group and stop receiving
emails from it, send an email to [email protected].
To view this discussion on the web visit
https://groups.google.com/d/msgid/jenkinsci-users/997916ea-597c-4853-9cf0-8946c81e5c1c%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
