We are currently using Michael's suggestion of running a periodic job that looks for labels with 0 nodes and assigns each such label to a node in the "fallback" pool. We are still managing locality by manually juggling label assignments. So now we have guaranteed incremental builds, which is nice, but every time we add or remove a pipeline we have to redistribute labels somehow.
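For anyone wanting to try the same thing, here is a minimal sketch of what such a periodic re-labeling job can look like, written as a system Groovy script (e.g. run from a scheduled freestyle job via the Groovy plugin). It assumes the my-node-pool-* / my-node-pool-fallback label naming from John's message below; adjust to your own setup.

    // System Groovy sketch: give every "orphaned" per-task label (a label with
    // zero nodes) to an online node from the fallback pool, so tasks pinned to
    // that label can still be scheduled.
    import jenkins.model.Jenkins
    import hudson.model.Slave

    def jenkins = Jenkins.instance

    // Assumption: the standby nodes all carry the label "my-node-pool-fallback".
    def fallback = jenkins.getLabel('my-node-pool-fallback').nodes
            .findAll { it.toComputer()?.isOnline() }

    jenkins.labels.each { label ->
        // Assumption: the per-task labels all start with "my-node-pool-".
        if (label.name.startsWith('my-node-pool-')
                && label.name != 'my-node-pool-fallback'
                && label.nodes.isEmpty()
                && !fallback.isEmpty()) {
            def target = fallback.first()
            if (target instanceof Slave) {
                // Append the orphaned label to the fallback node's label string.
                target.labelString = (target.labelString + ' ' + label.name).trim()
            }
        }
    }
    jenkins.save()   // persist the updated node labels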
On Monday, May 29, 2017 at 10:11:18 AM UTC-7, Paul van der Ende wrote:
>
> Same issue here. Our incremental builds suffer from very poor locality because the pipeline *always* starts on a completely different set of nodes than last time, and we have quite a few nodes. I came up with the same tricks you mention, but they did not help much. Did you find a better solution already, besides managing nodes by hand?
>
> My preferred solution is as follows: ideally, every 'node()' in my pipeline should stick to the same Jenkins node whenever possible. They should also end up in the same workspace to improve reuse of cached artifacts, but ideally different 'node()' calls of the same pipeline should end up in different workspaces to avoid conflicts.
>
> If I compare node() with freestyle jobs, then they should basically work the same way. I cannot believe this is not the case.
>
> On Saturday, October 29, 2016 at 5:06:15 AM UTC+2, John Calsbeek wrote:
>>
>> We have a problem trying to get more control over how node() decides which node to allocate an executor on. Specifically, we have a pool of nodes with a specific label, all of which are capable of executing a given task, but with a strong preference to run the task on the same node that ran it before. (Note that these tasks are simply different pieces of code within a single pipeline, running in parallel.) This is what Jenkins does normally, at job granularity, but as JENKINS-36547 <https://issues.jenkins-ci.org/browse/JENKINS-36547> says, all tasks scheduled from any given pipeline are given the same hash, which means that the load balancer has no idea which tasks should be assigned to which node. In our situation, only a single pipeline ever assigns jobs to this pool of nodes.
>>
>> So far we have worked around the issue by assigning a different label to each and every node in the pool in question, but this has a new issue: if any node in that pool goes down for any reason, the task will not be reassigned to any other node, and the whole pipeline will hang or time out.
>>
>> We have worked around *that* by assigning each task to "my-node-pool-# || my-node-pool-fallback", where my-node-pool-fallback is a label containing a few standby nodes, so that if one of the primary nodes goes down the pipeline as a whole can still complete. It will be slower (these tasks can take two to ten times longer when not running on the same node they ran on last time), but it will at least complete.
>>
>> Unfortunately, the label expression doesn't actually mean "first try to schedule on the first node in the OR, then use the second one if the first one is not available." Instead, there will usually be some tasks that schedule on a fallback node even if the node they are "assigned" to is still available. As a result, almost every run of this pipeline ends up taking the worst-case time: it is likely that *some* task will wander away from its assigned node to run on a fallback, which leads to the fallback nodes being over-scheduled while other nodes sit idle.
>>
>> The question is: what are our options?
>> One hack we've considered is attempting to game the scheduler by using sleep()s: initially schedule all the fallback nodes with a task that does nothing but sleep(), then schedule all our real tasks (which will now go to their assigned machines whenever possible, because the fallback nodes are busy sleeping), and finally let the sleeps complete so that any tasks which couldn't execute on their assigned machines now execute on the fallbacks. A better solution would probably be to create a LoadBalancer plugin that codifies this somehow: preferentially scheduling tasks only on their assigned label, and scheduling on fallbacks only after 30 seconds or a minute.
>>
>> Is anyone out there dealing with similar issues, or does anyone know of a solution that I have overlooked?
>>
>> Thanks,
>> John Calsbeek
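For later readers, here is roughly what the per-node-label workaround from John's message looks like as a Scripted Pipeline. The label names (my-node-pool-1 ... and my-node-pool-fallback) follow the naming above; the shard count and the build-shard.sh step are made-up placeholders.

    // Scripted Pipeline sketch: one parallel branch per dedicated node, each
    // pinned to "its" label but allowed to fall back to the standby pool.
    def branches = [:]
    [1, 2, 3].each { i ->                          // placeholder: three shards
        branches['shard-' + i] = {
            // "first choice || fallback" -- but note, as described above, the
            // load balancer may still pick a fallback node even when the
            // dedicated node is free.
            node('my-node-pool-' + i + ' || my-node-pool-fallback') {
                checkout scm                       // reuse the cached workspace
                sh './build-shard.sh ' + i         // placeholder build step
            }
        }
    }
    parallel branches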
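And a sketch of the sleep() reservation hack John describes, again in Scripted Pipeline. The fallback-executor count (2), the 60-second hold, and the 10-second head start are guesses you would have to tune; the labels and build step are the same placeholders as above.

    // Hold every fallback executor busy with a sleep, give the real branches a
    // head start so they land on their dedicated nodes, then release the
    // fallbacks for whatever could not be placed.
    def blockers = [:]
    (1..2).each { i ->                             // assumption: 2 fallback executors
        blockers['hold-fallback-' + i] = {
            node('my-node-pool-fallback') {
                sleep time: 60, unit: 'SECONDS'    // keep this executor occupied
            }
        }
    }

    def work = [:]
    [1, 2, 3].each { i ->
        work['shard-' + i] = {
            node('my-node-pool-' + i + ' || my-node-pool-fallback') {
                sh './build-shard.sh ' + i         // placeholder build step
            }
        }
    }

    parallel(blockers + [
        'real-work': {
            sleep time: 10, unit: 'SECONDS'        // let the blockers grab the fallbacks first
            parallel work                          // shards with a live dedicated node start now;
                                                   // the rest queue until the 60 s holds expire
        }
    ])

The LoadBalancer-plugin idea in the thread would effectively bake that delay into the queue itself instead of simulating it with sleeps.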