> On Aug 23, 2019, at 2:13 PM, Christofer Dutz <christofer.d...@c-ware.de>
> wrote:
>
> well I agree that we could possibly split up the job into multiple separate
> builds.
I’d highly highly highly recommend it. Right now, the job effectively
has a race condition: a job-level timer based upon the assumption that ALL
nodes in the workflow will be available within that timeframe. That’s not
feasible long term.
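To make that concrete, here's a minimal sketch of the pattern (illustrative
values, not plc4x's actual Jenkinsfile): one wall-clock budget at the top of
the pipeline has to absorb the queue wait for every per-stage agent below it.
```
pipeline {
    agent none
    options {
        // The job-level timer: it starts ticking immediately and keeps
        // ticking while EVERY stage below waits in queue for its node.
        timeout(time: 1, unit: 'HOURS')
    }
    stages {
        stage('Build') {
            agent { node { label 'ubuntu' } }       // queue wait counts against the timer
            steps { sh 'mvn clean install' }
        }
        stage('Deploy') {
            agent { node { label 'nexus-deploy' } } // and so does this one
            steps { sh 'mvn -f jenkins.pom -P deploy-snapshots wagon:upload' }
        }
    }
}
```
One busy label anywhere in that chain can eat the entire budget before any
work runs.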
> However this makes running the Jenkins Multibranch pipeline plugin quite a
> bit more difficult.
Looking at the plc4x Jenkinsfile, prior to INFRA creating the
’nexus-deploy’ label and pulling H50 from the Ubuntu label, it wouldn’t have
been THAT difficult.
e.g., this stage:
```
stage('Deploy') {
    when {
        branch 'develop'
    }
    // Only the official build nodes have the credentials to deploy setup.
    agent {
        node {
            label 'ubuntu'
        }
    }
    steps {
        echo 'Deploying'
        // Clean up the snapshots directory.
        dir("local-snapshots-dir/") {
            deleteDir()
        }
        // Unstash the previously stashed build results.
        unstash name: 'plc4x-build-snapshots'
        // Deploy the artifacts using the wagon-maven-plugin.
        sh 'mvn -f jenkins.pom -X -P deploy-snapshots wagon:upload'
        // Clean up the snapshots directory (freeing up more space after deploying).
        dir("local-snapshots-dir/") {
            deleteDir()
        }
    }
}
```
This seems pretty trivially replaced with the build step
(https://jenkins.io/doc/pipeline/steps/pipeline-build-step/#build-build-a-job)
and the Copy Artifact plugin's copyArtifacts step. Just pass the build # as a
param between jobs.
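Roughly like the following sketch. The job name 'plc4x-deploy', the
UPSTREAM_BUILD parameter, and the projectName are all made up for
illustration; note the artifacts would need to be archived rather than
stashed, since copyArtifacts can only see archived artifacts:
```
// Upstream (multibranch) job: hand the deploy off instead of doing it inline.
stage('Trigger Deploy') {
    when {
        branch 'develop'
    }
    steps {
        // Archive instead of stash so a *different* job can copy the results.
        archiveArtifacts artifacts: 'local-snapshots-dir/**'
        // Fire the dedicated deploy job, passing our build number along.
        build job: 'plc4x-deploy',
              parameters: [string(name: 'UPSTREAM_BUILD', value: "${env.BUILD_NUMBER}")],
              wait: false
    }
}

// Downstream 'plc4x-deploy' job: the only job that ever touches the deploy node.
pipeline {
    agent { node { label 'nexus-deploy' } }
    parameters {
        string(name: 'UPSTREAM_BUILD', description: 'Upstream build # to deploy')
    }
    stages {
        stage('Deploy') {
            steps {
                dir('local-snapshots-dir/') { deleteDir() }
                // Copy Artifact plugin: grab what that specific build archived.
                copyArtifacts projectName: 'plc4x/develop',
                              selector: specific(params.UPSTREAM_BUILD)
                sh 'mvn -f jenkins.pom -X -P deploy-snapshots wagon:upload'
                dir('local-snapshots-dir/') { deleteDir() }
            }
        }
    }
}
```
With wait: false, the ubuntu node is freed as soon as the test stages finish,
and the deploy node is only tied up for the actual upload.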
Since the website section also has the same sort of code and problems, a
Jenkins shared pipeline library may offer enough code consolidation to make it
even easier.
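Something like this, assuming a shared library with a hypothetical custom
step (the name and layout are illustrative, not an existing library):
```
// vars/handOff.groovy in a shared pipeline library: both the deploy and
// website stages collapse to a one-line call, e.g. handOff('plc4x-deploy')
// or handOff('plc4x-website').
def call(String downstreamJob) {
    archiveArtifacts artifacts: 'local-snapshots-dir/**'
    build job: downstreamJob,
          parameters: [string(name: 'UPSTREAM_BUILD', value: "${env.BUILD_NUMBER}")],
          wait: false
}
```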
> And the thing is, that our setup has been working fine for about 2 years and
> we are just recently having these problems.
Welp, things change. Lots of project builds break on a regular basis
because of policy decisions, increases in load, infra software changes, etc.
Consider it very lucky it's been 2 years. The big projects get broken on a
pretty regular basis. (e.g., things like https://s.apache.org/os78x just fall
from the sky with no warning. That removal broke GitHub multibranch pipelines
as well, and many projects I know of still haven't switched. It's just easier
to run Scan every so often, thus making the load that much worse ...)
I should probably mention that many many projects already have their
website and deploy steps separated from their testing job. It’s significantly
more efficient on a global/community basis. In my experience with Jenkins and
other FIFO job deployment systems (as well as going back to your original
question):
fairness is better achieved when the jobs are faster/smaller, because
that gives the scheduler more opportunities to spread the load.
> So I didn't want to just configure the actual problem away, because I think
> splitting the job up into multiple separate jobs will just bring other
> problems, and in the end our deploy jobs will then just still hang for many,
> many hours.
Instead, this is going to last for another x years and then H50 is
going to get busy again as everyone moves their deploy step to that node.
Worse, it’s going to clog up the Ubuntu label even more because those jobs are
going to tie up the OTHER node that their job is associated with while the H50
job runs. plc4x at least has the advantage that it's only breaking itself when
it's occupying the H50 node.
As mentioned earlier, the ‘websites’ stage has the same issue and will
likely be the first to break since there are other projects that are already
using that label.