> On Jul 23, 2018, at 3:04 PM, Joan Touzet <woh...@apache.org> wrote:
>
> This is why we switched to Docker for ASF Jenkins CI. By pre-building our
> Docker container images for CI, we take control over the build environment
> in a very proactive way, reducing Infra's investment to just keeping the
> build nodes up, running, and with sufficient disk space.
All of the projects I've been involved with have been using Docker-based
builds for a few years now. Experience there has shown that, to ease
debugging (especially since the Jenkins machines are so finicky),
information from inside the container needs to be available after the
container exits. As a result, Apache Yetus (which is used to control the
majority of builds for projects like Hadoop and HBase) will specifically
mount key directories from the workspace inside the container so that they
are readable after the build finishes. Otherwise one spends a significant
amount of time head scratching as to why stuff failed on the Jenkins build
servers but not locally. (There's a rough sketch of the mount-and-cleanup
pattern at the bottom of this mail.)

It's also worth pointing out that "just use Docker" only works if one is
building on Linux. That isn't an option on Windows, which is why a 'one
size fits all' policy for all jobs isn't really going to work. Performance
on the Windows machines is pretty awful (I'm fairly certain it's IO), so
any time savings there is huge. (For comparison, the last time I looked, a
Hadoop Linux full build + full analysis took 12 hours, while a Windows
full build + partial analysis took 19 hours… a 7 hour difference with
stuff turned off!)

> It also means that, once a build is done, there is no mess on the Jenkins
> build node to clean up - just a regular `docker rm` or `docker rmi` is
> sufficient to restore disk space. Infra is already running these aggressively,
> since if a build hangs due to an unresponsive docker daemon or network
> failure, our post-run script to clean up after ourselves may never run.

Apache Yetus pretty much manages the docker repos for the 'Hadoop' queue
machines, since it runs so frequently. It happily deletes stale images
after a time, as well as killing any stuck containers that are still
running after a shorter period of time. This way 'docker build' commands
can benefit from cache re-use but still get forced into full rebuilds
periodically. I enabled the docker-cleanup functionality as part of the
precommit-admin job in January as well, so it's been working alongside
whatever extra docker bits the INFRA team has been using on the non-Hadoop
nodes.

> We don't put everything into saved artefacts either, but we have built a
> simple Apache CouchDB-based database to which we upload any artefacts we
> want to save for development purposes only.

… and where does this DB run? Also, it's not so much about the finished
artifacts as it is about the state of the workspace post-build. If no jars
get built, then we want to know what happened.

> We had this issue too - which is why we build under a `/tmp` directory
> inside the Docker container to avoid one build trashing another build's
> workspace directory via the multi-node sync mechanism.

Apache Yetus-based builds mount a dir inside the container. It's
relatively expensive to rebuild the repo for large projects; for Hadoop,
this takes in the 5-10 minute range. That may not seem like a lot, but
given the number of build jobs per day, it adds up very quickly. The
quicker the big jobs run, the more cycles are available for everyone and
the faster contributors get feedback on their patches.
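For anyone who hasn't seen this pattern before, here's a rough shell
sketch of the general idea. To be clear, this is NOT the actual Yetus
code; the image name, paths, and age thresholds are all invented for
illustration.

```sh
#!/bin/sh
# Rough sketch only -- NOT the actual Apache Yetus implementation.
# Image name, paths, and thresholds below are invented for illustration.

WORKSPACE="${WORKSPACE:-$PWD}"   # Jenkins exports WORKSPACE on its build nodes

# Mount key workspace directories into the container so that logs and
# reports written inside it are still readable after the container exits.
docker run --rm \
  -v "${WORKSPACE}/sourcedir:/src" \
  -v "${WORKSPACE}/patchprocess:/tmp/patchprocess" \
  example/build-image:latest \
  /src/dev-support/run-build.sh

# Kill any container that has been running long enough to be presumed stuck.
docker ps --format '{{.ID}} {{.RunningFor}}' | while read -r id age; do
  case "$age" in
    *hour*|*day*) docker rm -f "$id" ;;   # crude: anything over ~an hour
  esac
done

# Remove images older than a week so `docker build` still benefits from
# layer caching day-to-day, but is forced into a full rebuild periodically.
docker image prune --all --force --filter "until=168h"
```

The point of the two different timeouts is that stuck containers get
reaped fairly quickly, while images stick around long enough for cache
re-use to pay off.

[Ofc,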