Hey devs,

I need your opinion on something: as part of our migration from Travis to Azure, I'm revisiting Flink's build system. I currently see two ways of proceeding, and I would like to know which one you prefer.

A) We build and test Flink in one "mvn clean verify" call on the CI system.
B) We migrate the two-stage build (one compile job and N test jobs) to Azure.

Option A) is what we are currently running as part of testing the Azure-based system.

Pro/Cons for A)
+ For "apache/flink" pushes and pull requests, the big testing machines need 1:30 hours to complete (this might go up by a few minutes, because the Python tests and some auxiliary tests are not executed yet)
+ Our build will be easier to maintain and understand, because we rely on fewer scripts
- Builds on Flink forks using the free Azure plan currently take 3:30 hours to complete

Pro/Cons for B)
+ Builds on Flink forks using the free Azure plan take 1:20 hours
+ Builds take 1:20 hours on the big testing machines
- Maintenance and complexity of the build scripts
- The build times are a lot less predictable, because they depend on the availability of workers. Free-plan builds are currently fast, because the test stage has 10 jobs and Azure offers 10 parallel workers. We currently have only 8 big machines in total, so there will always be some queueing. In practice, build times for the "apache/flink" repo will be less favorable because of the scheduling.

In my opinion, the question is mostly: Are you okay with waiting 3.5 hours for a build to finish on your private CI, in exchange for a less complex build system?
Ideally, we'll be able to reduce these 3.5 hours in the future by using a more modern build tool ("gradle").

I'm happy to hear your thoughts!

Best,
Robert