Hey devs,

I need your input on something: As part of our migration from Travis to
Azure, I'm revisiting Flink's build system. I currently see two different
ways of proceeding, and I would like to know what you think of them.

A) We build and test Flink in a single "mvn clean verify" call on the CI system.
B) We migrate the two-stage build (one compile job and N test jobs) to Azure.

Option A) is what we are currently running as part of testing the
Azure-based system.

Pro/Cons for A)
+ For "apache/flink" pushes and pull requests, a build takes 1:30 hours on
the big testing machines (this might increase by a few minutes, because the
Python tests and some auxiliary tests are not executed yet)
+ Our build will be easier to maintain and understand, because we rely on
fewer scripts
- Builds on Flink forks using the free Azure plan currently take 3:30
hours to complete.

Pro/Cons for B)
+ Builds on Flink forks using the free Azure plan take 1:20 hours.
+ Builds take 1:20 hours on the big testing machines.
- More maintenance effort and complexity in the build scripts
- Build times are a lot less predictable, because they depend on the
availability of workers. For the free plan builds, they are currently fast,
because the test stage has 10 jobs and Azure offers 10 parallel workers.
We currently have only 8 big machines in total, so there will always be
some queueing. In practice, build times for the "apache/flink" repo will
therefore be less favorable, because of the scheduling.


In my opinion, the question is mostly: Are you okay with waiting 3.5 hours
for a build on your private CI to finish, in exchange for a less complex
build system? Ideally, we'll be able to reduce these 3.5 hours in the future
by using a more modern build tool (Gradle).

I'm happy to hear your thoughts!

Best,
Robert
