Some comments on Chesnay's message:
- Changing the number of splits will not reduce the complexity.
- One can also use the Flink build machines by opening a PR to the
"flink-ci/flink" repo, so there is no need to open crappy PRs :)
- On the number of builds being run: We currently use 4 out of 10 machines
offered by Alibaba, and we are not yet hitting any limits. In addition to
that, another big cloud provider has reached out to us, offering build
capacity.

But generally, I agree that solely relying on the build infrastructure of
Flink is not a good option. The free Azure builds should provide a
reasonable experience.
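
To put rough numbers on the queueing concern, here is a small Python sketch.
It assumes that all test jobs of a split build take the same time, and it
reuses Arvid's admittedly arbitrary figure of 10 jobs at 30 minutes each. The
function names are made up for illustration, and the compile stage and
competing builds are ignored.

```python
import math


def stage_wall_minutes(jobs: int, workers: int, minutes_per_job: int) -> int:
    """Wall-clock time of one test stage on an otherwise idle worker pool.

    Assumes equal-length jobs, so they run in ceil(jobs / workers) waves.
    """
    if jobs < 0 or workers <= 0:
        raise ValueError("jobs must be >= 0 and workers must be > 0")
    waves = math.ceil(jobs / workers)
    return waves * minutes_per_job


def machine_minutes(jobs: int, minutes_per_job: int) -> int:
    """Total worker time one build consumes, regardless of pool size."""
    return jobs * minutes_per_job


if __name__ == "__main__":
    # Hypothetical figures: 10 test jobs of 30 minutes each.
    for workers in (10, 8, 4):
        print(workers, "workers:", stage_wall_minutes(10, workers, 30), "min")
    print("machine time per build:", machine_minutes(10, 30), "min")
```

Under these assumptions, 10 idle workers finish the test stage in 30 minutes,
while 8 workers need a second wave and take 60 minutes, even before other
builds compete for the same machines.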


On Wed, Dec 11, 2019 at 3:22 PM Chesnay Schepler <ches...@apache.org> wrote:

> Note that for B it's not strictly necessary to maintain the current
> number of splits; 2 might already be enough to bring contributor builds
> to a more reasonable level.
>
> I don't think that a contributor build taking 3.5h is a viable option;
> people will start disregarding their own instance and just open a PR
> without having run the tests, which will naturally mean that PR quality
> will drop. Committers probably will start working around this and push
> branches into the flink repo for running tests; we have seen that in the
> past and see this currently for e2e tests.
>
> This will increase the number of builds being run on the Flink machines
> by quite a bit, obviously affecting throughput and latency.
>
> On 11/12/2019 14:59, Arvid Heise wrote:
> > Hi Robert,
> >
> > thank you very much for raising this issue and improving the build
> > system.
> >
> > For now, I'd like to stick to a lean solution (= option A).
> >
> > While option B can greatly reduce build times, it also has the habit of
> > clogging up the build machines. Just some arbitrary numbers, but it
> > currently feels like B cuts down latency by half but also uses 10
> > machines for 30 minutes, decreasing the overall throughput
> > significantly. Thus, when many folks want to see their commits tested,
> > resources quickly run out and this in turn significantly increases
> > latency.
> > I'd like to have some more predictable build times and sacrifice some
> > latency for now.
> >
> > It would be interesting to see if we could rearrange the project
> > execution in Maven, such that fast projects are executed first. E2E
> > tests should be executed last, which they are somewhat, because of the
> > project dependencies.
> >
> > Of course, I'm very interested in improving the overall build experience
> > by exploring alternatives to Maven.
> >
> > Best,
> >
> > Arvid
> >
> > On Wed, Dec 11, 2019 at 2:32 PM Robert Metzger <rmetz...@apache.org> wrote:
> >
> >> Hey devs,
> >>
> >> I need your opinion on something: As part of our migration from Travis
> >> to Azure, I'm revisiting the build system of Flink. I currently see two
> >> different ways of proceeding, and I would like to know your opinion on
> >> the two options.
> >>
> >> A) We build and test Flink in one "mvn clean verify" call on the CI
> >> system.
> >> B) We migrate the two-stage build of one compile and N test jobs to
> >> Azure.
> >>
> >> Option A) is what we are currently running as part of testing the
> >> Azure-based system.
> >>
> >> Pro/Cons for A)
> >> + for "apache/flink" pushes and pull requests, the big testing machines
> >> need 1:30 hours to complete (this might go up by a few minutes because
> >> the Python tests and some auxiliary tests are not executed yet)
> >> + Our build will be easier to maintain and understand, because we rely
> >> on fewer scripts
> >> - builds on Flink forks using the free Azure plan currently take 3:30
> >> hours to complete.
> >>
> >> Pro/Cons for B)
> >> + builds on Flink forks using the free Azure plan take 1:20 hours
> >> + builds take 1:20 hours on the big testing machines
> >> - maintenance and complexity of the build scripts
> >> - the build times are a lot less predictable, because they depend on the
> >> availability of workers. For the free plan builds, they are currently
> >> fast, because the test stage has 10 jobs, and Azure offers 10 parallel
> >> workers. We currently only have a total of 8 big machines, so there
> >> will always be some queueing. In practice, for the "apache/flink" repo,
> >> build times will be less favorable, because of the scheduling.
> >>
> >>
> >> In my opinion, the question is mostly: Are you okay to wait 3.5 hours
> >> for a build to finish on your private CI, in favor of a less complex
> >> build system?
> >> Ideally, we'll be able to reduce these 3.5 hours by using a more modern
> >> build tool ("gradle") in the future.
> >>
> >> I'm happy to hear your thoughts!
> >>
> >> Best,
> >> Robert
> >>
>
>
