Thanks for your feedback. I will then go for option B.

On Fri, Dec 13, 2019 at 2:51 PM Till Rohrmann <trohrm...@apache.org> wrote:
> Thanks for starting this discussion Robert.
>
> I can see benefits for both options, as already mentioned in this thread. However, given that we already have the profile splits and that it would considerably decrease feedback for developers on their personal Azure accounts, I'd be in favour of option B for the time being. If we see that we can keep the build time for the local Azure setups down differently, then one could start simplifying the build.
>
> Cheers,
> Till
>
> On Fri, Dec 13, 2019 at 2:42 PM Aljoscha Krettek <aljos...@apache.org> wrote:
>
>> It's a tough question. On the one hand I like less complexity in the build system. But one of the most important things for developers is fast iteration cycles.
>>
>> So I would prefer the solution that keeps the iteration time low.
>>
>> Best,
>> Aljoscha
>>
>> > On 13. Dec 2019, at 14:41, Chesnay Schepler <ches...@apache.org> wrote:
>> >
>> > It depends on how you define "split"; if you split by module (as we do currently) you have the same complexity as we have right now: caching of artifacts and a brittle definition of splits.
>> >
>> > But there are other ways to split builds, for example into unit and integration tests; we could also add end-to-end tests to this list. At that point we're basically talking about multiple parallel builds that are fully independent. Let's also remember that caching of the build artifact is only useful when the compile times are large enough to warrant it; if we only go with 2 splits, in the grand scheme of things the caching wouldn't even be required. We added the caching to Travis since at 5+ builds (and the guarantee for this number to go up) the compilation time was a much larger factor.
>> >
>> > As for the current split setup we have (as in, by modules), it isn't just about faster feedback times; the splits can also be used to isolate components from each other. I know that quite a few people appreciate the kafka/python module being in its own split, for example.
>> >
>> > On 11/12/2019 16:44, Robert Metzger wrote:
>> >> Some comments on Chesnay's message:
>> >> - Changing the number of splits will not reduce the complexity.
>> >> - One can also use the Flink build machines by opening a PR to the "flink-ci/flink" repo, no need to open crappy PRs :)
>> >> - On the number of builds being run: We currently use 4 out of 10 machines offered by Alibaba, and we are not yet hitting any limits. In addition to that, another big cloud provider has reached out to us, offering build capacity.
>> >>
>> >> But generally, I agree that solely relying on the build infrastructure of Flink is not a good option. The free Azure builds should provide a reasonable experience.
>> >>
>> >> On Wed, Dec 11, 2019 at 3:22 PM Chesnay Schepler <ches...@apache.org> wrote:
>> >>
>> >>> Note that for B it's not strictly necessary to maintain the current number of splits; 2 might already be enough to bring contributor build times down to a more reasonable level.
>> >>>
>> >>> I don't think that a contributor build taking 3.5h is a viable option; people will start disregarding their own instance and just open a PR without having run the tests, which will naturally mean that PR quality will drop. Committers will probably start working around this and push branches into the flink repo for running tests; we have seen that in the past and see this currently for e2e tests.
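For the unit / integration test split idea above, a rough sketch of what two fully independent Azure jobs could look like. This is only an illustration: the VM image is a placeholder, the "run-its-only" profile is a hypothetical name, and it assumes integration tests are bound to the failsafe plugin (so that -DskipITs applies), which does not necessarily match Flink's actual test setup.

# Sketch only: two independent jobs, each compiling on its own,
# so no build artifact has to be passed between them.
jobs:
- job: unit_tests
  pool:
    vmImage: 'ubuntu-latest'   # placeholder image
  steps:
  # Compile and run the surefire unit tests; -DskipITs skips the
  # failsafe integration tests (assumption: ITs are bound to failsafe).
  - script: mvn -B clean verify -DskipITs
    displayName: 'Compile and run unit tests'

- job: integration_tests
  pool:
    vmImage: 'ubuntu-latest'   # placeholder image
  steps:
  # 'run-its-only' is a hypothetical profile that would skip unit tests
  # and run only the integration tests.
  - script: mvn -B clean verify -Prun-its-only
    displayName: 'Compile and run integration tests'

With only two such splits, each job pays the full compile time once, which matches the point above that the artifact caching would not even be required.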
>> >>> This will increase the number of builds being run on the Flink machines by quite a bit, obviously affecting throughput and latency.
>> >>>
>> >>> On 11/12/2019 14:59, Arvid Heise wrote:
>> >>>> Hi Robert,
>> >>>>
>> >>>> Thank you very much for raising this issue and improving the build system. For now, I'd like to stick to a lean solution (= option A).
>> >>>>
>> >>>> While option B can greatly reduce build times, it also has the habit of clogging up the build machines. Just some arbitrary numbers, but it currently feels like B cuts latency in half but also uses 10 machines for 30 minutes, decreasing the overall throughput significantly. Thus, when many folks want to see their commits tested, resources quickly run out, and this in turn significantly increases latency. I'd like to have more predictable build times and sacrifice some latency for now.
>> >>>>
>> >>>> It would be interesting to see if we could rearrange the project execution order in Maven, such that fast projects are executed first. E2E tests should be executed last, which they somewhat already are because of the project dependencies. Of course, I'm very interested in improving the overall build experience by exploring alternatives to Maven.
>> >>>>
>> >>>> Best,
>> >>>>
>> >>>> Arvid
>> >>>>
>> >>>> On Wed, Dec 11, 2019 at 2:32 PM Robert Metzger <rmetz...@apache.org> wrote:
>> >>>>> Hey devs,
>> >>>>>
>> >>>>> I need your opinion on something: as part of our migration from Travis to Azure, I'm revisiting the build system of Flink. I currently see two different ways of proceeding, and I would like to know your opinion on the two options.
>> >>>>>
>> >>>>> A) We build and test Flink in one "mvn clean verify" call on the CI system.
>> >>>>> B) We migrate the two-staged build of one compile job and N test jobs to Azure.
>> >>>>>
>> >>>>> Option A) is what we are currently running as part of testing the Azure-based system.
>> >>>>>
>> >>>>> Pros/cons for A)
>> >>>>> + For "apache/flink" pushes and pull requests, the big testing machines need 1:30 hours to complete (this might go up by a few minutes because the python tests and some auxiliary tests are not executed yet).
>> >>>>> + Our build will be easier to maintain and understand, because we rely on fewer scripts.
>> >>>>> - Builds on Flink forks using the free Azure plan currently take 3:30 hours to complete.
>> >>>>>
>> >>>>> Pros/cons for B)
>> >>>>> + Builds on Flink forks using the free Azure plan take 1:20 hours.
>> >>>>> + Builds take 1:20 hours on the big testing machines.
>> >>>>> - Maintenance and complexity of the build scripts.
>> >>>>> - The build times are a lot less predictable, because they depend on the availability of workers. The free plan builds are currently fast because the test stage has 10 jobs and Azure offers 10 parallel workers. We currently only have a total of 8 big machines, so there will always be some queueing. In practice, for the "apache/flink" repo, build times will be less favorable because of the scheduling.
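To make option B more concrete, here is a rough sketch of a two-staged pipeline: one compile job publishes its output as a pipeline artifact, and a matrix of test jobs downloads it. The module groups, the published path and the test commands are made up for illustration and are not the actual flink-ci configuration.

# Sketch only: compile once, then fan out into N test jobs.
stages:
- stage: compile
  jobs:
  - job: build
    pool:
      vmImage: 'ubuntu-latest'        # placeholder image
    steps:
    - script: mvn -B clean install -DskipTests
      displayName: 'Compile Flink'
    # Publish the compiled working directory so the test jobs can
    # reuse it without compiling again.
    - publish: $(Build.SourcesDirectory)
      artifact: flink-build

- stage: test
  dependsOn: compile
  jobs:
  - job: test_split
    pool:
      vmImage: 'ubuntu-latest'        # placeholder image
    strategy:
      matrix:                          # hypothetical module groups
        core:
          MODULES: 'flink-core,flink-runtime'
        connectors:
          MODULES: 'flink-connectors'
    steps:
    - download: current
      artifact: flink-build
    # Run only this split's modules against the pre-compiled build.
    - script: cd $(Pipeline.Workspace)/flink-build && mvn -B verify -pl $(MODULES)
      displayName: 'Run tests for one split'

Deciding what exactly has to be carried over between the stages (the target/ directories, the local Maven repository, or both) and how to group the modules is precisely the "caching of artifacts and brittle definition of splits" complexity mentioned earlier in the thread.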
>> >>>>> In my opinion, the question is mostly: are you okay with waiting 3.5 hours for a build to finish on your private CI, in favor of a less complex build system? Ideally, we'll be able to reduce these 3.5 hours by using a more modern build tool ("gradle") in the future.
>> >>>>>
>> >>>>> I'm happy to hear your thoughts!
>> >>>>>
>> >>>>> Best,
>> >>>>> Robert
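For comparison, option A boils down to a single job with a single Maven call, which is what the "less complex build system" above refers to. A minimal sketch, with the trigger branch and VM image as placeholders:

# Sketch only: one job, one Maven invocation, no stages,
# no artifact passing and no split definitions to maintain.
trigger:
- master
pool:
  vmImage: 'ubuntu-latest'   # placeholder image
steps:
- script: mvn -B clean verify
  displayName: 'Compile and test Flink in a single run'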