It’s a tough question. On the one hand, I like less complexity in the build 
system. But one of the most important things for developers is fast iteration 
cycles.

So I would prefer the solution that keeps the iteration time low.

Best,
Aljoscha

> On 13. Dec 2019, at 14:41, Chesnay Schepler <ches...@apache.org> wrote:
> 
> It depends on how you define "split"; if you split by module (as we do 
> currently) you have the same complexity as we have right now:
> caching of artifacts and brittle definition of splits.
> 
> But there are other ways to split builds, for example into unit and 
> integration tests; could also add end-to-end tests to this list.
> At that point we're basically talking about multiple parallel builds that are 
> fully independent.
> Let's also remember that caching of the build artifact is only useful when 
> the compile times are large enough to warrant it;
> if we only go with 2 splits in the grand scheme of things the caching 
> wouldn't even be required.
> We added the caching to Travis since at 5+ builds (and the guarantee for this 
> number to go up) the compilation time was a much larger factor.
> 
> As for the current split setup we have (as in by modules), it isn't just 
> about faster feedback times; they can also be used to isolate components from 
> each other.
> I know that quite a few people appreciate the kafka/python module being in 
> its own split, for example.
> 
> On 11/12/2019 16:44, Robert Metzger wrote:
>> Some comments on Chesnay's message:
>> - Changing the number of splits will not reduce the complexity.
>> - One can also use the Flink build machines by opening a PR to the
>> "flink-ci/flink" repo, no need to open crappy PRs :)
>> - On the number of builds being run: We currently use 4 out of 10 machines
>> offered by Alibaba, and we are not yet hitting any limits. In addition to
>> that, another big cloud provider has reached out to us, offering build
>> capacity.
>> 
>> But generally, I agree that solely relying on the build infrastructure of
>> Flink is not a good option. The free Azure builds should provide a
>> reasonable experience.
>> 
>> 
>> On Wed, Dec 11, 2019 at 3:22 PM Chesnay Schepler <ches...@apache.org> wrote:
>> 
>>> Note that for B it's not strictly necessary to maintain the current
>>> number of splits; 2 might already be enough to bring contributor builds
>>> to a more reasonable level.
>>> 
>>> I don't think that a contributor build taking 3.5h is a viable option;
>>> people will start disregarding their own instance and just open a PR
>>> without having run the tests, which will naturally mean that PR quality
>>> will drop. Committers probably will start working around this and push
>>> branches into the flink repo for running tests; we have seen that in the
>>> past and see this currently for e2e tests.
>>> 
>>> This will increase the number of builds being run on the Flink machines
>>> by quite a bit, obviously affecting throughput and latency.
>>> 
>>> On 11/12/2019 14:59, Arvid Heise wrote:
>>>> Hi Robert,
>>>> 
>>>> thank you very much for raising this issue and improving the build
>>>> system.
>>>> For now, I'd like to stick to a lean solution (= option A).
>>>> 
>>>> While option B can greatly reduce build times, it also has the habit of
>>>> clogging up the build machines. Just some arbitrary numbers, but it
>>>> currently feels like B cuts down latency by half but also uses 10
>>>> machines for 30 minutes, decreasing the overall throughput
>>>> significantly. Thus, when many folks want to see their commits tested,
>>>> resources quickly run out and this in turn significantly increases
>>>> latency.
>>>> I'd like to have some more predictable build times and sacrifice some
>>>> latency for now.
>>>> 
>>>> It would be interesting to see if we could rearrange the project
>>>> execution in Maven, such that fast projects are executed first. E2E
>>>> tests should be executed last, which they are somewhat, because of the
>>>> project dependencies.
>>>> Of course, I'm very interested to improve the overall build experience by
>>>> exploring other options to Maven.
>>>> 
>>>> Best,
>>>> 
>>>> Arvid
>>>> 
>>>> On Wed, Dec 11, 2019 at 2:32 PM Robert Metzger <rmetz...@apache.org>
>>>> wrote:
>>>>> Hey devs,
>>>>> 
>>>>> I need your opinion on something: As part of our migration from Travis
>>>>> to Azure, I'm revisiting the build system of Flink. I currently see two
>>>>> different ways of proceeding, and I would like to know your opinion on
>>>>> the two options.
>>>>> 
>>>>> A) We build and test Flink in one "mvn clean verify" call on the CI
>>>>> system.
>>>>> B) We migrate the two-stage build of one compile and N test jobs to
>>>>> Azure.
>>>>> Option A) is what we are currently running as part of testing the
>>>>> Azure-based system.
>>>>> 
>>>>> Pro/Cons for A)
>>>>> + for "apache/flink" pushes and pull requests, the big testing machines
>>>>> need 1:30 hours to complete (this might go up by a few minutes because
>>>>> the python tests and some auxiliary tests are not executed yet)
>>>>> + Our build will be easier to maintain and understand, because we rely
>>>>> on fewer scripts
>>>>> - builds on Flink forks, using the free Azure plan currently take 3:30
>>>>> hours to complete.
>>>>> 
>>>>> Pro/Cons for B)
>>>>> + builds on Flink forks using the free Azure plan take 1:20 hours,
>>>>> + Builds take 1:20 hours on the big testing machines
>>>>> - maintenance and complexity of the build scripts
>>>>> - the build times are a lot less predictable, because they depend on the
>>>>> availability of workers. For the free plan builds, they are currently
>>>>> fast, because the test stage has 10 jobs, and Azure offers 10 parallel
>>>>> workers. We currently only have a total of 8 big machines, so there
>>>>> will always be some queueing. In practice, for the "apache/flink" repo,
>>>>> build times will be less favorable, because of the scheduling.
>>>>> 
>>>>> 
>>>>> In my opinion, the question is mostly: Are you okay to wait 3.5 hours
>>>>> for a build to finish on your private CI, in favor of a less complex
>>>>> build system?
>>>>> Ideally, we'll be able to reduce these 3.5 hours by using a more modern
>>>>> build tool ("gradle") in the future.
>>>>> 
>>>>> I'm happy to hear your thoughts!
>>>>> 
>>>>> Best,
>>>>> Robert
>>>>> 
>>> 
> 
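
The latency/throughput trade-off discussed in the thread can be made
concrete with a rough queueing sketch. This is only an illustration, not a
measurement: the function name finish_times, the 40-minute per-job duration
for option B, and the simplification that all builds arrive at once and that
the compile stage is ignored are all assumptions. Only the 90-minute single
build, the 10 test jobs and the 8 big machines come from the messages above.

```python
import heapq


def finish_times(num_builds, machines, jobs_per_build, job_minutes):
    """Return the finish time (in minutes) of each build in a batch.

    Hypothetical model: all builds are submitted at t=0 and served in order;
    every job is handed to whichever machine becomes free first.
    """
    free_at = [0] * machines          # time at which each machine is idle
    heapq.heapify(free_at)
    done = []
    for _ in range(num_builds):
        build_done = 0
        for _ in range(jobs_per_build):
            start = heapq.heappop(free_at)      # earliest available machine
            end = start + job_minutes
            heapq.heappush(free_at, end)
            build_done = max(build_done, end)   # build ends with its last job
        done.append(build_done)
    return done


# Option A: one 90-minute job per build on 8 machines.
option_a = finish_times(num_builds=8, machines=8, jobs_per_build=1, job_minutes=90)

# Option B: 10 jobs per build; 40 minutes per job is an assumed figure.
option_b = finish_times(num_builds=8, machines=8, jobs_per_build=10, job_minutes=40)

print("A:", option_a)
print("B:", option_b)
```

Under these assumptions a single split build finishes sooner than a single
monolithic one, but later builds in a busy batch wait much longer, which is
the predictability concern raised for option B. Different per-job durations
would shift the numbers considerably.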
