It’s a tough question. On the one hand, I like less complexity in the build system. But one of the most important things for developers is fast iteration cycles.
So I would prefer the solution that keeps the iteration time low.

Best,
Aljoscha

> On 13. Dec 2019, at 14:41, Chesnay Schepler <ches...@apache.org> wrote:
>
> It depends on how to define "split"; if you split by module (as we do
> currently) you have the same complexity as we have right now;
> caching of artifacts and brittle definition of splits.
>
> But there are other ways to split builds, for example into unit and
> integration tests; could also add end-to-end tests to this list.
> At that point we're basically talking about multiple parallel builds that are
> fully independent.
> Let's also remember that caching of the build artifact is only useful when
> the compile times are large enough to warrant it;
> if we only go with 2 splits in the grand scheme of things the caching
> wouldn't even be required.
> We added the caching to Travis since at 5+ builds (and the guarantee for this
> number to go up) the compilation time was a much larger factor.
>
> As for the current split setup we have (as in by modules), it isn't just
> about faster feedback times; they can also be used to isolate components from
> each other.
> I know that quite a few people appreciate the kafka/python module being in
> its own split for example.
>
> On 11/12/2019 16:44, Robert Metzger wrote:
>> Some comments on Chesnay's message:
>> - Changing the number of splits will not reduce the complexity.
>> - One can also use the Flink build machines by opening a PR to the
>> "flink-ci/flink" repo, no need to open crappy PRs :)
>> - On the number of builds being run: We currently use 4 out of 10 machines
>> offered by Alibaba, and we are not yet hitting any limits. In addition to
>> that, another big cloud provider has reached out to us, offering build
>> capacity.
>>
>> But generally, I agree that solely relying on the build infrastructure of
>> Flink is not a good option. The free Azure builds should provide a
>> reasonable experience.
>>
>> On Wed, Dec 11, 2019 at 3:22 PM Chesnay Schepler <ches...@apache.org> wrote:
>>
>>> Note that for B it's not strictly necessary to maintain the current
>>> number of splits; 2 might already be enough to bring contributor builds
>>> to a more reasonable level.
>>>
>>> I don't think that a contributor build taking 3.5h is a viable option;
>>> people will start disregarding their own instance and just open a PR
>>> without having run the tests, which will naturally mean that PR quality
>>> will drop. Committers probably will start working around this and push
>>> branches into the flink repo for running tests; we have seen that in the
>>> past and see this currently for e2e tests.
>>>
>>> This will increase the number of builds being run on the Flink machines
>>> by quite a bit, obviously affecting throughput and latency.
>>>
>>> On 11/12/2019 14:59, Arvid Heise wrote:
>>>> Hi Robert,
>>>>
>>>> thank you very much for raising this issue and improving the build system.
>>>> For now, I'd like to stick to a lean solution (= option A).
>>>>
>>>> While option B can greatly reduce build times, it also has the habit of
>>>> clogging up the build machines. Just some arbitrary numbers, but it
>>>> currently feels like B cuts down latency by half but also uses 10 machines
>>>> for 30 minutes, decreasing the overall throughput significantly. Thus, when
>>>> many folks want to see their commits tested, resources quickly run out and
>>>> this in turn significantly increases latency.
>>>> I'd like to have some more predictable build times and sacrifice some
>>>> latency for now.
>>>>
>>>> It would be interesting to see if we could rearrange the project execution
>>>> in Maven, such that fast projects are executed first. E2E tests should be
>>>> executed last, which they are somewhat, because of the project dependencies.
>>>> Of course, I'm very interested to improve the overall build experience by
>>>> exploring other options to Maven.
>>>>
>>>> Best,
>>>>
>>>> Arvid
>>>>
>>>> On Wed, Dec 11, 2019 at 2:32 PM Robert Metzger <rmetz...@apache.org> wrote:
>>>>> Hey devs,
>>>>>
>>>>> I need your opinion on something: As part of our migration from Travis to
>>>>> Azure, I'm revisiting the build system of Flink. I currently see two
>>>>> different ways of proceeding, and I would like to know your opinion on the
>>>>> two options.
>>>>>
>>>>> A) We build and test Flink in one "mvn clean verify" call on the CI system.
>>>>> B) We migrate the two staged build of one compile and N test jobs to Azure.
>>>>> Option A) is what we are currently running as part of testing the
>>>>> Azure-based system.
>>>>>
>>>>> Pro/Cons for A)
>>>>> + for "apache/flink" pushes and pull requests, the big testing machines
>>>>> need 1:30 hours to complete (this might go up for a few minutes because the
>>>>> python tests, and some auxiliary tests are not executed yet)
>>>>> + Our build will be easier to maintain and understand, because we rely on
>>>>> fewer scripts
>>>>> - builds on Flink forks, using the free Azure plan currently take 3:30
>>>>> hours to complete.
>>>>>
>>>>> Pro/Cons for B)
>>>>> + builds on Flink forks using the free Azure plan take 1:20 hours,
>>>>> + Builds take 1:20 hours on the big testing machines
>>>>> - maintenance and complexity of the build scripts
>>>>> - the build times are a lot less predictable, because they depend on the
>>>>> availability of workers. For the free plan builds, they are currently fast,
>>>>> because the test stage has 10 jobs, and Azure offers 10 parallel workers.
>>>>> We currently only have a total of 8 big machines, so there will always be
>>>>> some queueing. In practice, for the "apache/flink" repo, build times will
>>>>> be less favorable, because of the scheduling.
>>>>>
>>>>> In my opinion, the question is mostly: Are you okay to wait 3.5 hours for a
>>>>> build to finish on your private CI, in favor of a less complex build
>>>>> system?
>>>>> Ideally, we'll be able to reduce these 3.5 hours by using a more modern
>>>>> build tool ("gradle") in the future.
>>>>>
>>>>> I'm happy to hear your thoughts!
>>>>>
>>>>> Best,
>>>>> Robert