Hi Flink community,

I'd like to revive a discussion about Flink's build time and project
structure which we already had in another mailing thread [1] and which
we wanted to pick up after the 1.2 release.

Recently, Flink has been exceeding Travis's maximum build time of 50
minutes more and more often. This leads to failing builds, as can be
seen here [2]. Almost 50% of my recent builds on Travis failed because
of the 50 minute time limit.

Exceeding the time limit not only prevents some tests (especially the
YARN tests) from being executed regularly, it also undermines people's
trust in CI. We've seen in the past, when we had some flaky tests, that
there was an acceptance of merging PRs even though Travis failed,
because the failing tests were "always" unrelated. But how sure can you
be about that? Having a properly working and reliable CI system is imo
crucial for guaranteeing Flink's high quality standard.

In the past, we split Flink's tests into two groups which are executed
separately in order to cope with increasing build times. This could
again be a solution to the problem (see the sketch below).
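Just to illustrate one way such a split can be done (a minimal sketch,
not necessarily how our build is set up today): tests could be tagged
with JUnit 4 categories and each CI job would then run only one group,
e.g. via the Maven Surefire plugin's <groups>/<excludedGroups>
configuration. The marker interfaces and the test class below are
hypothetical names, not existing Flink classes:

import org.junit.Test;
import org.junit.experimental.categories.Category;

// Hypothetical marker interfaces used to partition the test suite
// (illustrative names only). In practice each would live in its own file.
interface CoreTests {}
interface LibraryTests {}

// A test class opts into a group via JUnit 4's @Category annotation;
// a CI job can then be configured to run only one group.
@Category(LibraryTests.class)
class CepOperatorTest {

    @Test
    public void testPatternMatching() {
        // ... actual test logic ...
    }
}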

However, there is also the problem of slowly but steadily increasing
build times for Flink itself. On my machine, building Flink with tests
deactivated takes about 10 minutes. That's mainly because Flink has
grown quite big, now containing not only the runtime and APIs but also
several libraries and a contribution module. Stephan proposed splitting
the repository into the following set of repositories

  - flink-core (core, apis, runtime, clients)
  - flink-libraries (gelly, ml, cep, table, scala shell, python)
  - flink-connectors
  - flink-contrib

in order to make the project easier to maintain and to decrease build
as well as test times. Of course, such a split would raise the question
of how and how often the individual repositories are released. Will
they follow independent release cycles or will they be synchronized?
Moreover, the problem of API stability across repository boundaries
will arise: changing things in the core repository might break things
in a library repository, and since they are independent, this breakage
might go unnoticed for some time. Stephan's proposal also includes that
the new repositories would be governed by the same PMC.

A little off-topic but somewhat related is how we handle the load of
outside contributions for modules where not many committers are active.
Good examples (actually, bad examples of community work) are the ML and
CEP libraries. These libraries started out promising and attracted
outside contributions. However, due to a lack of committers who could
spend time on these libraries, their development stalled and many
contributors turned away. Maybe such a split would make it easier to
promote more contributors to committers. Moreover, an independent
release cycle for volatile projects might help increase adoption,
because bug fixes can be delivered more frequently.

Recently, I've seen increased interest in, and really good discussions
about, FlinkML's future [3]. I really would not like to repeat the same
mistakes and let this effort die again simply because we are not
responsive to contributors who would like to get involved. The only way
I see this working is to add more committers to the ML library. And
maybe we would feel more comfortable adding new committers faster to
repositories that are not Flink core.

I know we should first discuss the build time problem and find a
conclusion there. But I mentioned the outside contributor problem as
well because it is an argument for a repository split.

[1]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/Travis-CI-tt14478.html
[2] https://travis-ci.org/tillrohrmann/flink/builds/203479275
[3]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Flink-ML-roadmap-tt16040.html

Cheers,
Till
