Hi Flink community,

I'd like to revive a discussion about Flink's build time and project structure which we already had in another mailing thread [1] and which we wanted to continue after the 1.2 release.
Recently, Flink has been exceeding Travis' maximum build time of 50 minutes more and more often. This leads to failing builds, as can be seen here [2]. Almost 50% of my last builds on Travis failed because of the 50-minute time limit. Exceeding the time limit not only prevents some tests (especially the yarn tests) from being executed regularly, it also undermines people's trust in CI. We've seen in the past, when we had some flaky tests, that there was an acceptance to merge PRs even though Travis failed, because the failing tests were "always" unrelated. But how sure can you be about that? Having a properly working and reliable CI system is, in my opinion, crucial for guaranteeing Flink's high quality standard.

In the past we split Flink's tests into two groups which are executed separately in order to cope with increasing build times. This could again be a solution to the problem (a rough sketch of what such a split could look like is included at the bottom of this mail).

However, there is also the problem of Flink's slowly but steadily increasing build times in general. On my machine, building Flink with tests deactivated takes about 10 minutes. That's mainly because Flink has grown quite big, containing not only the runtime and APIs but also several libraries and a contribution module. Stephan proposed to split up the repository into the following set

- flink-core (core, apis, runtime, clients)
- flink-libraries (gelly, ml, cep, table, scala shell, python)
- flink-connectors
- flink-contrib

in order to make the project more maintainable and to decrease build as well as test times.

Of course, such a split raises the question of how and how often the individual modules are released. Will they follow an independent release cycle or will they be synced? Moreover, the problem of API stability across module boundaries will arise: changing things in the core repository might break things in a library repository, and since the repositories are independent, such a break might go unnoticed for some time. Stephan's proposal also includes that the new repositories will be governed by the same PMC.

A little bit off-topic, but somewhat related, is how we handle the load of outside contributions for modules where we don't have many committers present. Good examples (actually they are bad examples of community work) are the ML and the CEP library. These libraries started out promising and attracted outside contributions. However, due to a lack of committers who could spend time on these libraries, their development stalled and many contributors turned away from them. Maybe such a split makes it easier to make more contributors committers. Moreover, an independent release cycle for volatile projects might help increase adoption, because bug fixes can be delivered more frequently.

Recently, I've seen an increased interest in and really good discussions about FlinkML's future [3]. I really would not like to repeat the same mistakes and let this effort die again by simply not being responsive to contributors who would like to get involved. The only way I see to prevent this is to add more committers to the ML library. And maybe we would feel more comfortable adding new committers faster to repos which are not Flink core.

I know we should first discuss the former problem and find a conclusion there. But I mention the outside-contributor problem as well because it is an argument for a repo split.
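To make the test-split option a bit more concrete, here is a rough sketch of how two separate test groups could look on Travis. This is only an illustration: the profile names "core-tests" and "misc-tests" are hypothetical placeholders, not existing Flink Maven profiles. Each entry in the env matrix spawns its own Travis job, so each group gets its own 50-minute budget:

  # Hypothetical .travis.yml excerpt: one Travis job per test group.
  language: java
  jdk: oraclejdk8
  env:
    - TEST_PROFILE=core-tests   # hypothetical Maven profile
    - TEST_PROFILE=misc-tests   # hypothetical Maven profile
  # Build once without tests, then run only the selected group's tests.
  install: mvn clean install -DskipTests -B
  script: mvn verify -P $TEST_PROFILE -B

The profiles could be defined once in the root pom (e.g. by including/excluding test categories in surefire/failsafe), so the split stays maintainable in a single place instead of being scattered across modules.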
[1] http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/Travis-CI-tt14478.html
[2] https://travis-ci.org/tillrohrmann/flink/builds/203479275
[3] http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Flink-ML-roadmap-tt16040.html

Cheers,
Till