Splitting the repo is a natural consequence of the growing code base, so it will happen eventually. The question is when and how.

When: the timing seems reasonable.

How: is the proposed scheme good? I assume we cannot add committers per
project (or is a committer just a logical concept?). If so, just splitting
out flink-libraries does not solve the problem of "add more committers to
the ML library". A finer granularity may help with that, but it is probably
not an immediate solution to the Travis timeout.

Travis build integration is very important for (GitHub) contributions from
an engineering point of view. An immediate workaround would be very nice to
have:

1. split the tests into parallel jobs, as Till mentioned;
2. use a prepared Docker image to speed up the Maven downloads?
3. upgrade to Travis.com (commercial) to get the 120-minute timeout [1]
   immediately, or would some company like to sponsor a dedicated system
   for this?

Regards,
Jin

[1] https://docs.travis-ci.com/user/customizing-the-build#Build-Timeouts

On Wed, Feb 22, 2017 at 12:44 AM, Theodore Vasiloudis
<theodoros.vasilou...@gmail.com> wrote:

> Hello all,
>
> From a library developer POV I think splitting up the project will have
> more advantages than disadvantages. API-breaking changes should become
> the responsibility of library developers, and with automated tests they
> shouldn't be too hard to catch.
>
> I think I'm more in favor of synced releases to not confuse users. If we
> are going to be presenting the Flink stack as an integrated product, as a
> user I would expect everything to be under one release schedule and not
> have to worry about different versions of different parts of the stack.
>
> If we were to split, how does that work under the ASF? Is it possible to
> have someone be a committer for a library but not for the core?
>
> Regards,
> Theodore
>
>
> On Tue, Feb 21, 2017 at 1:44 PM, Till Rohrmann <trohrm...@apache.org>
> wrote:
>
> > Hi Flink community,
> >
> > I'd like to revive a discussion about Flink's build time and project
> > structure which we already had in some other mailing thread [1] and
> > which we wanted to do after the 1.2 release.
> >
> > Recently, we can see that Flink is more and more often exceeding
> > Travis' maximum build time of 50 minutes. This leads to failing builds,
> > as can be seen here [2]. Almost 50% of my last builds on Travis failed
> > because of the 50 minutes time limit.
> >
> > Exceeding the time limit not only prevents some tests (especially the
> > yarn tests) from being executed regularly but also undermines people's
> > trust in CI. We've seen in the past that when we had some flaky tests
> > there was an acceptance to merge PRs even though Travis failed, because
> > the failing tests were "always" unrelated. But how sure can you be
> > about that? Having a properly working and reliable CI system is imo
> > crucial for guaranteeing Flink's high quality standard.
> >
> > In the past we've split Flink's tests into two groups which are
> > executed separately in order to cope with increasing build times. This
> > could again be a solution to the problem.
> >
> > However, there is also another problem of slowly increasing build times
> > for Flink. On my machine building Flink with deactivated tests takes
> > about 10 minutes. That's mainly because Flink has grown quite big,
> > containing now not only the runtime and APIs but also several libraries
> > and a contribution module. Stephan proposed to split up the repository
> > into the following set
> >
> > - flink-core (core, apis, runtime, clients)
> > - flink-libraries (gelly, ml, cep, table, scala shell, python)
> > - flink-connectors
> > - flink-contrib
> >
> > in order to make the project more maintainable and to decrease build as
> > well as test times.
> > Of course such a split would raise the question of how and how often
> > the individual modules are released. Will they follow an independent
> > release cycle or will they be synced? Moreover, the problem of API
> > stability across module boundaries will arise. Changing things in the
> > core repository might break things in a library repository, and since
> > they are independent this break might go unnoticed for some time.
> > Stephan's proposal also includes that the new repositories will be
> > governed by the same PMC.
> >
> > A little bit off-topic but also somewhat related is how we handle the
> > load of outside contributions for modules where we don't have many
> > committers present. Good examples (actually they are bad examples for
> > community work) are the ML and the CEP library. These libraries started
> > out promising and attracted outside contributions. However, due to a
> > lack of committers who could spend time on these libraries, their
> > development stalled and made many contributors turn away from them.
> > Maybe such a split makes things easier wrt making more contributors
> > committers. Moreover, an independent release cycle for volatile
> > projects might help increase adoption, because bug fixes can be
> > delivered more frequently.
> >
> > Recently, I've seen an increased interest in and really good
> > discussions about FlinkML's future [3]. I really would not like to
> > repeat the same mistakes and let this effort die again by simply not
> > being responsive to contributors who would like to get involved. The
> > only way I see this happening is to add more committers to the ML
> > library. And maybe we feel more comfortable adding new committers
> > faster to repos which are not Flink core.
> >
> > I know we should first discuss the former problem and find a conclusion
> > there. But I mentioned the outside contributors problem as well because
> > it is an argument for a repo split.
> >
> > [1] http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/Travis-CI-tt14478.html
> > [2] https://travis-ci.org/tillrohrmann/flink/builds/203479275
> > [3] http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Flink-ML-roadmap-tt16040.html
> >
> > Cheers,
> > Till
> >
>