Some other thoughts on how a repository split would help. I am not sure about all of them, so please comment:
- There is less competition for a "commit window". It already happens a lot that you run all tests and want to commit, but someone else committed in the meantime. You rebase, re-test, and again someone has committed in the meantime. With a linear commit history, this may eventually become a bottleneck.
- There is less risk of a broken master. If one repository/module breaks its master, the others can still continue.

Stephan

On Fri, Mar 10, 2017 at 12:20 PM, Till Rohrmann <trohrm...@apache.org> wrote:

> Thanks for all your input. In order to wrap up the discussion, I'd like to summarize the points that were mentioned:
>
> The problem of increasing build times and project complexity has been acknowledged. Ideally, we would have everything in one repository and use an incremental build tool. Since Maven does not properly support this, we would have to switch our build tool to something like Gradle, for example.
>
> Another option is introducing build profiles for different sets of modules, as well as separating integration and unit tests. The third alternative would be creating sub-projects with their own repositories. I actually think that these two proposals are not mutually exclusive, and it would also make sense to separate unit and integration tests if we split the repository.
>
> The overall consensus seems to be that we don't want to split the community and want to keep everything under the same umbrella. I think this is the right way to go, because otherwise some parts of the project could become second-class citizens. Given that, and given that we continue using Maven, I still think that creating sub-projects for the libraries, for example, could be beneficial. A split could reduce the project's complexity and potentially make it easier to actively develop the libraries. The main concern is setting up the build infrastructure to aggregate docs from multiple repositories and make them publicly available.
>
> Since I started this thread and would really like to see Flink's ML library revived, I'd volunteer to first investigate whether it is feasible to establish a proper incremental build for Flink. If that is not possible, I will look into splitting the repository, at first only for the libraries. I'll share my results with the community once I'm done with the investigation.
>
> Cheers,
> Till
>
> On Fri, Feb 24, 2017 at 3:50 PM, Robert Metzger <rmetz...@apache.org> wrote:
>
> > @Jin Mingjian: You cannot use the paid Travis version for open source projects. It only works for private repositories (at least that was the case back when we asked them about it).
> >
> > @Stephan: I don't think that incremental builds will be available with Maven anytime soon.
> >
> > I agree that we need to fix the build time issue on Travis. I've recently pushed a commit so that we now use three instead of two test groups. But I don't think that this is a feasible long-term solution.
> >
> > If this discussion is only about reducing the build and test time, introducing build profiles for different components, as Aljoscha suggested, would solve the problem Till mentioned.
> > Also, if we decide that Travis is no longer a good tool for testing, I guess we can find a different solution. There are now competitors to Travis that might be willing to offer a paid plan for an open source project, or we could set up our own infrastructure on a server sponsored by one of the contributing companies.
> > If we also want to solve "community issues" with this change, then I think it's worth the effort of splitting up Flink into different repositories.
> >
> > Splitting up the repositories is not a trivial task, in my opinion. As others have mentioned before, we need to consider the following things:
> > - How are we going to build the documentation? Ideally, every repo should contain its own docs, so we would need to pull them together when building the main docs.
> > - How do we organize the dependencies? If library repositories depend on snapshot Flink versions, we need to make sure that the snapshot deployment always works. This also means that people working on a library repository will have to pull snapshots OR build Flink locally first.
> > - We need to update the release scripts.
> >
> > If we commit to making these changes, we need to assign at least one committer (yes, in this case we need somebody who can commit, for example for updating the buildbot stuff) who volunteers to make the change.
> > I've done a lot of infrastructure work in the past, but I'm currently pretty booked with many other things, so I don't realistically see myself doing it. Max, who used to work on these things, is taking some time off.
> > I think we need 3 days for the change in the best case and 5 days in the worst case. The problem is that there are no "unit tests" for the infrastructure, so many things are "trial and error" (like Apache's buildbot, our release scripts, the doc scripts, the Maven setup, nightly builds).
> >
> > On Thu, Feb 23, 2017 at 1:33 PM, Stephan Ewen <se...@apache.org> wrote:
> >
> > > If we can get incremental builds to work, that would actually be the preferred solution in my opinion.
> > >
> > > Many companies have invested heavily in making a "single repository" code base work, because it has the advantage of not having to update/publish several repositories first.
> > > However, the strong prerequisite for that is an incremental build system that builds only (fine-grained) what it has to build. I am not sure how we could make that work with Maven and Travis...
> > >
> > > On Wed, Feb 22, 2017 at 10:42 PM, Greg Hogan <c...@greghogan.com> wrote:
> > >
> > > > An additional option for reducing the time to build and test is parallel execution. This would help users more than it would help on TravisCI, since we're generally running on multi-core machines rather than VM slices.
> > > >
> > > > Is the idea that each user would only check out the modules that he or she is developing with? For example, if a developer is not working on flink-mesos or flink-yarn, then the "flink-deploy" module would not be cloned to their filesystem?
> > > >
> > > > We can run a TravisCI nightly build on each repo to validate against API changes.
> > > >
> > > > Greg
> > > >
> > > > On Wed, Feb 22, 2017 at 12:24 PM, Fabian Hueske <fhue...@gmail.com> wrote:
> > > >
> > > > > Hi everybody,
> > > > >
> > > > > I think this should be a discussion about the benefits and drawbacks of separating the code into distinct repositories from a development point of view.
> > > > > So I agree with Stephan that we should not divide the community by creating separate groups of committers.
> > > > > Also, the discussion about independent releases is not strictly related to this decision, IMO.
> > > > >
> > > > > I see a few pros and cons of splitting the code base into separate repositories which (I think) haven't been mentioned before:
> > > > > pros:
> > > > > - The IDE setup will be leaner. It is not necessary to compile the whole code base to run a test after switching branches.
> > > > > cons:
> > > > > - Developing library features that require changes in the core/APIs becomes more time-consuming due to the back-and-forth between code bases. However, I think this is not often the case.
> > > > >
> > > > > Aljoscha has good points as well. Many of the build issues could be solved by different build profiles and configurations.
> > > > >
> > > > > Best, Fabian
> > > > >
> > > > > 2017-02-22 14:59 GMT+01:00 Gábor Hermann <m...@gaborhermann.com>:
> > > > >
> > > > > > @Stephan:
> > > > > >
> > > > > > Although I tried to raise some issues about splitting committers, I'm still strongly in favor of some kind of restructuring. We just have to be conscious of the disadvantages.
> > > > > >
> > > > > > Not splitting the committers could leave the libraries in the same stalled state that Till described. Of course, dedicating current committers as shepherds of the libraries could easily resolve the issue. But that requires time from current committers. It seems like a trade-off between code quality, speed of development, and committer effort.
> > > > > >
> > > > > > From what I see in the discussion about ML, there are many people willing to contribute, as well as production use cases. This means we could and should move forward. However, development is significantly slowed down by stalled PRs. The proposal to have contributors help with the review process has not really worked out so far. In my opinion, either code quality (by accepting new committers more easily) or some committer time (reviewing/merging) should be sacrificed to move forward. As Till has indicated, it would be a shame if we let this contribution effort die.
> > > > > >
> > > > > > Cheers,
> > > > > > Gabor