"flink-core" means the main repository, not the "flink-core" module.
When doing a release, we need to build the Flink main code first, because flink-libraries depends on it. Once "flink-libraries" is built, we need to run the main build again (at least the flink-dist module), so that it pulls the artifacts from flink-libraries and puts them into the opt/ folder of the final artifact.

On Wed, Mar 15, 2017 at 4:44 PM, Till Rohrmann <trohrm...@apache.org> wrote:

> I'm ok with point 3.
>
> Concerning point 8: Why do we have to build flink-core twice after having
> it built as a dependency for flink-libraries? This seems wrong to me.
>
> Cheers,
> Till
>
> On Wed, Mar 15, 2017 at 4:23 PM, Robert Metzger <rmetz...@apache.org>
> wrote:
>
> > Thank you. Running on AWS is a good idea!
> > Let me know if you (or anybody else) wants to help me with the
> > infrastructure work! Any help is much appreciated (as I've said before, I
> > don't really have time for doing this, but it has to be done :) )
> >
> > I'm against creating two new repositories. I fear that this introduces
> > too much complexity and too many repositories.
> > "flink" and "flink-libraries" are hopefully enough to get the build time
> > significantly down.
> > We can also consider putting the connectors into the "flink-libraries"
> > repo if we need to further reduce the build time.
> >
> > We should probably move "flink-table" out of "flink-libraries" if we want
> > to keep "flink-table" in the main repo. (This would eliminate the
> > "flink-libraries" module from main.)
> >
> > Also, I agree that "flink-statebackend-rocksdb" is not correctly placed
> > in contrib anymore.
> >
> >
> > On Wed, Mar 15, 2017 at 4:07 PM, Greg Hogan <c...@greghogan.com> wrote:
> >
> > > Robert, appreciate your kickstarting this task.
> > >
> > > We should compare the verification time with and without the listed
> > > modules. I’ll try to run this by tomorrow on AWS and on Travis.
> > >
> > > Should we maintain separate repos for flink-contrib and flink-libraries?
> > > Are you intending that we move flink-table out of flink-libraries (and
> > > perhaps flink-statebackend-rocksdb out of flink-contrib)?
> > >
> > > Greg
> > >
> > >
> > > > On Mar 15, 2017, at 9:55 AM, Robert Metzger <rmetz...@apache.org>
> > > > wrote:
> > > >
> > > > Thank you for looking into this Till.
> > > >
> > > > I think we then have to split the repositories.
> > > > My main motivation for doing this is that it seems to be the only
> > > > feasible way of scaling the community to allow more committers working
> > > > on the libraries.
> > > >
> > > > I'll take care of getting things started.
> > > >
> > > > As the next steps I propose to:
> > > > 1. Ask INFRA to rename
> > > > https://git-wip-us.apache.org/repos/asf?p=flink-connectors.git;a=summary
> > > > to "flink-libraries"
> > > > 2. Ask INFRA to set up GitHub and travis integration for
> > > > "flink-libraries"
> > > > 3. Put the code of "flink-ml", "flink-gelly", "flink-python",
> > > > "flink-cep", "flink-scala-shell", "flink-storm" into the new
> > > > repository. (I decided against moving flink-contrib there, because
> > > > rocksdb is in the contrib module. For flink-table, I'm undecided, but I
> > > > kept it in the main repo because it's probably going to interact more
> > > > with the core code in the future)
> > > > I'll try to preserve the history of those modules when splitting them
> > > > into the new repo
> > > > 4. I'll close all pull requests against those modules in the main repo.
> > > > 5. I'll set up a minimal documentation page for the library repository,
> > > > similar to the main documentation.
> > > > 6. I'll update the documentation build process to build both
> > > > documentations & link them to each other
> > > > 7. I'll update the nightly deployment process to include both
> > > > repositories
> > > > 8. I'll update the release script to create the Flink release out of
> > > > both repositories.
> > > > In order to put the libraries into the opt/ dir of the
> > > > release, I'll need to change the build of "flink-dist" so that it first
> > > > builds flink core, then the libraries and then the core again with the
> > > > libraries as an additional dependency.
> > > >
> > > > The main question for the community is: do you agree with point 3?
> > > > Would you like to include more or less?
> > > >
> > > > I'll start with 1. and 2. tomorrow morning.
> > > >
> > > >
> > > >
> > > > On Wed, Mar 15, 2017 at 1:48 PM, Till Rohrmann <trohrm...@apache.org>
> > > > wrote:
> > > >
> > > >> In theory we could have a merging bot which solves the problem of the
> > > >> "commit window". Once the PR passes all tests and has enough +1s, the
> > > >> bot could do the merging and, thus, it effectively linearizes the merge
> > > >> process.
> > > >>
> > > >> I think the second point is actually a disadvantage because there is
> > > >> not such an immediate incentive/pressure to fix the broken module if it
> > > >> lives in a separate repository. Furthermore, breaking API changes in
> > > >> the core will most likely go unnoticed for some time in other modules
> > > >> which are not developed so actively. In the worst case these things
> > > >> will only be noticed when we try to make a release.
> > > >>
> > > >> But I also agree that we are not Google and we don't have the
> > > >> capacities to maintain such a smooth build process that we can keep all
> > > >> the code in a single repository.
> > > >>
> > > >> I looked a bit into Gradle and as far as I can tell it offers some nice
> > > >> features wrt incrementally building projects. This would be beneficial
> > > >> for local development but it would not solve our build time problems on
> > > >> Travis.
> > > >> Gradle intends to introduce a task result cache which allows reusing
> > > >> results across builds.
> > > >> This could help when building on Travis, however, it
> > > >> is not yet fully implemented. Moreover, migrating from Maven to Gradle
> > > >> won't come for free (there's simply no free lunch out there) and we
> > > >> might risk introducing new bugs. Therefore, I would vote to split the
> > > >> repository in order to mitigate our current problems with Travis and
> > > >> the build time in general. Whether to use a different build system or
> > > >> not can then be discussed as an orthogonal question.
> > > >>
> > > >> Cheers,
> > > >> Till
> > > >>
> > > >> On Tue, Mar 14, 2017 at 8:05 PM, Stephan Ewen <se...@apache.org> wrote:
> > > >>
> > > >>> Some other thoughts on how a repository split would help. I am not
> > > >>> sure about all of them, so please comment:
> > > >>>
> > > >>> - There is less competition for a "commit window". It happens a lot
> > > >>> already that you run all tests and want to commit, but there was a
> > > >>> commit in the meantime. You rebase, need to re-test, again commit in
> > > >>> the meantime.
> > > >>> For a "linear" commit history, this may become a bottleneck
> > > >>> eventually as well.
> > > >>>
> > > >>> - There is less risk of broken master. If one repository/module
> > > >>> breaks its master, the others can still continue.
> > > >>>
> > > >>> Stephan
> > > >>>
> > > >>>
> > > >>> On Fri, Mar 10, 2017 at 12:20 PM, Till Rohrmann <trohrm...@apache.org>
> > > >>> wrote:
> > > >>>
> > > >>>> Thanks for all your input. In order to wrap the discussion up I'd
> > > >>>> like to summarize the mentioned points:
> > > >>>>
> > > >>>> The problem of increasing build times and complexity of the project
> > > >>>> has been acknowledged. Ideally we would have everything in one
> > > >>>> repository using an incremental build tool.
> > > >>>> Since Maven does not properly support this we
> > > >>>> would have to switch our build tool to something like Gradle, for
> > > >>>> example.
> > > >>>>
> > > >>>> Another option is introducing build profiles for different sets of
> > > >>>> modules as well as separating integration and unit tests. The third
> > > >>>> alternative would be creating sub-projects with their own
> > > >>>> repositories. I actually think that these two proposals are not
> > > >>>> necessarily exclusive and it would also make sense to have a
> > > >>>> separation between unit and integration tests if we split the
> > > >>>> repository.
> > > >>>>
> > > >>>> The overall consensus seems to be that we don't want to split the
> > > >>>> community and want to keep everything under the same umbrella. I
> > > >>>> think this is the right way to go, because otherwise some parts of
> > > >>>> the project could become second class citizens. Given that and that
> > > >>>> we continue using Maven, I still think that creating sub-projects for
> > > >>>> the libraries, for example, could be beneficial. A split could reduce
> > > >>>> the project's complexity and make it potentially easier for libraries
> > > >>>> to get actively developed. The main concern is setting up the build
> > > >>>> infrastructure to aggregate docs from multiple repositories and
> > > >>>> making them publicly available.
> > > >>>>
> > > >>>> Since I started this thread and I would really like to see Flink's ML
> > > >>>> library being revived again, I'd volunteer investigating first
> > > >>>> whether it is doable establishing a proper incremental build for
> > > >>>> Flink. If that should not be possible, I will look into splitting the
> > > >>>> repository, first only for the libraries.
> > > >>>> I'll share my results with the community once I'm done
> > > >>>> with the investigation.
> > > >>>>
> > > >>>> Cheers,
> > > >>>> Till
> > > >>>>
> > > >>>> On Fri, Feb 24, 2017 at 3:50 PM, Robert Metzger <rmetz...@apache.org>
> > > >>>> wrote:
> > > >>>>
> > > >>>>> @Jin Mingjian: You can not use the paid travis version for open
> > > >>>>> source projects. It only works for private repositories (at least
> > > >>>>> back then when we've asked them about that).
> > > >>>>>
> > > >>>>> @Stephan: I don't think that incremental builds will be available
> > > >>>>> with Maven anytime soon.
> > > >>>>>
> > > >>>>> I agree that we need to fix the build time issue on Travis. I've
> > > >>>>> recently pushed a commit to use now three instead of two test groups.
> > > >>>>> But I don't think that this is a feasible long-term solution.
> > > >>>>>
> > > >>>>> If this discussion is only about reducing the build and test time,
> > > >>>>> introducing build profiles for different components as Aljoscha
> > > >>>>> suggested would solve the problem Till mentioned.
> > > >>>>> Also, if we decide that travis is not a good tool anymore for the
> > > >>>>> testing, I guess we can find a different solution. There are now
> > > >>>>> competitors to Travis that might be willing to offer a paid plan for
> > > >>>>> an open source project, or we set up our own infra on a server
> > > >>>>> sponsored by one of the contributing companies.
> > > >>>>> If we want to solve "community issues" with the change as well, then
> > > >>>>> I think it's worth the effort of splitting up Flink into different
> > > >>>>> repositories.
> > > >>>>>
> > > >>>>> Splitting up repositories is not a trivial task in my opinion. As
> > > >>>>> others have mentioned before, we need to consider the following
> > > >>>>> things:
> > > >>>>> - How are we going to build the documentation?
> > > >>>>> Ideally every repo should
> > > >>>>> contain its docs, so we would need to pull them together when
> > > >>>>> building the main docs.
> > > >>>>> - How do we organize the dependencies? If we have a library
> > > >>>>> repository depend on snapshot Flink versions, we need to make sure
> > > >>>>> that the snapshot deployment always works. This also means that
> > > >>>>> people working on a library repository will pull from snapshot OR
> > > >>>>> need to build first locally.
> > > >>>>> - We need to update the release scripts
> > > >>>>>
> > > >>>>> If we commit to do these changes, we need to assign at least one
> > > >>>>> committer (yes, in this case we need somebody who can commit, for
> > > >>>>> example for updating the buildbot stuff) who volunteers to do the
> > > >>>>> change.
> > > >>>>> I've done a lot of infrastructure work in the past, but I'm
> > > >>>>> currently pretty booked with many other things, so I don't
> > > >>>>> realistically see myself doing that. Max who used to work on these
> > > >>>>> things is taking some time off.
> > > >>>>> I think we need, best case 3 days for the change, worst case 5 days.
> > > >>>>> The problem is that there are no "unit tests" for the infra stuff,
> > > >>>>> so many things are "trial and error" (like Apache's buildbot, our
> > > >>>>> release scripts, the doc scripts, maven stuff, nightly builds).
> > > >>>>>
> > > >>>>>
> > > >>>>>
> > > >>>>> On Thu, Feb 23, 2017 at 1:33 PM, Stephan Ewen <se...@apache.org>
> > > >>>>> wrote:
> > > >>>>>
> > > >>>>>> If we can get incremental builds to work, that would actually be
> > > >>>>>> the preferred solution in my opinion.
> > > >>>>>>
> > > >>>>>> Many companies have invested heavily in making a "single
> > > >>>>>> repository" code base work, because it has the advantage of not
> > > >>>>>> having to update/publish several repositories first.
> > > >>>>>> However, the strong prerequisite for that is an incremental build
> > > >>>>>> system that builds only (fine grained) what it has to build. I am
> > > >>>>>> not sure how we could make that work
> > > >>>>>> with Maven and Travis...
> > > >>>>>>
> > > >>>>>> On Wed, Feb 22, 2017 at 10:42 PM, Greg Hogan <c...@greghogan.com>
> > > >>>>>> wrote:
> > > >>>>>>
> > > >>>>>>> An additional option for reducing time to build and test is
> > > >>>>>>> parallel execution. This would help users more than on TravisCI
> > > >>>>>>> since we're generally running on multi-core machines rather than
> > > >>>>>>> VM slices.
> > > >>>>>>>
> > > >>>>>>> Is the idea that each user would only check out the modules that
> > > >>>>>>> he or she is developing with? For example, if a developer is not
> > > >>>>>>> working on flink-mesos or flink-yarn then the "flink-deploy"
> > > >>>>>>> module would not be cloned to their filesystem?
> > > >>>>>>>
> > > >>>>>>> We can run a TravisCI nightly build on each repo to validate
> > > >>>>>>> against API changes.
> > > >>>>>>>
> > > >>>>>>> Greg
> > > >>>>>>>
> > > >>>>>>> On Wed, Feb 22, 2017 at 12:24 PM, Fabian Hueske <fhue...@gmail.com>
> > > >>>>>>> wrote:
> > > >>>>>>>
> > > >>>>>>>> Hi everybody,
> > > >>>>>>>>
> > > >>>>>>>> I think this should be a discussion about the benefits and
> > > >>>>>>>> drawbacks of separating the code into distinct repositories from
> > > >>>>>>>> a development point of view.
> > > >>>>>>>> So I agree with Stephan that we should not divide the community
> > > >>>>>>>> by creating separate groups of committers.
> > > >>>>>>>> Also the discussion about independent releases is not strictly
> > > >>>>>>>> related to the decision, IMO.
> > > >>>>>>>>
> > > >>>>>>>> I see a few pros and cons for splitting the code base into
> > > >>>>>>>> separate repositories which (I think) haven't been mentioned
> > > >>>>>>>> before:
> > > >>>>>>>> pros:
> > > >>>>>>>> - IDE setup will be leaner. It is not necessary to compile the
> > > >>>>>>>> whole code base to run a test after switching a branch.
> > > >>>>>>>> cons:
> > > >>>>>>>> - developing library features that require changes in the core /
> > > >>>>>>>> APIs becomes more time consuming due to back-and-forth between
> > > >>>>>>>> code bases. However, I think this is not very often the case.
> > > >>>>>>>>
> > > >>>>>>>> Aljoscha has good points as well. Many of the build issues could
> > > >>>>>>>> be solved by different build profiles and configurations.
> > > >>>>>>>>
> > > >>>>>>>> Best, Fabian
> > > >>>>>>>>
> > > >>>>>>>> 2017-02-22 14:59 GMT+01:00 Gábor Hermann <m...@gaborhermann.com>:
> > > >>>>>>>>
> > > >>>>>>>>> @Stephan:
> > > >>>>>>>>>
> > > >>>>>>>>> Although I tried to raise some issues about splitting
> > > >>>>>>>>> committers, I'm still strongly in favor of some kind of
> > > >>>>>>>>> restructuring. We just have to be conscious about the
> > > >>>>>>>>> disadvantages.
> > > >>>>>>>>>
> > > >>>>>>>>> Not splitting the committers could leave the libraries in the
> > > >>>>>>>>> same stalling status, described by Till.
> > > >>>>>>>>> Of course, dedicating current
> > > >>>>>>>>> committers as shepherds of the libraries could easily resolve
> > > >>>>>>>>> the issue. But that requires time from current committers. It
> > > >>>>>>>>> seems like trade-offs between code quality, speed of
> > > >>>>>>>>> development, and committer efforts.
> > > >>>>>>>>>
> > > >>>>>>>>> From what I see in the discussion about ML, there are many
> > > >>>>>>>>> people willing to contribute as well as production use-cases.
> > > >>>>>>>>> This means we could and should move forward. However, the
> > > >>>>>>>>> development speed is significantly slowed down by stalling PRs.
> > > >>>>>>>>> The proposal for contributors helping the review process did not
> > > >>>>>>>>> really work out so far. In my opinion, either code quality (by
> > > >>>>>>>>> more easily accepting new committers) or some committer time
> > > >>>>>>>>> (reviewing/merging) should be sacrificed to move forward. As
> > > >>>>>>>>> Till has indicated, it would be shameful if we let this
> > > >>>>>>>>> contribution effort die.
> > > >>>>>>>>>
> > > >>>>>>>>> Cheers,
> > > >>>>>>>>> Gabor