I think the second run should build not "at least" flink-dist but exactly the remaining flink-dist module. Otherwise we do redundant work.
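
A minimal sketch of the build order under discussion, assuming two local checkouts named `flink` and `flink-libraries` (hypothetical paths, not taken from the thread) and assuming flink-dist has already been changed to pull the library artifacts into opt/. It only assembles the Maven command lines and is not the actual release script:

```python
from typing import List

# Hypothetical checkout locations; the thread does not name any paths.
MAIN_REPO = "flink"
LIBRARIES_REPO = "flink-libraries"


def release_build_steps(rebuild_only_dist: bool = True) -> List[List[str]]:
    """Return the Maven invocations for a two-repository release, in order."""
    steps = [
        # 1. Build and install the main repository so the libraries can resolve it.
        ["mvn", "-f", f"{MAIN_REPO}/pom.xml", "clean", "install", "-DskipTests"],
        # 2. Build and install the libraries against the freshly installed artifacts.
        ["mvn", "-f", f"{LIBRARIES_REPO}/pom.xml", "clean", "install", "-DskipTests"],
    ]
    # 3. Second pass over the main repository to pick up the library artifacts.
    second_pass = ["mvn", "-f", f"{MAIN_REPO}/pom.xml", "install", "-DskipTests"]
    if rebuild_only_dist:
        # -pl restricts the reactor to the listed module, avoiding redundant work.
        second_pass += ["-pl", "flink-dist"]
    steps.append(second_pass)
    return steps


if __name__ == "__main__":
    for step in release_build_steps():
        print(" ".join(step))
```

With `rebuild_only_dist=False` the third step rebuilds the whole main repository, which is the redundant work the reply above wants to avoid.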

On Wed, Mar 15, 2017 at 5:03 PM, Robert Metzger <rmetz...@apache.org> wrote:

> "flink-core" means the main repository, not the "flink-core" module.
>
> When doing a release, we need to build the flink main code first, because
> the flink-libraries depend on that.
> Once the "flink-libraries" are built, we need to run the main build again
> (at least the flink-dist module), so that it pulls the artifacts from
> the flink-libraries and puts them into the opt/ folder of the final artifact.
>
> On Wed, Mar 15, 2017 at 4:44 PM, Till Rohrmann <trohrm...@apache.org>
> wrote:
>
> > I'm ok with point 3.
> >
> > Concerning point 8: Why do we have to build flink-core twice after having
> > it built as a dependency for flink-libraries? This seems wrong to me.
> >
> > Cheers,
> > Till
> >
> > On Wed, Mar 15, 2017 at 4:23 PM, Robert Metzger <rmetz...@apache.org>
> > wrote:
> >
> > > Thank you. Running on AWS is a good idea!
> > > Let me know if you (or anybody else) wants to help me with the
> > > infrastructure work! Any help is much appreciated (as I've said before, I
> > > don't really have time for doing this, but it has to be done :) )
> > >
> > > I'm against creating two new repositories. I fear that this introduces
> > > too much complexity and too many repositories.
> > > "flink" and "flink-libraries" are hopefully enough to get the build time
> > > significantly down.
> > > We can also consider putting the connectors into the "flink-libraries"
> > > repo if we need to further reduce the build time.
> > >
> > > We should probably move "flink-table" out of "flink-libraries" if we want
> > > to keep "flink-table" in the main repo. (This would eliminate the
> > > "flink-libraries" module from main.)
> > >
> > > Also, I agree that "flink-statebackend-rocksdb" is not correctly placed
> > > in contrib anymore.
> > >
> > > On Wed, Mar 15, 2017 at 4:07 PM, Greg Hogan <c...@greghogan.com> wrote:
> > >
> > > > Robert, appreciate your kickstarting this task.
> > > >
> > > > We should compare the verification time with and without the listed
> > > > modules. I’ll try to run this by tomorrow on AWS and on Travis.
> > > >
> > > > Should we maintain separate repos for flink-contrib and
> > > > flink-libraries?
> > > > Are you intending that we move flink-table out of flink-libraries (and
> > > > perhaps flink-statebackend-rocksdb out of flink-contrib)?
> > > >
> > > > Greg
> > > >
> > > > > On Mar 15, 2017, at 9:55 AM, Robert Metzger <rmetz...@apache.org> wrote:
> > > > >
> > > > > Thank you for looking into this Till.
> > > > >
> > > > > I think we then have to split the repositories.
> > > > > My main motivation for doing this is that it seems to be the only
> > > > > feasible way of scaling the community to allow more committers
> > > > > working on the libraries.
> > > > >
> > > > > I'll take care of getting things started.
> > > > >
> > > > > As the next steps I propose to:
> > > > > 1. Ask INFRA to rename
> > > > > https://git-wip-us.apache.org/repos/asf?p=flink-connectors.git;a=summary
> > > > > to "flink-libraries"
> > > > > 2. Ask INFRA to set up GitHub and travis integration for
> > > > > "flink-libraries"
> > > > > 3. Put the code of "flink-ml", "flink-gelly", "flink-python",
> > > > > "flink-cep", "flink-scala-shell", "flink-storm" into the new
> > > > > repository. (I decided against moving flink-contrib there, because
> > > > > rocksdb is in the contrib module. For flink-table, I'm undecided,
> > > > > but I kept it in the main repo because it's probably going to
> > > > > interact more with the core code in the future.)
> > > > > I'll try to preserve the history of those modules when splitting
> > > > > them into the new repo.
> > > > > 4. I'll close all pull requests against those modules in the main
> > > > > repo.
> > > > > 5. I'll set up a minimal documentation page for the library
> > > > > repository, similar to the main documentation.
> > > > > 6. I'll update the documentation build process to build both
> > > > > documentations & link them to each other
> > > > > 7. I'll update the nightly deployment process to include both
> > > > > repositories
> > > > > 8. I'll update the release script to create the Flink release out of
> > > > > both repositories. In order to put the libraries into the opt/ dir
> > > > > of the release, I'll need to change the build of "flink-dist" so
> > > > > that it first builds flink core, then the libraries and then the
> > > > > core again with the libraries as an additional dependency.
> > > > >
> > > > > The main question for the community is: do you agree with point 3?
> > > > > Would you like to include more or less?
> > > > >
> > > > > I'll start with 1. and 2. tomorrow morning.
> > > > >
> > > > > On Wed, Mar 15, 2017 at 1:48 PM, Till Rohrmann <trohrm...@apache.org>
> > > > > wrote:
> > > > >
> > > > >> In theory we could have a merging bot which solves the problem of
> > > > >> the "commit window". Once the PR passes all tests and has enough
> > > > >> +1s, the bot could do the merging and, thus, it effectively
> > > > >> linearizes the merge process.
> > > > >>
> > > > >> I think the second point is actually a disadvantage because there is
> > > > >> not such an immediate incentive/pressure to fix the broken module if
> > > > >> it lives in a separate repository. Furthermore, breaking API changes
> > > > >> in the core will most likely go unnoticed for some time in other
> > > > >> modules which are not developed so actively. In the worst case these
> > > > >> things will only be noticed when we try to make a release.
> > > > >>
> > > > >> But I also agree that we are not Google and we don't have the
> > > > >> capacities to maintain such a smooth build process that we can keep
> > > > >> all the code in a single repository.
> > > > >>
> > > > >> I looked a bit into Gradle and as far as I can tell it offers some
> > > > >> nice features wrt incrementally building projects. This would be
> > > > >> beneficial for local development but it would not solve our build
> > > > >> time problems on Travis.
> > > > >> Gradle intends to introduce a task result cache which allows reusing
> > > > >> results across builds. This could help when building on Travis,
> > > > >> however, it is not yet fully implemented. Moreover, migrating from
> > > > >> Maven to Gradle won't come for free (there's simply no free lunch
> > > > >> out there) and we might risk introducing new bugs. Therefore, I
> > > > >> would vote to split the repository in order to mitigate our current
> > > > >> problems with Travis and the build time in general. Whether to use a
> > > > >> different build system or not can then be discussed as an orthogonal
> > > > >> question.
> > > > >>
> > > > >> Cheers,
> > > > >> Till
> > > > >>
> > > > >> On Tue, Mar 14, 2017 at 8:05 PM, Stephan Ewen <se...@apache.org> wrote:
> > > > >>
> > > > >>> Some other thoughts on how repository split would help. I am not
> > > > >>> sure for all of them, so please comment:
> > > > >>>
> > > > >>> - There is less competition for a "commit window". It happens a lot
> > > > >>> already that you run all tests and want to commit, but there was a
> > > > >>> commit in the meantime. You rebase, need to re-test, again commit
> > > > >>> in the meantime. For a "linear" commit history, this may become a
> > > > >>> bottleneck eventually as well.
> > > > >>>
> > > > >>> - There is less risk of broken master.
> > > > >>> If one repository/module breaks its master, the others can still
> > > > >>> continue.
> > > > >>>
> > > > >>> Stephan
> > > > >>>
> > > > >>> On Fri, Mar 10, 2017 at 12:20 PM, Till Rohrmann <trohrm...@apache.org>
> > > > >>> wrote:
> > > > >>>
> > > > >>>> Thanks for all your input. In order to wrap the discussion up I'd
> > > > >>>> like to summarize the mentioned points:
> > > > >>>>
> > > > >>>> The problem of increasing build times and complexity of the
> > > > >>>> project has been acknowledged. Ideally we would have everything in
> > > > >>>> one repository using an incremental build tool. Since Maven does
> > > > >>>> not properly support this we would have to switch our build tool
> > > > >>>> to something like Gradle, for example.
> > > > >>>>
> > > > >>>> Another option is introducing build profiles for different sets of
> > > > >>>> modules as well as separating integration and unit tests. The
> > > > >>>> third alternative would be creating sub-projects with their own
> > > > >>>> repositories. I actually think that these two proposals are not
> > > > >>>> necessarily exclusive and it would also make sense to have a
> > > > >>>> separation between unit and integration tests if we split the
> > > > >>>> repository.
> > > > >>>>
> > > > >>>> The overall consensus seems to be that we don't want to split the
> > > > >>>> community and want to keep everything under the same umbrella. I
> > > > >>>> think this is the right way to go, because otherwise some parts of
> > > > >>>> the project could become second class citizens. Given that and
> > > > >>>> that we continue using Maven, I still think that creating
> > > > >>>> sub-projects for the libraries, for example, could be beneficial.
> > > > >>>> A split could reduce the project's complexity and make it
> > > > >>>> potentially easier for libraries to get actively developed. The
> > > > >>>> main concern is setting up the build infrastructure to aggregate
> > > > >>>> docs from multiple repositories and making them publicly
> > > > >>>> available.
> > > > >>>>
> > > > >>>> Since I started this thread and I would really like to see Flink's
> > > > >>>> ML library being revived again, I'd volunteer to investigate first
> > > > >>>> whether establishing a proper incremental build for Flink is
> > > > >>>> doable. If that should not be possible, I will look into splitting
> > > > >>>> the repository, first only for the libraries. I'll share my
> > > > >>>> results with the community once I'm done with the investigation.
> > > > >>>>
> > > > >>>> Cheers,
> > > > >>>> Till
> > > > >>>>
> > > > >>>> On Fri, Feb 24, 2017 at 3:50 PM, Robert Metzger <rmetz...@apache.org>
> > > > >>>> wrote:
> > > > >>>>
> > > > >>>>> @Jin Mingjian: You can not use the paid travis version for open
> > > > >>>>> source projects. It only works for private repositories (at least
> > > > >>>>> back then when we asked them about that).
> > > > >>>>>
> > > > >>>>> @Stephan: I don't think that incremental builds will be available
> > > > >>>>> with Maven anytime soon.
> > > > >>>>>
> > > > >>>>> I agree that we need to fix the build time issue on Travis. I've
> > > > >>>>> recently pushed a commit to use now three instead of two test
> > > > >>>>> groups.
> > > > >>>>> But I don't think that this is a feasible long-term solution.
> > > > >>>>>
> > > > >>>>> If this discussion is only about reducing the build and test
> > > > >>>>> time, introducing build profiles for different components as
> > > > >>>>> Aljoscha suggested would solve the problem Till mentioned.
> > > > >>>>> Also, if we decide that travis is not a good tool anymore for the
> > > > >>>>> testing, I guess we can find a different solution. There are now
> > > > >>>>> competitors to Travis that might be willing to offer a paid plan
> > > > >>>>> for an open source project, or we set up our own infra on a
> > > > >>>>> server sponsored by one of the contributing companies.
> > > > >>>>> If we want to solve "community issues" with the change as well,
> > > > >>>>> then I think it's worth the effort of splitting up Flink into
> > > > >>>>> different repositories.
> > > > >>>>>
> > > > >>>>> Splitting up repositories is not a trivial task in my opinion. As
> > > > >>>>> others have mentioned before, we need to consider the following
> > > > >>>>> things:
> > > > >>>>> - How are we going to build the documentation? Ideally every repo
> > > > >>>>> should contain its docs, so we would need to pull them together
> > > > >>>>> when building the main docs.
> > > > >>>>> - How do we organize the dependencies? If we have library
> > > > >>>>> repositories depend on snapshot Flink versions, we need to make
> > > > >>>>> sure that the snapshot deployment always works. This also means
> > > > >>>>> that people working on a library repository will pull from
> > > > >>>>> snapshot OR need to build first locally.
> > > > >>>>> - We need to update the release scripts
> > > > >>>>>
> > > > >>>>> If we commit to do these changes, we need to assign at least one
> > > > >>>>> committer (yes, in this case we need somebody who can commit, for
> > > > >>>>> example for updating the buildbot stuff) who volunteers to do the
> > > > >>>>> change.
> > > > >>>>> I've done a lot of infrastructure work in the past, but I'm
> > > > >>>>> currently pretty booked with many other things, so I don't
> > > > >>>>> realistically see myself doing that. Max who used to work on
> > > > >>>>> these things is taking some time off.
> > > > >>>>> I think we need, best case 3 days for the change, worst case 5
> > > > >>>>> days. The problem is that there are no "unit tests" for the infra
> > > > >>>>> stuff, so many things are "trial and error" (like Apache's
> > > > >>>>> buildbot, our release scripts, the doc scripts, maven stuff,
> > > > >>>>> nightly builds).
> > > > >>>>>
> > > > >>>>> On Thu, Feb 23, 2017 at 1:33 PM, Stephan Ewen <se...@apache.org>
> > > > >>>>> wrote:
> > > > >>>>>
> > > > >>>>>> If we can get incremental builds to work, that would actually be
> > > > >>>>>> the preferred solution in my opinion.
> > > > >>>>>>
> > > > >>>>>> Many companies have invested heavily in making a "single
> > > > >>>>>> repository" code base work, because it has the advantage of not
> > > > >>>>>> having to update/publish several repositories first.
> > > > >>>>>> However, the strong prerequisite for that is an incremental
> > > > >>>>>> build system that builds only (fine grained) what it has to
> > > > >>>>>> build. I am not sure how we could make that work
> > > > >>>>>> with Maven and Travis...
> > > > >>>>>>
> > > > >>>>>> On Wed, Feb 22, 2017 at 10:42 PM, Greg Hogan <c...@greghogan.com>
> > > > >>>>>> wrote:
> > > > >>>>>>
> > > > >>>>>>> An additional option for reducing time to build and test is
> > > > >>>>>>> parallel execution. This would help users more than on TravisCI
> > > > >>>>>>> since we're generally running on multi-core machines rather
> > > > >>>>>>> than VM slices.
> > > > >>>>>>>
> > > > >>>>>>> Is the idea that each user would only check out the modules
> > > > >>>>>>> that he or she is developing with? For example, if a developer
> > > > >>>>>>> is not working on flink-mesos or flink-yarn then the
> > > > >>>>>>> "flink-deploy" module would not be cloned to their filesystem?
> > > > >>>>>>>
> > > > >>>>>>> We can run a TravisCI nightly build on each repo to validate
> > > > >>>>>>> against API changes.
> > > > >>>>>>>
> > > > >>>>>>> Greg
> > > > >>>>>>>
> > > > >>>>>>> On Wed, Feb 22, 2017 at 12:24 PM, Fabian Hueske <fhue...@gmail.com>
> > > > >>>>>>> wrote:
> > > > >>>>>>>
> > > > >>>>>>>> Hi everybody,
> > > > >>>>>>>>
> > > > >>>>>>>> I think this should be a discussion about the benefits and
> > > > >>>>>>>> drawbacks of separating the code into distinct repositories
> > > > >>>>>>>> from a development point of view.
> > > > >>>>>>>> So I agree with Stephan that we should not divide the
> > > > >>>>>>>> community by creating separate groups of committers.
> > > > >>>>>>>> Also the discussion about independent releases is not strictly
> > > > >>>>>>>> related to the decision, IMO.
> > > > >>>>>>>>
> > > > >>>>>>>> I see a few pros and cons for splitting the code base into
> > > > >>>>>>>> separate repositories which (I think) haven't been mentioned
> > > > >>>>>>>> before:
> > > > >>>>>>>> pros:
> > > > >>>>>>>> - IDE setup will be leaner. It is not necessary to compile the
> > > > >>>>>>>> whole code base to run a test after switching a branch.
> > > > >>>>>>>> cons:
> > > > >>>>>>>> - developing library features that require changes in the core
> > > > >>>>>>>> / APIs becomes more time consuming due to back-and-forth
> > > > >>>>>>>> between code bases. However, I think this is not very often
> > > > >>>>>>>> the case.
> > > > >>>>>>>>
> > > > >>>>>>>> Aljoscha has good points as well. Many of the build issues
> > > > >>>>>>>> could be solved by different build profiles and
> > > > >>>>>>>> configurations.
> > > > >>>>>>>>
> > > > >>>>>>>> Best, Fabian
> > > > >>>>>>>>
> > > > >>>>>>>> 2017-02-22 14:59 GMT+01:00 Gábor Hermann <m...@gaborhermann.com>:
> > > > >>>>>>>>
> > > > >>>>>>>>> @Stephan:
> > > > >>>>>>>>>
> > > > >>>>>>>>> Although I tried to raise some issues about splitting
> > > > >>>>>>>>> committers, I'm still strongly in favor of some kind of
> > > > >>>>>>>>> restructuring. We just have to be conscious about the
> > > > >>>>>>>>> disadvantages.
> > > > >>>>>>>>>
> > > > >>>>>>>>> Not splitting the committers could leave the libraries in the
> > > > >>>>>>>>> same stalling status, described by Till. Of course,
> > > > >>>>>>>>> dedicating current committers as shepherds of the libraries
> > > > >>>>>>>>> could easily resolve the issue. But that requires time from
> > > > >>>>>>>>> current committers. It seems like trade-offs between code
> > > > >>>>>>>>> quality, speed of development, and committer efforts.
> > > > >>>>>>>>>
> > > > >>>>>>>>> From what I see in the discussion about ML, there are many
> > > > >>>>>>>>> people willing to contribute as well as production use-cases.
> > > > >>>>>>>>> This means we could and should move forward.
> > > > >>>>>>>>> However, the development speed is significantly slowed down
> > > > >>>>>>>>> by stalling PRs. The proposal for contributors helping the
> > > > >>>>>>>>> review process did not really work out so far. In my opinion,
> > > > >>>>>>>>> either code quality (by more easily accepting new committers)
> > > > >>>>>>>>> or some committer time (reviewing/merging) should be
> > > > >>>>>>>>> sacrificed to move forward. As Till has indicated, it would
> > > > >>>>>>>>> be shameful if we let this contribution effort die.
> > > > >>>>>>>>>
> > > > >>>>>>>>> Cheers,
> > > > >>>>>>>>> Gabor