Good news :) A few weeks ago, I got an email from Travis asking for feedback. I filled out the form and said that the 50-minute build time limit is a showstopper for us. Now, a few weeks later, they got back to me and told me that they have increased the build time limit for "apache/flink" to 120 minutes. We can also change our settings to use the "sudo enabled infrastructure", which guarantees 7.5 GB of main memory.
I'll push to a separate branch to see how well it works :)

On Tue, Mar 28, 2017 at 4:36 PM, Robert Metzger <rmetz...@apache.org> wrote:

> I think your selection of modules is okay.
> Moving out storm and the scala shell would be nice as well. But storm is
> not really maintained, so maybe we should consider moving it out of the
> Flink repo entirely.
> And the scala shell is not a library, but it also doesn't really belong
> in the main repo.
>
> Regarding the feature freeze: We either do it with a lot of time in
> advance to avoid any delays for the release, OR we do it right after the
> release branch has been forked off.
>
> On Tue, Mar 21, 2017 at 1:09 PM, Timo Walther <twal...@apache.org> wrote:
>
>> So what do we want to move to the libraries repository?
>>
>> I would propose to move these modules first:
>>
>> flink-cep-scala
>> flink-cep
>> flink-gelly-examples
>> flink-gelly-scala
>> flink-gelly
>> flink-ml
>>
>> All other modules (e.g. in flink-contrib) are rather connectors. I think
>> it would be better to move those into a connectors repository later.
>>
>> If we are not in a rush, we could do the moving after the feature freeze.
>> This is the time when most of the PRs will have been merged.
>>
>> Timo
>>
>> On 20/03/17 at 15:00, Greg Hogan wrote:
>>
>>> We can add cluster tests using the distribution jar, and will need to do
>>> so to remove Flink’s dependency on Hadoop. The YARN and Mesos tests would
>>> still run nightly and running cluster tests should be much faster. As
>>> troublesome as TravisCI has been, a major driver for this change has been
>>> local build time.
>>>
>>> I agree with splitting off one repo at a time, but we’ll first need to
>>> reorganize the core repo if using git submodules as flink-python and
>>> flink-table would need to first be moved. So I think planning this out
>>> first is a healthy idea, with the understanding that the plan will be
>>> reevaluated.
>>>
>>> Any changes to the project structure need a scheduled period, perhaps a
>>> week, for existing pull requests to be reviewed and accepted or closed and
>>> later migrated.
>>>
>>> On Mar 20, 2017, at 6:27 AM, Stephan Ewen <se...@apache.org> wrote:
>>>
>>>> @Greg
>>>>
>>>> I am personally in favor of splitting "connectors" and "contrib" out as
>>>> well. I know that @rmetzger has some reservations about the connectors,
>>>> but we may be able to convince him.
>>>>
>>>> For the cluster tests (yarn / mesos) - in the past there were many cases
>>>> where these tests caught cases that other tests did not, because they
>>>> are the only tests that actually use the "flink-dist.jar" and thus
>>>> discover many dependency and configuration issues. For that reason, my
>>>> feeling would be that they are valuable in the core repository.
>>>>
>>>> I would actually suggest to do only the library split initially, to see
>>>> what the challenges are in setting up the multi-repo build and release
>>>> tooling. Once we gathered experience there, we can probably easily see
>>>> what else we can split out.
>>>>
>>>> Stephan
>>>>
>>>> On Fri, Mar 17, 2017 at 8:37 PM, Greg Hogan <c...@greghogan.com> wrote:
>>>>
>>>>> I’d like to use this refactoring opportunity to unsplit the Travis
>>>>> tests. With 51 builds queued up for the weekend (some of which may
>>>>> fail or have been force pushed) we are at the limit of the number of
>>>>> contributions we can process. Fixing this requires 1) splitting the
>>>>> project, 2) investigating speedups for long-running tests, and
>>>>> 3) staying cognizant of test performance when accepting new code.
>>>>>
>>>>> I’d like to add one to Stephan’s list of module groups. I like that the
>>>>> modules are generic (“libraries”) so that no one module is alone and
>>>>> independent.
>>>>>
>>>>> Flink has three “libraries”: cep, ml, and gelly.
>>>>>
>>>>> “connectors” is a hotspot due to the long-running Kafka tests (and
>>>>> connectors for three Kafka versions).
>>>>>
>>>>> Both flink-storm and flink-python have a modest number of tests and
>>>>> could live with the miscellaneous modules in “contrib”.
>>>>>
>>>>> The YARN tests are long-running and problematic (I am unable to
>>>>> successfully run these locally). A “cluster” module could host
>>>>> flink-mesos, flink-yarn, and flink-yarn-tests.
>>>>>
>>>>> That gets us close to running all tests in a single Travis build.
>>>>> https://travis-ci.org/greghogan/flink/builds/212122590
>>>>>
>>>>> I also tested (https://github.com/greghogan/flink/commits/core_build)
>>>>> with a maven parallelism of 2 and 4, with the latter a 6.4% drop in
>>>>> build time.
>>>>> https://travis-ci.org/greghogan/flink/builds/212137659
>>>>> https://travis-ci.org/greghogan/flink/builds/212154470
>>>>>
>>>>> We can run Travis CI builds nightly to guard against breaking changes.
>>>>>
>>>>> I also wanted to get an idea of how disruptive it would be to
>>>>> developers to divide the project into multiple git repos. I wrote a
>>>>> simple python script and configured it with the module partitions
>>>>> listed above. The usage string from the top of the file lists commits
>>>>> with files from multiple partitions as well as the modified files.
>>>>> https://gist.github.com/greghogan/f38a8efe6b6dd5a162a6b43335ac4897
>>>>>
>>>>> Accounting for the merging of the batch and streaming connector
>>>>> modules, and assuming that the project structure has not changed much
>>>>> over the past 15 months, for the following date ranges the listed
>>>>> number of commits would have been split across repositories.
>>>>>
>>>>> since "2017-01-01"
>>>>> 36 of 571 commits were mixed
>>>>>
>>>>> since "2016-07-01"
>>>>> 155 of 1607 commits were mixed
>>>>>
>>>>> since "2016-01-01"
>>>>> 272 of 2561 commits were mixed
>>>>>
>>>>> Greg
>>>>>
>>>>> On Mar 15, 2017, at 1:13 PM, Stephan Ewen <se...@apache.org> wrote:
>>>>>
>>>>>> @Robert - I think once we know that a separate git repo works well,
>>>>>> and that it actually solves problems, I see no reason to not create a
>>>>>> connectors repository later. The infrastructure changes should be
>>>>>> identical for two or more repositories.
>>>>>>
>>>>>> On Wed, Mar 15, 2017 at 5:22 PM, Till Rohrmann <trohrm...@apache.org>
>>>>>> wrote:
>>>>>>
>>>>>>> I think it should not be at least the flink-dist but exactly the
>>>>>>> remaining flink-dist module. Otherwise we do redundant work.
>>>>>>>
>>>>>>> On Wed, Mar 15, 2017 at 5:03 PM, Robert Metzger
>>>>>>> <rmetz...@apache.org> wrote:
>>>>>>>
>>>>>>>> "flink-core" means the main repository, not the "flink-core" module.
>>>>>>>>
>>>>>>>> When doing a release, we need to build the flink main code first,
>>>>>>>> because the flink-libraries depend on that.
>>>>>>>> Once the "flink-libraries" are built, we need to run the main build
>>>>>>>> again (at least the flink-dist module), so that it is pulling the
>>>>>>>> artifacts from the flink-libraries to put them into the opt/ folder
>>>>>>>> of the final artifact.
>>>>>>>>
>>>>>>>> On Wed, Mar 15, 2017 at 4:44 PM, Till Rohrmann
>>>>>>>> <trohrm...@apache.org> wrote:
>>>>>>>>
>>>>>>>>> I'm ok with point 3.
>>>>>>>>>
>>>>>>>>> Concerning point 8: Why do we have to build flink-core twice after
>>>>>>>>> having it built as a dependency for flink-libraries? This seems
>>>>>>>>> wrong to me.
>>>>>>>>>
>>>>>>>>> Cheers,
>>>>>>>>> Till
>>>>>>>>>
>>>>>>>>> On Wed, Mar 15, 2017 at 4:23 PM, Robert Metzger
>>>>>>>>> <rmetz...@apache.org> wrote:
>>>>>>>>>
>>>>>>>>>> Thank you. Running on AWS is a good idea!
>>>>>>>>>> Let me know if you (or anybody else) wants to help me with the
>>>>>>>>>> infrastructure work! Any help is much appreciated (as I've said
>>>>>>>>>> before, I don't really have time for doing this, but it has to be
>>>>>>>>>> done :) )
>>>>>>>>>>
>>>>>>>>>> I'm against creating two new repositories. I fear that this
>>>>>>>>>> introduces too much complexity and too many repositories.
>>>>>>>>>> "flink" and "flink-libraries" are hopefully enough to get the
>>>>>>>>>> build time significantly down.
>>>>>>>>>> We can also consider putting the connectors into the
>>>>>>>>>> "flink-libraries" repo if we need to further reduce the build
>>>>>>>>>> time.
>>>>>>>>>>
>>>>>>>>>> We should probably move "flink-table" out of "flink-libraries" if
>>>>>>>>>> we want to keep "flink-table" in the main repo. (This would
>>>>>>>>>> eliminate the "flink-libraries" module from main.)
>>>>>>>>>>
>>>>>>>>>> Also, I agree that "flink-statebackend-rocksdb" is not correctly
>>>>>>>>>> placed in contrib anymore.
>>>>>>>>>>
>>>>>>>>>> On Wed, Mar 15, 2017 at 4:07 PM, Greg Hogan <c...@greghogan.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Robert, appreciate your kickstarting this task.
>>>>>>>>>>>
>>>>>>>>>>> We should compare the verification time with and without the
>>>>>>>>>>> listed modules. I’ll try to run this by tomorrow on AWS and on
>>>>>>>>>>> Travis.
>>>>>>>>>>>
>>>>>>>>>>> Should we maintain separate repos for flink-contrib and
>>>>>>>>>>> flink-libraries? Are you intending that we move flink-table out
>>>>>>>>>>> of flink-libraries (and perhaps flink-statebackend-rocksdb out of
>>>>>>>>>>> flink-contrib)?
>>>>>>>>>>>
>>>>>>>>>>> Greg
>>>>>>>>>>>
>>>>>>>>>>> On Mar 15, 2017, at 9:55 AM, Robert Metzger
>>>>>>>>>>> <rmetz...@apache.org> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Thank you for looking into this Till.
>>>>>>>>>>>>
>>>>>>>>>>>> I think we then have to split the repositories.
>>>>>>>>>>>> My main motivation for doing this is that it seems to be the
>>>>>>>>>>>> only feasible way of scaling the community to allow more
>>>>>>>>>>>> committers working on the libraries.
>>>>>>>>>>>>
>>>>>>>>>>>> I'll take care of getting things started.
>>>>>>>>>>>>
>>>>>>>>>>>> As the next steps I propose to:
>>>>>>>>>>>> 1. Ask INFRA to rename
>>>>>>>>>>>> https://git-wip-us.apache.org/repos/asf?p=flink-connectors.git;a=summary
>>>>>>>>>>>> to "flink-libraries"
>>>>>>>>>>>> 2. Ask INFRA to set up GitHub and travis integration for
>>>>>>>>>>>> "flink-libraries"
>>>>>>>>>>>> 3. Put the code of "flink-ml", "flink-gelly", "flink-python",
>>>>>>>>>>>> "flink-cep", "flink-scala-shell", "flink-storm" into the new
>>>>>>>>>>>> repository. (I decided against moving flink-contrib there,
>>>>>>>>>>>> because rocksdb is in the contrib module. For flink-table, I'm
>>>>>>>>>>>> undecided, but I kept it in the main repo because it's probably
>>>>>>>>>>>> going to interact more with the core code in the future.)
>>>>>>>>>>>> I'll try to preserve the history of those modules when
>>>>>>>>>>>> splitting them into the new repo
>>>>>>>>>>>> 4. I'll close all pull requests against those modules in the
>>>>>>>>>>>> main repo.
>>>>>>>>>>>> 5. I'll set up a minimal documentation page for the library
>>>>>>>>>>>> repository, similar to the main documentation.
>>>>>>>>>>>> 6. I'll update the documentation build process to build both
>>>>>>>>>>>> documentations & link them to each other
>>>>>>>>>>>> 7. I'll update the nightly deployment process to include both
>>>>>>>>>>>> repositories
>>>>>>>>>>>> 8. I'll update the release script to create the Flink release
>>>>>>>>>>>> out of both repositories. In order to put the libraries into the
>>>>>>>>>>>> opt/ dir of the release, I'll need to change the build of
>>>>>>>>>>>> "flink-dist" so that it first builds flink core, then the
>>>>>>>>>>>> libraries and then the core again with the libraries as an
>>>>>>>>>>>> additional dependency.
>>>>>>>>>>>>
>>>>>>>>>>>> The main question for the community is: do you agree with point
>>>>>>>>>>>> 3? Would you like to include more or less?
>>>>>>>>>>>>
>>>>>>>>>>>> I'll start with 1. and 2. tomorrow morning.
>>>>>>>>>>>>
>>>>>>>>>>>> On Wed, Mar 15, 2017 at 1:48 PM, Till Rohrmann
>>>>>>>>>>>> <trohrm...@apache.org> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> In theory we could have a merging bot which solves the problem
>>>>>>>>>>>>> of the "commit window". Once the PR passes all tests and has
>>>>>>>>>>>>> enough +1s, the bot could do the merging and, thus, it
>>>>>>>>>>>>> effectively linearizes the merge process.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I think the second point is actually a disadvantage because
>>>>>>>>>>>>> there is not such an immediate incentive/pressure to fix the
>>>>>>>>>>>>> broken module if it lives in a separate repository.
>>>>>>>>>>>>> Furthermore, breaking API changes in the core will most likely
>>>>>>>>>>>>> go unnoticed for some time in other modules which are not
>>>>>>>>>>>>> developed so actively. In the worst case these things will only
>>>>>>>>>>>>> be noticed when we try to make a release.
>>>>>>>>>>>>>
>>>>>>>>>>>>> But I also agree that we are not Google and we don't have the
>>>>>>>>>>>>> capacities to maintain such a smooth build process that we can
>>>>>>>>>>>>> keep all the code in a single repository.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I looked a bit into Gradle and as far as I can tell it offers
>>>>>>>>>>>>> some nice features wrt incrementally building projects. This
>>>>>>>>>>>>> would be beneficial for local development but it would not
>>>>>>>>>>>>> solve our build time problems on Travis. Gradle intends to
>>>>>>>>>>>>> introduce a task result cache which allows reusing results
>>>>>>>>>>>>> across builds. This could help when building on Travis;
>>>>>>>>>>>>> however, it is not yet fully implemented. Moreover, migrating
>>>>>>>>>>>>> from Maven to Gradle won't come for free (there's simply no
>>>>>>>>>>>>> free lunch out there) and we might risk introducing new bugs.
>>>>>>>>>>>>> Therefore, I would vote to split the repository in order to
>>>>>>>>>>>>> mitigate our current problems with Travis and the build time in
>>>>>>>>>>>>> general. Whether to use a different build system or not can
>>>>>>>>>>>>> then be discussed as an orthogonal question.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>> Till
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Tue, Mar 14, 2017 at 8:05 PM, Stephan Ewen
>>>>>>>>>>>>> <se...@apache.org> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Some other thoughts on how a repository split would help. I am
>>>>>>>>>>>>>> not sure about all of them, so please comment:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> - There is less competition for a "commit window". It happens
>>>>>>>>>>>>>> a lot already that you run all tests and want to commit, but
>>>>>>>>>>>>>> there was a commit in the meantime. You rebase, need to
>>>>>>>>>>>>>> re-test, again commit in the meantime.
>>>>>>>>>>>>>> For a "linear" commit history, this may become a bottleneck
>>>>>>>>>>>>>> eventually as well.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> - There is less risk of broken master. If one
>>>>>>>>>>>>>> repository/module breaks its master, the others can still
>>>>>>>>>>>>>> continue.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Stephan
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Fri, Mar 10, 2017 at 12:20 PM, Till Rohrmann
>>>>>>>>>>>>>> <trohrm...@apache.org> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks for all your input. In order to wrap the discussion up
>>>>>>>>>>>>>>> I'd like to summarize the mentioned points:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The problem of increasing build times and complexity of the
>>>>>>>>>>>>>>> project has been acknowledged.
>>>>>>>>>>>>>>> Ideally we would have everything in one repository using an
>>>>>>>>>>>>>>> incremental build tool. Since Maven does not properly support
>>>>>>>>>>>>>>> this we would have to switch our build tool to something like
>>>>>>>>>>>>>>> Gradle, for example. Another option is introducing build
>>>>>>>>>>>>>>> profiles for different sets of modules as well as separating
>>>>>>>>>>>>>>> integration and unit tests. The third alternative would be
>>>>>>>>>>>>>>> creating sub-projects with their own repositories. I actually
>>>>>>>>>>>>>>> think that these two proposals are not necessarily exclusive
>>>>>>>>>>>>>>> and it would also make sense to have a separation between
>>>>>>>>>>>>>>> unit and integration tests if we split the repository.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The overall consensus seems to be that we don't want to split
>>>>>>>>>>>>>>> the community and want to keep everything under the same
>>>>>>>>>>>>>>> umbrella. I think this is the right way to go, because
>>>>>>>>>>>>>>> otherwise some parts of the project could become second class
>>>>>>>>>>>>>>> citizens.
>>>>>>>>>>>>>>> Given that, and given that we continue using Maven, I still
>>>>>>>>>>>>>>> think that creating sub-projects for the libraries, for
>>>>>>>>>>>>>>> example, could be beneficial. A split could reduce the
>>>>>>>>>>>>>>> project's complexity and make it potentially easier for
>>>>>>>>>>>>>>> libraries to get actively developed. The main concern is
>>>>>>>>>>>>>>> setting up the build infrastructure to aggregate docs from
>>>>>>>>>>>>>>> multiple repositories and making them publicly available.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Since I started this thread and I would really like to see
>>>>>>>>>>>>>>> Flink's ML library being revived again, I'd volunteer to
>>>>>>>>>>>>>>> first investigate whether establishing a proper incremental
>>>>>>>>>>>>>>> build for Flink is doable. If that should not be possible, I
>>>>>>>>>>>>>>> will look into splitting the repository, first only for the
>>>>>>>>>>>>>>> libraries. I'll share my results with the community once I'm
>>>>>>>>>>>>>>> done with the investigation.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>>> Till
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Fri, Feb 24, 2017 at 3:50 PM, Robert Metzger
>>>>>>>>>>>>>>> <rmetz...@apache.org> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> @Jin Mingjian: You cannot use the paid travis version for
>>>>>>>>>>>>>>>> open source projects. It only works for private repositories
>>>>>>>>>>>>>>>> (at least back then when we asked them about that).
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> @Stephan: I don't think that incremental builds will be
>>>>>>>>>>>>>>>> available with Maven anytime soon.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I agree that we need to fix the build time issue on Travis.
>>>>>>>>>>>>>>>> I've recently pushed a commit to now use three instead of
>>>>>>>>>>>>>>>> two test groups.
>>>>>>>>>>>>>>>> But I don't think that this is a feasible long-term
>>>>>>>>>>>>>>>> solution.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> If this discussion is only about reducing the build and test
>>>>>>>>>>>>>>>> time, introducing build profiles for different components as
>>>>>>>>>>>>>>>> Aljoscha suggested would solve the problem Till mentioned.
>>>>>>>>>>>>>>>> Also, if we decide that travis is not a good tool anymore
>>>>>>>>>>>>>>>> for the testing, I guess we can find a different solution.
>>>>>>>>>>>>>>>> There are now competitors to Travis that might be willing to
>>>>>>>>>>>>>>>> offer a paid plan for an open source project, or we set up
>>>>>>>>>>>>>>>> our own infra on a server sponsored by one of the
>>>>>>>>>>>>>>>> contributing companies.
>>>>>>>>>>>>>>>> If we want to solve "community issues" with the change as
>>>>>>>>>>>>>>>> well, then I think it's worth the effort of splitting up
>>>>>>>>>>>>>>>> Flink into different repositories.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Splitting up repositories is not a trivial task in my
>>>>>>>>>>>>>>>> opinion. As others have mentioned before, we need to
>>>>>>>>>>>>>>>> consider the following things:
>>>>>>>>>>>>>>>> - How are we going to build the documentation? Ideally every
>>>>>>>>>>>>>>>> repo should contain its docs, so we would need to pull them
>>>>>>>>>>>>>>>> together when building the main docs.
>>>>>>>>>>>>>>>> - How do we organize the dependencies? If we have library
>>>>>>>>>>>>>>>> repositories depend on snapshot Flink versions, we need to
>>>>>>>>>>>>>>>> make sure that the snapshot deployment always works. This
>>>>>>>>>>>>>>>> also means that people working on a library repository will
>>>>>>>>>>>>>>>> pull from snapshot OR need to build first locally.
>>>>>>>>>>>>>>>> - We need to update the release scripts
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> If we commit to do these changes, we need to assign at least
>>>>>>>>>>>>>>>> one committer (yes, in this case we need somebody who can
>>>>>>>>>>>>>>>> commit, for example for updating the buildbot stuff) who
>>>>>>>>>>>>>>>> volunteers to do the change.
>>>>>>>>>>>>>>>> I've done a lot of infrastructure work in the past, but I'm
>>>>>>>>>>>>>>>> currently pretty booked with many other things, so I don't
>>>>>>>>>>>>>>>> realistically see myself doing that. Max, who used to work
>>>>>>>>>>>>>>>> on these things, is taking some time off.
>>>>>>>>>>>>>>>> I think we need, best case, 3 days for the change, worst
>>>>>>>>>>>>>>>> case 5 days. The problem is that there are no "unit tests"
>>>>>>>>>>>>>>>> for the infra stuff, so many things are "trial and error"
>>>>>>>>>>>>>>>> (like Apache's buildbot, our release scripts, the doc
>>>>>>>>>>>>>>>> scripts, maven stuff, nightly builds).
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Thu, Feb 23, 2017 at 1:33 PM, Stephan Ewen
>>>>>>>>>>>>>>>> <se...@apache.org> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> If we can get incremental builds to work, that would
>>>>>>>>>>>>>>>>> actually be the preferred solution in my opinion.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Many companies have invested heavily in making a "single
>>>>>>>>>>>>>>>>> repository" code base work, because it has the advantage of
>>>>>>>>>>>>>>>>> not having to update/publish several repositories first.
>>>>>>>>>>>>>>>>> However, the strong prerequisite for that is an incremental
>>>>>>>>>>>>>>>>> build system that builds only (fine grained) what it has to
>>>>>>>>>>>>>>>>> build. I am not sure how we could make that work
>>>>>>>>>>>>>>>>> with Maven and Travis...
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Wed, Feb 22, 2017 at 10:42 PM, Greg Hogan
>>>>>>>>>>>>>>>>> <c...@greghogan.com> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> An additional option for reducing time to build and test
>>>>>>>>>>>>>>>>>> is parallel execution. This would help users more than on
>>>>>>>>>>>>>>>>>> TravisCI
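
For anyone who wants to reproduce the "mixed commit" numbers Greg reports further up in the quoted thread, here is a minimal sketch of that kind of analysis. It is not Greg's script (that one is in the gist he links). The partition prefixes, the "core" default and the function names are assumptions made up for illustration, and the commits are expected as (id, [paths]) pairs, e.g. parsed from the output of `git log --name-only`.

```python
# Hypothetical partition map: path prefix -> proposed repository.
# These prefixes are illustrative only, not the configuration Greg used.
PARTITIONS = {
    "flink-libraries/flink-cep": "libraries",
    "flink-libraries/flink-gelly": "libraries",
    "flink-libraries/flink-ml": "libraries",
    "flink-connectors/": "connectors",
    "flink-contrib/": "contrib",
    "flink-yarn": "cluster",
    "flink-mesos": "cluster",
}
# Anything that matches no prefix is assumed to stay in the main repo.
DEFAULT_PARTITION = "core"


def partition_of(path, partitions=PARTITIONS):
    """Return the partition owning `path`; the longest matching prefix wins."""
    best = ""
    for prefix in partitions:
        if path.startswith(prefix) and len(prefix) > len(best):
            best = prefix
    return partitions[best] if best else DEFAULT_PARTITION


def is_mixed(files, partitions=PARTITIONS):
    """A commit is 'mixed' if its files fall into more than one partition."""
    return len({partition_of(f, partitions) for f in files}) > 1


def count_mixed(commits, partitions=PARTITIONS):
    """commits: iterable of (commit_id, [paths]). Returns (mixed, total)."""
    commits = list(commits)
    mixed = sum(1 for _, files in commits if is_mixed(files, partitions))
    return mixed, len(commits)
```

Calling count_mixed() on the parsed history for a given date range gives a "mixed of total" pair in the same shape as the figures quoted above. How close it gets to those figures depends entirely on the partition map you feed it.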