"flink-core" means the main repository, not the "flink-core" module.

When doing a release, we need to build the Flink main code first, because
the flink-libraries depend on it.
Once the "flink-libraries" are built, we need to run the main build again
(at least the flink-dist module), so that it pulls the artifacts from
the flink-libraries and puts them into the opt/ folder of the final artifact.
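
A minimal sketch of that order, written as a dry run that only prints the
Maven invocations. The two checkout directory names, the use of -DskipTests
and the helper name release_build_plan are assumptions for illustration, not
part of the actual release script. The third pass would also only pick up the
libraries once the flink-dist POM declares them as dependencies.

```python
from typing import List


def release_build_plan(main_repo: str = "flink",
                       libs_repo: str = "flink-libraries") -> List[str]:
    """Return the Maven commands for the three-pass release build, in order.

    main_repo and libs_repo are hypothetical local checkout directories.
    """
    return [
        # Pass 1: build and install the main repository so that the
        # libraries can resolve its artifacts from the local Maven repo.
        f"mvn -f {main_repo}/pom.xml clean install -DskipTests",
        # Pass 2: build and install the libraries against those artifacts.
        f"mvn -f {libs_repo}/pom.xml clean install -DskipTests",
        # Pass 3: re-run only the flink-dist module so that it can pull the
        # freshly installed library artifacts into opt/.
        f"mvn -f {main_repo}/pom.xml install -pl flink-dist -DskipTests",
    ]


if __name__ == "__main__":
    for command in release_build_plan():
        print(command)
```

Only the last pass is restricted with -pl, which is why the second main build
is much cheaper than the first one.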




On Wed, Mar 15, 2017 at 4:44 PM, Till Rohrmann <trohrm...@apache.org> wrote:

> I'm ok with point 3.
>
> Concerning point 8: Why do we have to build flink-core twice after having
> it built as a dependency for flink-libraries? This seems wrong to me.
>
> Cheers,
> Till
>
> On Wed, Mar 15, 2017 at 4:23 PM, Robert Metzger <rmetz...@apache.org>
> wrote:
>
> > Thank you. Running on AWS is a good idea!
> > Let me know if you (or anybody else) want to help me with the
> > infrastructure work! Any help is much appreciated (as I've said before, I
> > don't really have time for doing this, but it has to be done :) )
> >
> > I'm against creating two new repositories. I fear that this introduces
> > too much complexity and too many repositories.
> > "flink" and "flink-libraries" are hopefully enough to get the build time
> > significantly down.
> > We can also consider putting the connectors into the "flink-libraries"
> > repo if we need to further reduce the build time.
> >
> > We should probably move "flink-table" out of "flink-libraries" if we want
> > to keep "flink-table" in the main repo. (This would eliminate the
> > "flink-libraries" module from main.)
> >
> > Also, I agree that "flink-statebackend-rocksdb" is not correctly placed
> > in contrib anymore.
> >
> >
> > On Wed, Mar 15, 2017 at 4:07 PM, Greg Hogan <c...@greghogan.com> wrote:
> >
> > > Robert, appreciate your kickstarting this task.
> > >
> > > We should compare the verification time with and without the listed
> > > modules. I’ll try to run this by tomorrow on AWS and on Travis.
> > >
> > > Should we maintain separate repos for flink-contrib and
> > > flink-libraries?
> > > Are you intending that we move flink-table out of flink-libraries (and
> > > perhaps flink-statebackend-rocksdb out of flink-contrib)?
> > >
> > > Greg
> > >
> > >
> > > > > On Mar 15, 2017, at 9:55 AM, Robert Metzger <rmetz...@apache.org>
> > > > > wrote:
> > > > >
> > > > > Thank you for looking into this Till.
> > > > >
> > > > > I think we then have to split the repositories.
> > > > > My main motivation for doing this is that it seems to be the only
> > > > > feasible way of scaling the community to allow more committers
> > > > > working on the libraries.
> > > >
> > > > I'll take care of getting things started.
> > > >
> > > > As the next steps I propose to:
> > > > > 1. Ask INFRA to rename
> > > > > https://git-wip-us.apache.org/repos/asf?p=flink-connectors.git;a=summary
> > > > > to "flink-libraries"
> > > > > 2. Ask INFRA to set up GitHub and Travis integration for
> > > > > "flink-libraries"
> > > > > 3. Put the code of "flink-ml", "flink-gelly", "flink-python",
> > > > > "flink-cep", "flink-scala-shell", "flink-storm" into the new
> > > > > repository. (I decided against moving flink-contrib there, because
> > > > > rocksdb is in the contrib module. For flink-table, I'm undecided, but
> > > > > I kept it in the main repo because it's probably going to interact
> > > > > more with the core code in the future.)
> > > > > I'll try to preserve the history of those modules when splitting them
> > > > > into the new repo.
> > > > > 4. I'll close all pull requests against those modules in the main
> > > > > repo.
> > > > > 5. I'll set up a minimal documentation page for the library
> > > > > repository, similar to the main documentation.
> > > > > 6. I'll update the documentation build process to build both
> > > > > documentations & link them to each other
> > > > > 7. I'll update the nightly deployment process to include both
> > > > > repositories
> > > > > 8. I'll update the release script to create the Flink release out of
> > > > > both repositories. In order to put the libraries into the opt/ dir of
> > > > > the release, I'll need to change the build of "flink-dist" so that it
> > > > > first builds flink core, then the libraries and then the core again
> > > > > with the libraries as an additional dependency.
> > > >
> > > > > The main question for the community is: do you agree with point 3?
> > > > > Would you like to include more or less?
> > > >
> > > > I'll start with 1. and 2. tomorrow morning.
> > > >
> > > >
> > > >
> > > > > On Wed, Mar 15, 2017 at 1:48 PM, Till Rohrmann <trohrm...@apache.org>
> > > > > wrote:
> > > > >
> > > > >> In theory we could have a merging bot which solves the problem of
> > > > >> the "commit window". Once the PR passes all tests and has enough
> > > > >> +1s, the bot could do the merging and, thus, it effectively
> > > > >> linearizes the merge process.
> > > >>
> > > > >> I think the second point is actually a disadvantage because there is
> > > > >> not such an immediate incentive/pressure to fix the broken module if
> > > > >> it lives in a separate repository. Furthermore, breaking API changes
> > > > >> in the core will most likely go unnoticed for some time in other
> > > > >> modules which are not developed so actively. In the worst case these
> > > > >> things will only be noticed when we try to make a release.
> > > >>
> > > > >> But I also agree that we are not Google and we don't have the
> > > > >> capacities to maintain such a smooth build process that we can keep
> > > > >> all the code in a single repository.
> > > >>
> > > > >> I looked a bit into Gradle and as far as I can tell it offers some
> > > > >> nice features wrt incrementally building projects. This would be
> > > > >> beneficial for local development but it would not solve our build
> > > > >> time problems on Travis. Gradle intends to introduce a task result
> > > > >> cache which allows reusing results across builds. This could help
> > > > >> when building on Travis, however, it is not yet fully implemented.
> > > > >> Moreover, migrating from Maven to Gradle won't come for free (there's
> > > > >> simply no free lunch out there) and we might risk introducing new
> > > > >> bugs. Therefore, I would vote to split the repository in order to
> > > > >> mitigate our current problems with Travis and the build time in
> > > > >> general. Whether to use a different build system or not can then be
> > > > >> discussed as an orthogonal question.
> > > >>
> > > >> Cheers,
> > > >> Till
> > > >>
> > > > >> On Tue, Mar 14, 2017 at 8:05 PM, Stephan Ewen <se...@apache.org>
> > > > >> wrote:
> > > > >>
> > > > >>> Some other thoughts on how a repository split would help. I am not
> > > > >>> sure about all of them, so please comment:
> > > >>>
> > > > >>>  - There is less competition for a "commit window". It happens a
> > > > >>>    lot already that you run all tests and want to commit, but there
> > > > >>>    was a commit in the meantime. You rebase, need to re-test, and
> > > > >>>    again there is a commit in the meantime.
> > > > >>>    For a "linear" commit history, this may become a bottleneck
> > > > >>>    eventually as well.
> > > > >>>
> > > > >>>  - There is less risk of a broken master. If one repository/module
> > > > >>>    breaks its master, the others can still continue.
> > > >>>
> > > >>> Stephan
> > > >>>
> > > >>>
> > > > >>> On Fri, Mar 10, 2017 at 12:20 PM, Till Rohrmann <trohrm...@apache.org>
> > > > >>> wrote:
> > > > >>>
> > > > >>>> Thanks for all your input. In order to wrap the discussion up I'd
> > > > >>>> like to summarize the mentioned points:
> > > >>>>
> > > > >>>> The problem of increasing build times and complexity of the
> > > > >>>> project has been acknowledged. Ideally we would have everything in
> > > > >>>> one repository using an incremental build tool. Since Maven does
> > > > >>>> not properly support this we would have to switch our build tool
> > > > >>>> to something like Gradle, for example.
> > > >>>>
> > > > >>>> Another option is introducing build profiles for different sets of
> > > > >>>> modules as well as separating integration and unit tests. The
> > > > >>>> third alternative would be creating sub-projects with their own
> > > > >>>> repositories. I actually think that these two proposals are not
> > > > >>>> necessarily exclusive and it would also make sense to have a
> > > > >>>> separation between unit and integration tests if we split the
> > > > >>>> repository.
> > > >>>>
> > > > >>>> The overall consensus seems to be that we don't want to split the
> > > > >>>> community and want to keep everything under the same umbrella. I
> > > > >>>> think this is the right way to go, because otherwise some parts of
> > > > >>>> the project could become second class citizens. Given that and
> > > > >>>> that we continue using Maven, I still think that creating
> > > > >>>> sub-projects for the libraries, for example, could be beneficial.
> > > > >>>> A split could reduce the project's complexity and make it
> > > > >>>> potentially easier for libraries to get actively developed. The
> > > > >>>> main concern is setting up the build infrastructure to aggregate
> > > > >>>> docs from multiple repositories and making them publicly
> > > > >>>> available.
> > > >>>>
> > > > >>>> Since I started this thread and I would really like to see Flink's
> > > > >>>> ML library being revived again, I'd volunteer to investigate first
> > > > >>>> whether it is feasible to establish a proper incremental build for
> > > > >>>> Flink. If that should not be possible, I will look into splitting
> > > > >>>> the repository, first only for the libraries. I'll share my
> > > > >>>> results with the community once I'm done with the investigation.
> > > >>>>
> > > >>>> Cheers,
> > > >>>> Till
> > > >>>>
> > > > >>>> On Fri, Feb 24, 2017 at 3:50 PM, Robert Metzger <rmetz...@apache.org>
> > > > >>>> wrote:
> > > > >>>>
> > > > >>>>> @Jin Mingjian: You can not use the paid travis version for open
> > > > >>>>> source projects. It only works for private repositories (at
> > > > >>>>> least back then when we asked them about that).
> > > > >>>>>
> > > > >>>>> @Stephan: I don't think that incremental builds will be available
> > > > >>>>> with Maven anytime soon.
> > > > >>>>>
> > > > >>>>> I agree that we need to fix the build time issue on Travis. I've
> > > > >>>>> recently pushed a commit to now use three instead of two test
> > > > >>>>> groups.
> > > > >>>>> But I don't think that this is a feasible long-term solution.
> > > >>>>>
> > > > >>>>> If this discussion is only about reducing the build and test
> > > > >>>>> time, introducing build profiles for different components as
> > > > >>>>> Aljoscha suggested would solve the problem Till mentioned.
> > > > >>>>> Also, if we decide that travis is not a good tool anymore for the
> > > > >>>>> testing, I guess we can find a different solution. There are now
> > > > >>>>> competitors to Travis that might be willing to offer a paid plan
> > > > >>>>> for an open source project, or we set up our own infra on a
> > > > >>>>> server sponsored by one of the contributing companies.
> > > > >>>>> If we want to solve "community issues" with the change as well,
> > > > >>>>> then I think it's worth the effort of splitting up Flink into
> > > > >>>>> different repositories.
> > > >>>>>
> > > > >>>>> Splitting up repositories is not a trivial task in my opinion. As
> > > > >>>>> others have mentioned before, we need to consider the following
> > > > >>>>> things:
> > > > >>>>> - How are we going to build the documentation? Ideally every repo
> > > > >>>>> should contain its docs, so we would need to pull them together
> > > > >>>>> when building the main docs.
> > > > >>>>> - How do we organize the dependencies? If we have a library
> > > > >>>>> repository depend on snapshot Flink versions, we need to make
> > > > >>>>> sure that the snapshot deployment always works. This also means
> > > > >>>>> that people working on a library repository will pull from
> > > > >>>>> snapshot OR need to build first locally.
> > > > >>>>> - We need to update the release scripts
> > > >>>>>
> > > > >>>>> If we commit to do these changes, we need to assign at least one
> > > > >>>>> committer (yes, in this case we need somebody who can commit, for
> > > > >>>>> example for updating the buildbot stuff) who volunteers to do the
> > > > >>>>> change.
> > > > >>>>> I've done a lot of infrastructure work in the past, but I'm
> > > > >>>>> currently pretty booked with many other things, so I don't
> > > > >>>>> realistically see myself doing that. Max who used to work on
> > > > >>>>> these things is taking some time off.
> > > > >>>>> I think we need, best case 3 days for the change, worst case 5
> > > > >>>>> days. The problem is that there are no "unit tests" for the infra
> > > > >>>>> stuff, so many things are "trial and error" (like Apache's
> > > > >>>>> buildbot, our release scripts, the doc scripts, maven stuff,
> > > > >>>>> nightly builds).
> > > >>>>>
> > > >>>>>
> > > >>>>>
> > > > >>>>> On Thu, Feb 23, 2017 at 1:33 PM, Stephan Ewen <se...@apache.org>
> > > > >>>>> wrote:
> > > > >>>>>
> > > > >>>>>> If we can get incremental builds to work, that would actually
> > > > >>>>>> be the preferred solution in my opinion.
> > > > >>>>>>
> > > > >>>>>> Many companies have invested heavily in making a "single
> > > > >>>>>> repository" code base work, because it has the advantage of not
> > > > >>>>>> having to update/publish several repositories first.
> > > > >>>>>> However, the strong prerequisite for that is an incremental
> > > > >>>>>> build system that builds only (fine grained) what it has to
> > > > >>>>>> build. I am not sure how we could make that work with Maven and
> > > > >>>>>> Travis...
> > > >>>>>>
> > > > >>>>>> On Wed, Feb 22, 2017 at 10:42 PM, Greg Hogan <c...@greghogan.com>
> > > > >>>>>> wrote:
> > > > >>>>>>
> > > > >>>>>>> An additional option for reducing time to build and test is
> > > > >>>>>>> parallel execution. This would help users more than on TravisCI
> > > > >>>>>>> since we're generally running on multi-core machines rather
> > > > >>>>>>> than VM slices.
> > > > >>>>>>>
> > > > >>>>>>> Is the idea that each user would only check out the modules
> > > > >>>>>>> that he or she is developing with? For example, if a developer
> > > > >>>>>>> is not working on flink-mesos or flink-yarn then the
> > > > >>>>>>> "flink-deploy" module would not be cloned to their filesystem?
> > > > >>>>>>>
> > > > >>>>>>> We can run a TravisCI nightly build on each repo to validate
> > > > >>>>>>> against API changes.
> > > >>>>>>>
> > > >>>>>>> Greg
> > > >>>>>>>
> > > > >>>>>>> On Wed, Feb 22, 2017 at 12:24 PM, Fabian Hueske <fhue...@gmail.com>
> > > > >>>>>>> wrote:
> > > >>>>>>>
> > > >>>>>>>> Hi everybody,
> > > >>>>>>>>
> > > > >>>>>>>> I think this should be a discussion about the benefits and
> > > > >>>>>>>> drawbacks of separating the code into distinct repositories
> > > > >>>>>>>> from a development point of view.
> > > > >>>>>>>> So I agree with Stephan that we should not divide the
> > > > >>>>>>>> community by creating separate groups of committers.
> > > > >>>>>>>> Also the discussion about independent releases is not
> > > > >>>>>>>> strictly related to the decision, IMO.
> > > >>>>>>>>
> > > > >>>>>>>> I see a few pros and cons for splitting the code base into
> > > > >>>>>>>> separate repositories which (I think) haven't been mentioned
> > > > >>>>>>>> before:
> > > > >>>>>>>> pros:
> > > > >>>>>>>> - IDE setup will be leaner. It is not necessary to compile the
> > > > >>>>>>>> whole code base to run a test after switching a branch.
> > > > >>>>>>>> cons:
> > > > >>>>>>>> - developing library features that require changes in the
> > > > >>>>>>>> core / APIs becomes more time consuming due to back-and-forth
> > > > >>>>>>>> between code bases. However, I think this is not very often
> > > > >>>>>>>> the case.
> > > >>>>>>>>
> > > > >>>>>>>> Aljoscha has good points as well. Many of the build issues
> > > > >>>>>>>> could be solved by different build profiles and
> > > > >>>>>>>> configurations.
> > > >>>>>>>>
> > > >>>>>>>> Best, Fabian
> > > >>>>>>>>
> > > >>>>>>>> 2017-02-22 14:59 GMT+01:00 Gábor Hermann <
> > > >> m...@gaborhermann.com
> > > >>>> :
> > > >>>>>>>>
> > > >>>>>>>>> @Stephan:
> > > >>>>>>>>>
> > > > >>>>>>>>> Although I tried to raise some issues about splitting
> > > > >>>>>>>>> committers, I'm still strongly in favor of some kind of
> > > > >>>>>>>>> restructuring. We just have to be conscious about the
> > > > >>>>>>>>> disadvantages.
> > > >>>>>>>>>
> > > > >>>>>>>>> Not splitting the committers could leave the libraries in the
> > > > >>>>>>>>> same stalling status, described by Till. Of course, dedicating
> > > > >>>>>>>>> current committers as shepherds of the libraries could easily
> > > > >>>>>>>>> resolve the issue. But that requires time from current
> > > > >>>>>>>>> committers. It seems like trade-offs between code quality,
> > > > >>>>>>>>> speed of development, and committer efforts.
> > > >>>>>>>>>
> > > > >>>>>>>>> From what I see in the discussion about ML, there are many
> > > > >>>>>>>>> people willing to contribute as well as production use-cases.
> > > > >>>>>>>>> This means we could and should move forward. However, the
> > > > >>>>>>>>> development speed is significantly slowed down by stalling
> > > > >>>>>>>>> PRs. The proposal for contributors helping the review process
> > > > >>>>>>>>> did not really work out so far. In my opinion, either code
> > > > >>>>>>>>> quality (by more easily accepting new committers) or some
> > > > >>>>>>>>> committer time (reviewing/merging) should be sacrificed to
> > > > >>>>>>>>> move forward. As Till has indicated, it would be shameful if
> > > > >>>>>>>>> we let this contribution effort die.
> > > >>>>>>>>>
> > > >>>>>>>>> Cheers,
> > > >>>>>>>>> Gabor
> > > >>>>>>>>>
> > > >>>>>>>>>
> > > >>>>>>>>
> > > >>>>>>>
> > > >>>>>>
> > > >>>>>
> > > >>>>
> > > >>>
> > > >>
> > >
> > >
> >
>