Re: [DISCUSS] Project build time and possible restructuring

Stephan Ewen Tue, 14 Mar 2017 12:05:52 -0700

Some other thoughts on how a repository split would help. I am not sure about
all of them, so please comment:


  - There is less competition for a "commit window". It already happens a lot
that you run all tests and want to commit, but someone else committed in the
meantime. You rebase, re-test, and again there is a new commit in the meantime.
    For a "linear" commit history, this may eventually become a bottleneck
as well.

  - There is less risk of a broken master. If one repository/module breaks
its master, the others can still continue.

Stephan


On Fri, Mar 10, 2017 at 12:20 PM, Till Rohrmann <trohrm...@apache.org>
wrote:

> Thanks for all your input. In order to wrap the discussion up I'd like to
> summarize the mentioned points:
>
> The problem of increasing build times and complexity of the project has
> been acknowledged. Ideally we would have everything in one repository using
> an incremental build tool. Since Maven does not properly support this we
> would have to switch our build tool to something like Gradle, for example.
>
> Another option is introducing build profiles for different sets of modules
> as well as separating integration and unit tests. The third alternative
> would be creating sub-projects with their own repositories. I actually
> think that these two proposals are not necessarily mutually exclusive, and it
> would also make sense to separate unit and integration tests if
> we split the repository.
>
> The overall consensus seems to be that we don't want to split the community
> and want to keep everything under the same umbrella. I think this is the
> right way to go, because otherwise some parts of the project could become
> second class citizens. Given that and that we continue using Maven, I still
> think that creating sub-projects for the libraries, for example, could be
> beneficial. A split could reduce the project's complexity and make it
> potentially easier for libraries to get actively developed. The main
> concern is setting up the build infrastructure to aggregate docs from
> multiple repositories and making them publicly available.
>
> Since I started this thread and I would really like to see Flink's ML
> library revived, I volunteer to first investigate whether it
> is feasible to establish a proper incremental build for Flink. If that
> turns out not to be possible, I will look into splitting the repository, at
> first only for the libraries. I'll share my results with the community once
> I'm done with the investigation.
>
> Cheers,
> Till
>
> On Fri, Feb 24, 2017 at 3:50 PM, Robert Metzger <rmetz...@apache.org>
> wrote:
>
> > @Jin Mingjian: You cannot use the paid Travis version for open source
> > projects. It only works for private repositories (at least that was the
> > case when we asked them about it).
> >
> > @Stephan: I don't think that incremental builds will be available with
> > Maven anytime soon.
> >
> > I agree that we need to fix the build time issue on Travis. I've recently
> > pushed a commit that uses three test groups instead of two.
> > But I don't think that this is a feasible long-term solution.
> >
> > If this discussion is only about reducing the build and test time,
> > introducing build profiles for different components as Aljoscha suggested
> > would solve the problem Till mentioned.
> > Also, if we decide that Travis is no longer a good tool for the testing,
> > I guess we can find a different solution. There are now competitors to
> > Travis that might be willing to offer a paid plan for an open source
> > project, or we set up our own infra on a server sponsored by one of the
> > contributing companies.
> > If we want to solve "community issues" with the change as well, then I
> > think it's worth the effort of splitting up Flink into different
> > repositories.
> >
> > Splitting up repositories is not a trivial task in my opinion. As others
> > have mentioned before, we need to consider the following things:
> > - How are we going to build the documentation? Ideally every repo should
> > contain its docs, so we would need to pull them together when building the
> > main docs.
> > - How do we organize the dependencies? If a library repository depends on
> > snapshot Flink versions, we need to make sure that the snapshot deployment
> > always works. This also means that people working on a library repository
> > will pull from snapshots OR need to build locally first.
> > - We need to update the release scripts.
> >
> > If we commit to making these changes, we need to assign at least one
> > committer (yes, in this case we need somebody who can commit, for example
> > for updating the buildbot stuff) who volunteers to do the change.
> > I've done a lot of infrastructure work in the past, but I'm currently
> > pretty booked with many other things, so I don't realistically see myself
> > doing that. Max, who used to work on these things, is taking some time off.
> > I think we need 3 days for the change in the best case and 5 days in the
> > worst case. The problem is that there are no "unit tests" for the infra
> > stuff, so many things are "trial and error" (like Apache's buildbot, our
> > release scripts, the doc scripts, maven stuff, nightly builds).
> >
> >
> >
> > On Thu, Feb 23, 2017 at 1:33 PM, Stephan Ewen <se...@apache.org> wrote:
> >
> > > If we can get incremental builds to work, that would actually be the
> > > preferred solution in my opinion.
> > >
> > > Many companies have invested heavily in making a "single repository" code
> > > base work, because it has the advantage of not having to update/publish
> > > several repositories first.
> > > However, the strong prerequisite for that is an incremental build system
> > > that builds only (fine grained) what it has to build. I am not sure how we
> > > could make that work
> > > with Maven and Travis...
> > >
> > > On Wed, Feb 22, 2017 at 10:42 PM, Greg Hogan <c...@greghogan.com>
> > > wrote:
> > >
> > > > An additional option for reducing time to build and test is parallel
> > > > execution. This would help users more than it would help on TravisCI,
> > > > since we're generally running on multi-core machines rather than VM
> > > > slices.
> > > >
> > > > Is the idea that each user would only check out the modules that he or
> > > > she is developing with? For example, if a developer is not working on
> > > > flink-mesos or flink-yarn then the "flink-deploy" module would not be
> > > > cloned to their filesystem?
> > > >
> > > > We can run a TravisCI nightly build on each repo to validate against API
> > > > changes.
> > > >
> > > > Greg
> > > >
> > > > On Wed, Feb 22, 2017 at 12:24 PM, Fabian Hueske <fhue...@gmail.com>
> > > > wrote:
> > > >
> > > > > Hi everybody,
> > > > >
> > > > > I think this should be a discussion about the benefits and drawbacks
> > > > > of separating the code into distinct repositories from a development
> > > > > point of view.
> > > > > So I agree with Stephan that we should not divide the community by
> > > > > creating separate groups of committers.
> > > > > Also the discussion about independent releases is not strictly related
> > > > > to the decision, IMO.
> > > > >
> > > > > I see a few pros and cons for splitting the code base into separate
> > > > > repositories which (I think) haven't been mentioned before:
> > > > > pros:
> > > > > - IDE setup will be leaner. It is not necessary to compile the whole
> > > > > code base to run a test after switching a branch.
> > > > > cons:
> > > > > - developing library features that require changes in the core / APIs
> > > > > becomes more time consuming due to back-and-forth between code bases.
> > > > > However, I think this is not very often the case.
> > > > >
> > > > > Aljoscha has good points as well. Many of the build issues could be
> > > > > solved by different build profiles and configurations.
> > > > >
> > > > > Best, Fabian
> > > > >
> > > > > 2017-02-22 14:59 GMT+01:00 Gábor Hermann <m...@gaborhermann.com>:
> > > > >
> > > > > > @Stephan:
> > > > > >
> > > > > > Although I tried to raise some issues about splitting committers,
> > > > > > I'm still strongly in favor of some kind of restructuring. We just
> > > > > > have to be conscious of the disadvantages.
> > > > > >
> > > > > > Not splitting the committers could leave the libraries in the same
> > > > > > stalled state described by Till. Of course, dedicating current
> > > > > > committers as shepherds of the libraries could easily resolve the
> > > > > > issue. But that requires time from current committers. It seems like
> > > > > > a trade-off between code quality, speed of development, and
> > > > > > committer effort.
> > > > > >
> > > > > > From what I see in the discussion about ML, there are many people
> > > > > > willing to contribute as well as production use-cases. This means we
> > > > > > could and should move forward. However, the development speed is
> > > > > > significantly slowed down by stalling PRs. The proposal for
> > > > > > contributors helping the review process has not really worked out so
> > > > > > far. In my opinion, either code quality (by more easily accepting
> > > > > > new committers) or some committer time (reviewing/merging) should be
> > > > > > sacrificed to move forward. As Till has indicated, it would be a
> > > > > > shame if we let this contribution effort die.
> > > > > >
> > > > > > Cheers,
> > > > > > Gabor
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
