Just in case we decide to pursue the repo split in the end, some thoughts
on Chesnay's questions:

(1) Git History

We can also use "git filter-branch" to rewrite the history to only contain
the connectors.
It changes commit hashes, but I am not sure that this is a problem. The commit
hashes are still valid in the main repo, so one can look up the commits
that fixed an earlier issue.
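
A minimal sketch of what such a rewrite could look like, assuming it is run
on a fresh throwaway clone and that the connectors live in a top-level
"flink-connectors" directory. The helper name split_subdir is hypothetical,
and the exact filter we would want (for example, keeping several directories)
may well differ:

```shell
#!/bin/sh
# Hypothetical helper: rewrite the history of the clone at $1 so that only
# commits touching the subdirectory $2 remain, with $2 as the new root.
# All rewritten commits get new hashes; the original repo is not touched
# as long as this runs on a separate clone.
split_subdir() {
    repo="$1"
    subdir="$2"
    # FILTER_BRANCH_SQUELCH_WARNING skips the advisory (and pause) that
    # newer git versions print before running filter-branch.
    FILTER_BRANCH_SQUELCH_WARNING=1 \
        git -C "$repo" filter-branch --prune-empty \
            --subdirectory-filter "$subdir" -- --all
}
```

Usage would be something like: split_subdir /path/to/fresh-clone flink-connectors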

(2) Maven

+1 to a shared flink-parent pom.xml file

(3) Docs

One option would be to not integrate the docs.
That would mean a top level navigation between Flink, Connectors, Libraries
(for example as a horizontal bar at the top) and then per repository
navigation as we currently have it.
Of course, sharing docs build setup would be desirable.

(4) End-2-End tests

I think we absolutely need those on the other repos.
As Piotr pointed out, some of the end to end test coverage depends on
connectors and libraries.

While ideally that would not be necessary, I believe that realistically,
targeted test coverage in the core will never be absolutely perfect. So a
certain number of additional bugs (especially those due to distributed race
conditions) will be caught by the extended test coverage we get from
connector and library end-to-end tests.

Let's find a way to keep that, maybe not as per-commit tests, but as
nightly ones.

On Wed, Aug 7, 2019 at 1:14 PM Chesnay Schepler <ches...@apache.org> wrote:

> Hello everyone,
>
> The Flink project sees an ever-increasing amount of dev activity, both
> in terms of reworked and new features.
>
> This is of course an excellent situation to be in, but we are getting to
> a point where the associated downsides are becoming increasingly
> troublesome.
>
> The ever-increasing build times, in addition to unstable tests,
> significantly slow down the development process.
> Additionally, pull requests for smaller features frequently slip through
> the cracks as they are being buried under a mountain of other pull
> requests.
>
> As a result I'd like to start a discussion on splitting the Flink
> repository.
>
> In this mail I will outline the core idea, and what problems I currently
> envision.
>
> I'd specifically like to encourage those who were part of similar
> initiatives in other projects to share their experiences and ideas.
>
>
>         General Idea
>
> For starters, the idea is to create a new repository for
> "flink-connectors".
> For the remainder of this mail, the current Flink repository is referred
> to as "flink-main".
>
> There are also other candidates that we could discuss in the future,
> like flink-libraries (the next top-priority repo to ease flink-ml
> development), metric reporters, filesystems and flink-formats.
>
> Moving out flink-connectors provides the most benefits, as we straight
> away save at least an hour of testing time, and not being included in
> the binary distribution simplifies a few things.
>
>
>         Problems to solve
>
> To make this a reality there are a number of questions we have to discuss;
> some in the short-term, others in the long-term.
>
> 1) Git history
>
>     We have to decide whether we want to rewrite the history of sub
>     repositories to only contain diffs/commits related to this part of
>     Flink, or whether we just fork from some commit in flink-main and
>     add a commit to the connector repo that "transforms" it from
>     flink-main to flink-connectors (i.e., remove everything unrelated to
>     connectors + update module structure etc.).
>
>     The latter option would have the advantage that our commit
>     bookkeeping in JIRA would still be correct, but it would create a
>     significant divide between the current and past state of the
>     repository.
>
> 2) Maven
>
>     We should look into whether there's a way to share dependency/plugin
>     configurations and similar, so we don't have to keep them in sync
>     manually across multiple repositories.
>
>     A new parent Flink pom that all repositories define as their parent
>     could work; this would imply splicing out part of the current root
>     pom.xml.
>
> 3) Documentation
>
>     Splitting the repository realistically also implies splitting the
>     documentation source files (At the beginning we can get by with
>     having it still in flink-main).
>     We could just move the relevant files to the respective repository
>     (while maintaining the directory structure), and merge them when
>     building the docs.
>
>     We also have to look at how we can handle java-/scaladocs; e.g.
>     whether it is possible to aggregate them across projects.
>
> 4) CI (end-to-end tests)
>
>     The very basic question we have to answer is whether we want E2E
>     tests in the sub repositories. If so, we need to find a way to share
>     e2e-tooling.
>
> 5) Releases
>
>     We have to discuss what our release process will look like. This may
>     also have repercussions on how repositories may depend on each other
>     (SNAPSHOT vs LATEST). Note that this should be discussed for each
>     repo separately.
>
>     The current options I see are the following:
>
>     a) Single release
>
>         Release all repositories at once as a single product.
>
>         The source release would be a collection of repositories, like
>         flink/
>         |--flink-main/
>             |--flink-core/
>             |--flink-runtime/
>             ...
>         |--flink-connectors/
>             ...
>         |--flink-.../
>         ...
>
>         This option requires a SNAPSHOT dependency between Flink
>         repositories, but it is pretty much how things work at the moment.
>
>     b) Synced releases
>
>         Similar to a), except that each repository gets its own source
>         release that may be released independently of other repositories.
>         For a given release cycle each repo would produce exactly one
>         release.
>
>         This option requires a SNAPSHOT dependency between Flink
>         repositories. Once any repository has created an RC or
>         finished its release, release-branches in other repos can
>         switch to that version.
>
>         This approach is a tad more flexible than a), but requires more
>         coordination between the repos.
>
>     c) Separate releases
>
>         Just like we handle flink-shaded; entirely separate release
>         cycles; some repositories may have more releases in a given time
>         period than others.
>
>         This option implies a LATEST dependency between Flink repositories.
>
>     Note that hybrid approaches would also make sense, like doing b) for
>     major versions and c) for bugfix releases.
>
>     For something like flink-libraries this question may also have
>     repercussions on how/whether they are bundled in the distribution;
>     options a)/b) would maintain the status-quo, c) and hybrid
>     approaches will likely necessitate the exclusion from the distribution.
>
>
