Re: [DISCUSS] Creating an external connector repository

Arvid Heise Fri, 19 Nov 2021 00:12:34 -0800

Hi everyone,

we are currently in the process of setting up the flink-connectors repo [1]
for new connectors but we hit a wall that we currently cannot take:
branching model.
To reiterate the original motivation of the external connector repo: We
want to decouple the release cycle of a connector with Flink. However, if
we want to support semantic versioning in the connectors with the ability
to introduce breaking changes through major version bumps and support
bugfixes on old versions, then we need release branches similar to how
Flink core operates.
Consider two connectors, let's call them kafka and hbase. We have kafka in
version 1.0.X, 1.1.Y (small improvement), 2.0.Z (config option) change and
hbase only on 1.0.A.


Now our current assumption was that we can work with a mono-repo under ASF
(flink-connectors). Then, for release-branches, we found 3 options:
1. We would need to create some ugly mess with the cross product of
connector and version: so you have kafka-release-1.0, kafka-release-1.1,
kafka-release-2.0, hbase-release-1.0. The main issue is not the amount of
branches (that's something that git can handle) but there the state of
kafka is undefined in hbase-release-1.0. That's a call for desaster and
makes releasing connectors very cumbersome (CI would only execute and
publish hbase SNAPSHOTS on hbase-release-1.0).
2. We could avoid the undefined state by having an empty master and each
release branch really only holds the code of the connector. But that's also
not great: any user that looks at the repo and sees no connector would
assume that it's dead.
3. We could have synced releases similar to the CDC connectors [2]. That
means that if any connector introduces a breaking change, all connectors
get a new major. I find that quite confusing to a user if hbase gets a new
release without any change because kafka introduced a breaking change.

To fully decouple release cycles and CI of connectors, we could add
individual repositories under ASF (flink-connector-kafka,
flink-connector-hbase). Then we can apply the same branching model as
before. I quickly checked if there are precedences in the apache community
for that approach and just by scanning alphabetically I found cordova with
70 and couchdb with 77 apache repos respectively. So it certainly seems
like other projects approached our problem in that way and the apache
organization is okay with that. I currently expect max 20 additional repos
for connectors and in the future 10 max each for formats and filesystems if
we would also move them out at some point in time. So we would be at a
total of 50 repos.

Note for all options, we need to provide a compability matrix that we aim
to autogenerate.

Now for the potential downsides that we internally discussed:
- How can we ensure common infra structure code, utilties, and quality?
I propose to add a flink-connector-common that contains all these things
and is added as a git submodule/subtree to the repos.
- Do we implicitly discourage connector developers to maintain more than
one connector with a fragmented code base?
That is certainly a risk. However, I currently also see few devs working on
more than one connector. However, it may actually help keeping the devs
that maintain a specific connector on the hook. We could use github issues
to track bugs and feature requests and a dev can focus his limited time on
getting that one connector right.

So WDYT? Compared to some intermediate suggestions with split repos, the
big difference is that everything remains under Apache umbrella and the
Flink community.

[1] https://github.com/apache/flink-connectors
[2] https://github.com/ververica/flink-cdc-connectors/

On Fri, Nov 12, 2021 at 3:39 PM Arvid Heise <ar...@apache.org> wrote:

> Hi everyone,
>
> I created the flink-connectors repo [1] to advance the topic. We would
> create a proof-of-concept in the next few weeks as a special branch that
> I'd then use for discussions. If the community agrees with the approach,
> that special branch will become the master. If not, we can reiterate over
> it or create competing POCs.
>
> If someone wants to try things out in parallel, just make sure that you
> are not accidentally pushing POCs to the master.
>
> As a reminder: We will not move out any current connector from Flink at
> this point in time, so everything in Flink will remain as is and be
> maintained there.
>
> Best,
>
> Arvid
>
> [1] https://github.com/apache/flink-connectors
>
> On Fri, Oct 29, 2021 at 6:57 PM Till Rohrmann <trohrm...@apache.org>
> wrote:
>
>> Hi everyone,
>>
>> From the discussion, it seems to me that we have different opinions
>> whether to have an ASF umbrella repository or to host them outside of the
>> ASF. It also seems that this is not really the problem to solve. Since
>> there are many good arguments for either approach, we could simply start
>> with an ASF umbrella repository and see how people adopt it. If the
>> individual connectors cannot move fast enough or if people prefer to not
>> buy into the more heavy-weight ASF processes, then they can host the code
>> also somewhere else. We simply need to make sure that these connectors are
>> discoverable (e.g. via flink-packages).
>>
>> The more important problem seems to be to provide common tooling (testing,
>> infrastructure, documentation) that can easily be reused. Similarly, it
>> has
>> become clear that the Flink community needs to improve on providing stable
>> APIs. I think it is not realistic to first complete these tasks before
>> starting to move connectors to dedicated repositories. As Stephan said,
>> creating a connector repository will force us to pay more attention to API
>> stability and also to think about which testing tools are required. Hence,
>> I believe that starting to add connectors to a different repository than
>> apache/flink will help improve our connector tooling (declaring testing
>> classes as public, creating a common test utility repo, creating a repo
>> template) and vice versa. Hence, I like Arvid's proposed process as it
>> will
>> start kicking things off w/o letting this effort fizzle out.
>>
>> Cheers,
>> Till
>>
>> On Thu, Oct 28, 2021 at 11:44 AM Stephan Ewen <se...@apache.org> wrote:
>>
>> > Thank you all, for the nice discussion!
>> >
>> > From my point of view, I very much like the idea of putting connectors
>> in a
>> > separate repository. But I would argue it should be part of Apache
>> Flink,
>> > similar to flink-statefun, flink-ml, etc.
>> >
>> > I share many of the reasons for that:
>> >   - As argued many times, reduces complexity of the Flink repo,
>> increases
>> > response times of CI, etc.
>> >   - Much lower barrier of contribution, because an unstable connector
>> would
>> > not de-stabilize the whole build. Of course, we would need to make sure
>> we
>> > set this up the right way, with connectors having individual CI runs,
>> build
>> > status, etc. But it certainly seems possible.
>> >
>> >
>> > I would argue some points a bit different than some cases made before:
>> >
>> > (a) I believe the separation would increase connector stability.
>> Because it
>> > really forces us to work with the connectors against the APIs like any
>> > external developer. A mono repo is somehow the wrong thing if you in
>> > practice want to actually guarantee stable internal APIs at some layer.
>> > Because the mono repo makes it easy to just change something on both
>> sides
>> > of the API (provider and consumer) seamlessly.
>> >
>> > Major refactorings in Flink need to keep all connector API contracts
>> > intact, or we need to have a new version of the connector API.
>> >
>> > (b) We may even be able to go towards more lightweight and automated
>> > releases over time, even if we stay in Apache Flink with that repo.
>> > This isn't yet fully aligned with the Apache release policies, yet, but
>> > there are board discussions about whether there can be bot-triggered
>> > releases (by dependabot) and how that could fit into the Apache process.
>> >
>> > This doesn't seem to be quite there just yet, but seeing that those
>> start
>> > is a good sign, and there is a good chance we can do some things there.
>> > I am not sure whether we should let bots trigger releases, because a
>> final
>> > human look at things isn't a bad thing, especially given the popularity
>> of
>> > software supply chain attacks recently.
>> >
>> >
>> > I do share Chesnay's concerns about complexity in tooling, though. Both
>> > release tooling and test tooling. They are not incompatible with that
>> > approach, but they are a task we need to tackle during this change which
>> > will add additional work.
>> >
>> >
>> >
>> > On Tue, Oct 26, 2021 at 10:31 AM Arvid Heise <ar...@apache.org> wrote:
>> >
>> > > Hi folks,
>> > >
>> > > I think some questions came up and I'd like to address the question of
>> > the
>> > > timing.
>> > >
>> > > Could you clarify what release cadence you're thinking of? There's
>> quite
>> > > > a big range that fits "more frequent than Flink" (per-commit, daily,
>> > > > weekly, bi-weekly, monthly, even bi-monthly).
>> > >
>> > > The short answer is: as often as needed:
>> > > - If there is a CVE in a dependency and we need to bump it - release
>> > > immediately.
>> > > - If there is a new feature merged, release soonish. We may collect a
>> few
>> > > successive features before a release.
>> > > - If there is a bugfix, release immediately or soonish depending on
>> the
>> > > severity and if there are workarounds available.
>> > >
>> > > We should not limit ourselves; the whole idea of independent releases
>> is
>> > > exactly that you release as needed. There is no release planning or
>> > > anything needed, you just go with a release as if it was an external
>> > > artifact.
>> > >
>> > > (1) is the connector API already stable?
>> > > > From another discussion thread [1], connector API is far from
>> stable.
>> > > > Currently, it's hard to build connectors against multiple Flink
>> > versions.
>> > > > There are breaking API changes both in 1.12 -> 1.13 and 1.13 -> 1.14
>> > and
>> > > >  maybe also in the future versions,  because Table related APIs are
>> > still
>> > > > @PublicEvolving and new Sink API is still @Experimental.
>> > > >
>> > >
>> > > The question is: what is stable in an evolving system? We recently
>> > > discovered that the old SourceFunction needed to be refined such that
>> > > cancellation works correctly [1]. So that interface is in Flink since
>> 7
>> > > years, heavily used also outside, and we still had to change the
>> contract
>> > > in a way that I'd expect any implementer to recheck their
>> implementation.
>> > > It might not be necessary to change anything and you can probably
>> change
>> > > the the code for all Flink versions but still, the interface was not
>> > stable
>> > > in the closest sense.
>> > >
>> > > If we focus just on API changes on the unified interfaces, then we
>> expect
>> > > one more change to Sink API to support compaction. For Table API,
>> there
>> > > will most likely also be some changes in 1.15. So we could wait for
>> 1.15.
>> > >
>> > > But I'm questioning if that's really necessary because we will add
>> more
>> > > functionality beyond 1.15 without breaking API. For example, we may
>> add
>> > > more unified connector metrics. If you want to use it in your
>> connector,
>> > > you have to support multiple Flink versions anyhow. So rather then
>> > focusing
>> > > the discussion on "when is stuff stable", I'd rather focus on "how
>> can we
>> > > support building connectors against multiple Flink versions" and make
>> it
>> > as
>> > > painless as possible.
>> > >
>> > > Chesnay pointed out to use different branches for different Flink
>> > versions
>> > > which sounds like a good suggestion. With a mono-repo, we can't use
>> > > branches differently anyways (there is no way to have release branches
>> > per
>> > > connector without chaos). In these branches, we could provide shims to
>> > > simulate future features in older Flink versions such that code-wise,
>> the
>> > > source code of a specific connector may not diverge (much). For
>> example,
>> > to
>> > > register unified connector metrics, we could simulate the current
>> > approach
>> > > also in some utility package of the mono-repo.
>> > >
>> > > I see the stable core Flink API as a prerequisite for modularity. And
>> > > > for connectors it is not just the source and sink API (source being
>> > > > stable as of 1.14), but everything that is required to build and
>> > > > maintain a connector downstream, such as the test utilities and
>> > > > infrastructure.
>> > > >
>> > >
>> > > That is a very fair point. I'm actually surprised to see that
>> > > MiniClusterWithClientResource is not public. I see it being used in
>> all
>> > > connectors, especially outside of Flink. I fear that as long as we do
>> not
>> > > have connectors outside, we will not properly annotate and maintain
>> these
>> > > utilties in a classic hen-and-egg-problem. I will outline an idea at
>> the
>> > > end.
>> > >
>> > > > the connectors need to be adopted and require at least one release
>> per
>> > > > Flink minor release.
>> > > > However, this will make the releases of connectors slower, e.g.
>> > maintain
>> > > > features for multiple branches and release multiple branches.
>> > > > I think the main purpose of having an external connector repository
>> is
>> > in
>> > > > order to have "faster releases of connectors"?
>> > > >
>> > >
>> > > > Imagine a project with a complex set of dependencies. Let's say
>> Flink
>> > > > version A plus Flink reliant dependencies released by other projects
>> > > > (Flink-external connectors, Beam, Iceberg, Hudi, ..). We don't want
>> a
>> > > > situation where we bump the core Flink version to B and things fall
>> > > > apart (interface changes, utilities that were useful but not public,
>> > > > transitive dependencies etc.).
>> > > >
>> > >
>> > > Yes, that's why I wanted to automate the processes more which is not
>> that
>> > > easy under ASF. Maybe we automate the source provision across
>> supported
>> > > versions and have 1 vote thread for all versions of a connector?
>> > >
>> > > From the perspective of CDC connector maintainers, the biggest
>> advantage
>> > of
>> > > > maintaining it outside of the Flink project is that:
>> > > > 1) we can have a more flexible and faster release cycle
>> > > > 2) we can be more liberal with committership for connector
>> maintainers
>> > > > which can also attract more committers to help the release.
>> > > >
>> > > > Personally, I think maintaining one connector repository under the
>> ASF
>> > > may
>> > > > not have the above benefits.
>> > > >
>> > >
>> > > Yes, I also feel that ASF is too restrictive for our needs. But it
>> feels
>> > > like there are too many that see it differently and I think we need
>> > >
>> > > (2) Flink testability without connectors.
>> > > > This is a very good question. How can we guarantee the new Source
>> and
>> > > Sink
>> > > > API are stable with only test implementation?
>> > > >
>> > >
>> > > We can't and shouldn't. Since the connector repo is managed by Flink,
>> a
>> > > Flink release manager needs to check if the Flink connectors are
>> actually
>> > > working prior to creating an RC. That's similar to how flink-shaded
>> and
>> > > flink core are related.
>> > >
>> > >
>> > > So here is one idea that I had to get things rolling. We are going to
>> > > address the external repo iteratively without compromising what we
>> > already
>> > > have:
>> > > 1.Phase, add new contributions to external repo. We use that time to
>> > setup
>> > > infra accordingly and optimize release processes. We will identify
>> test
>> > > utilities that are not yet public/stable and fix that.
>> > > 2.Phase, add ports to the new unified interfaces of existing
>> connectors.
>> > > That requires a previous Flink release to make utilities stable. Keep
>> old
>> > > interfaces in flink-core.
>> > > 3.Phase, remove old interfaces in flink-core of some connectors (tbd
>> at a
>> > > later point).
>> > > 4.Phase, optionally move all remaining connectors (tbd at a later
>> point).
>> > >
>> > > I'd envision having ~3 months between the starting the different
>> phases.
>> > > WDYT?
>> > >
>> > >
>> > > [1] https://issues.apache.org/jira/browse/FLINK-23527
>> > >
>> > > On Thu, Oct 21, 2021 at 7:12 AM Kyle Bendickson <k...@tabular.io>
>> wrote:
>> > >
>> > > > Hi all,
>> > > >
>> > > > My name is Kyle and I’m an open source developer primarily focused
>> on
>> > > > Apache Iceberg.
>> > > >
>> > > > I’m happy to help clarify or elaborate on any aspect of our
>> experience
>> > > > working on a relatively decoupled connector that is downstream and
>> > pretty
>> > > > popular.
>> > > >
>> > > > I’d also love to be able to contribute or assist in any way I can.
>> > > >
>> > > > I don’t mean to thread jack, but are there any meetings or community
>> > sync
>> > > > ups, specifically around the connector APIs, that I might join / be
>> > > invited
>> > > > to?
>> > > >
>> > > > I did want to add that even though I’ve experienced some of the pain
>> > > points
>> > > > of integrating with an evolving system / API (catalog support is
>> > > generally
>> > > > speaking pretty new everywhere really in this space), I also agree
>> > > > personally that you shouldn’t slow down development velocity too
>> much
>> > for
>> > > > the sake of external connector. Getting to a performant and stable
>> > place
>> > > > should be the primary goal, and slowing that down to support
>> stragglers
>> > > > will (in my personal opinion) always be a losing game. Some folks
>> will
>> > > > simply stay behind on versions regardless until they have to
>> upgrade.
>> > > >
>> > > > I am working on ensuring that the Iceberg community stays within 1-2
>> > > > versions of Flink, so that we can help provide more feedback or
>> > > contribute
>> > > > things that might make our ability to support multiple Flink
>> runtimes /
>> > > > versions with one project / codebase and minimal to no reflection
>> (our
>> > > > desired goal).
>> > > >
>> > > > If there’s anything I can do or any way I can be of assistance,
>> please
>> > > > don’t hesitate to reach out. Or find me on ASF slack 😀
>> > > >
>> > > > I greatly appreciate your general concern for the needs of
>> downstream
>> > > > connector integrators!
>> > > >
>> > > > Cheers
>> > > > Kyle Bendickson (GitHub: kbendick)
>> > > > Open Source Developer
>> > > > kyle [at] tabular [dot] io
>> > > >
>> > > > On Wed, Oct 20, 2021 at 11:35 AM Thomas Weise <t...@apache.org>
>> wrote:
>> > > >
>> > > > > Hi,
>> > > > >
>> > > > > I see the stable core Flink API as a prerequisite for modularity.
>> And
>> > > > > for connectors it is not just the source and sink API (source
>> being
>> > > > > stable as of 1.14), but everything that is required to build and
>> > > > > maintain a connector downstream, such as the test utilities and
>> > > > > infrastructure.
>> > > > >
>> > > > > Without the stable surface of core Flink, changes will leak into
>> > > > > downstream dependencies and force lock step updates. Refactoring
>> > > > > across N repos is more painful than a single repo. Those with
>> > > > > experience developing downstream of Flink will know the pain, and
>> > that
>> > > > > isn't limited to connectors. I don't remember a Flink "minor
>> version"
>> > > > > update that was just a dependency version change and did not force
>> > > > > other downstream changes.
>> > > > >
>> > > > > Imagine a project with a complex set of dependencies. Let's say
>> Flink
>> > > > > version A plus Flink reliant dependencies released by other
>> projects
>> > > > > (Flink-external connectors, Beam, Iceberg, Hudi, ..). We don't
>> want a
>> > > > > situation where we bump the core Flink version to B and things
>> fall
>> > > > > apart (interface changes, utilities that were useful but not
>> public,
>> > > > > transitive dependencies etc.).
>> > > > >
>> > > > > The discussion here also highlights the benefits of keeping
>> certain
>> > > > > connectors outside Flink. Whether that is due to difference in
>> > > > > developer community, maturity of the connectors, their
>> > > > > specialized/limited usage etc. I would like to see that as a sign
>> of
>> > a
>> > > > > growing ecosystem and most of the ideas that Arvid has put forward
>> > > > > would benefit further growth of the connector ecosystem.
>> > > > >
>> > > > > As for keeping connectors within Apache Flink: I prefer that as
>> the
>> > > > > path forward for "essential" connectors like FileSource,
>> KafkaSource,
>> > > > > ... And we can still achieve a more flexible and faster release
>> > cycle.
>> > > > >
>> > > > > Thanks,
>> > > > > Thomas
>> > > > >
>> > > > >
>> > > > >
>> > > > >
>> > > > >
>> > > > > On Wed, Oct 20, 2021 at 3:32 AM Jark Wu <imj...@gmail.com> wrote:
>> > > > > >
>> > > > > > Hi Konstantin,
>> > > > > >
>> > > > > > > the connectors need to be adopted and require at least one
>> > release
>> > > > per
>> > > > > > Flink minor release.
>> > > > > > However, this will make the releases of connectors slower, e.g.
>> > > > maintain
>> > > > > > features for multiple branches and release multiple branches.
>> > > > > > I think the main purpose of having an external connector
>> repository
>> > > is
>> > > > in
>> > > > > > order to have "faster releases of connectors"?
>> > > > > >
>> > > > > >
>> > > > > > From the perspective of CDC connector maintainers, the biggest
>> > > > advantage
>> > > > > of
>> > > > > > maintaining it outside of the Flink project is that:
>> > > > > > 1) we can have a more flexible and faster release cycle
>> > > > > > 2) we can be more liberal with committership for connector
>> > > maintainers
>> > > > > > which can also attract more committers to help the release.
>> > > > > >
>> > > > > > Personally, I think maintaining one connector repository under
>> the
>> > > ASF
>> > > > > may
>> > > > > > not have the above benefits.
>> > > > > >
>> > > > > > Best,
>> > > > > > Jark
>> > > > > >
>> > > > > > On Wed, 20 Oct 2021 at 15:14, Konstantin Knauf <
>> kna...@apache.org>
>> > > > > wrote:
>> > > > > >
>> > > > > > > Hi everyone,
>> > > > > > >
>> > > > > > > regarding the stability of the APIs. I think everyone agrees
>> that
>> > > > > > > connector APIs which are stable across minor versions
>> > (1.13->1.14)
>> > > > are
>> > > > > the
>> > > > > > > mid-term goal. But:
>> > > > > > >
>> > > > > > > a) These APIs are still quite young, and we shouldn't make
>> them
>> > > > @Public
>> > > > > > > prematurely either.
>> > > > > > >
>> > > > > > > b) Isn't this *mostly* orthogonal to where the connector code
>> > > lives?
>> > > > > Yes,
>> > > > > > > as long as there are breaking changes, the connectors need to
>> be
>> > > > > adopted
>> > > > > > > and require at least one release per Flink minor release.
>> > > > > > > Documentation-wise this can be addressed via a compatibility
>> > matrix
>> > > > for
>> > > > > > > each connector as Arvid suggested. IMO we shouldn't block this
>> > > effort
>> > > > > on
>> > > > > > > the stability of the APIs.
>> > > > > > >
>> > > > > > > Cheers,
>> > > > > > >
>> > > > > > > Konstantin
>> > > > > > >
>> > > > > > >
>> > > > > > >
>> > > > > > > On Wed, Oct 20, 2021 at 8:56 AM Jark Wu <imj...@gmail.com>
>> > wrote:
>> > > > > > >
>> > > > > > >> Hi,
>> > > > > > >>
>> > > > > > >> I think Thomas raised very good questions and would like to
>> know
>> > > > your
>> > > > > > >> opinions if we want to move connectors out of flink in this
>> > > version.
>> > > > > > >>
>> > > > > > >> (1) is the connector API already stable?
>> > > > > > >> > Separate releases would only make sense if the core Flink
>> > > surface
>> > > > is
>> > > > > > >> > fairly stable though. As evident from Iceberg (and also
>> Beam),
>> > > > > that's
>> > > > > > >> > not the case currently. We should probably focus on
>> addressing
>> > > the
>> > > > > > >> > stability first, before splitting code. A success criteria
>> > could
>> > > > be
>> > > > > > >> > that we are able to build Iceberg and Beam against multiple
>> > > Flink
>> > > > > > >> > versions w/o the need to change code. The goal would be
>> that
>> > no
>> > > > > > >> > connector breaks when we make changes to Flink core. Until
>> > > that's
>> > > > > the
>> > > > > > >> > case, code separation creates a setup where 1+1 or N+1
>> > > > repositories
>> > > > > > >> > need to move lock step.
>> > > > > > >>
>> > > > > > >> From another discussion thread [1], connector API is far from
>> > > > stable.
>> > > > > > >> Currently, it's hard to build connectors against multiple
>> Flink
>> > > > > versions.
>> > > > > > >> There are breaking API changes both in 1.12 -> 1.13 and 1.13
>> ->
>> > > 1.14
>> > > > > and
>> > > > > > >>  maybe also in the future versions,  because Table related
>> APIs
>> > > are
>> > > > > still
>> > > > > > >> @PublicEvolving and new Sink API is still @Experimental.
>> > > > > > >>
>> > > > > > >>
>> > > > > > >> (2) Flink testability without connectors.
>> > > > > > >> > Flink w/o Kafka connector (and few others) isn't
>> > > > > > >> > viable. Testability of Flink was already brought up, can we
>> > > really
>> > > > > > >> > certify a Flink core release without Kafka connector? Maybe
>> > > those
>> > > > > > >> > connectors that are used in Flink e2e tests to validate
>> > > > > functionality
>> > > > > > >> > of core Flink should not be broken out?
>> > > > > > >>
>> > > > > > >> This is a very good question. How can we guarantee the new
>> > Source
>> > > > and
>> > > > > Sink
>> > > > > > >> API are stable with only test implementation?
>> > > > > > >>
>> > > > > > >>
>> > > > > > >> Best,
>> > > > > > >> Jark
>> > > > > > >>
>> > > > > > >>
>> > > > > > >>
>> > > > > > >>
>> > > > > > >>
>> > > > > > >> On Tue, 19 Oct 2021 at 23:56, Chesnay Schepler <
>> > > ches...@apache.org>
>> > > > > > >> wrote:
>> > > > > > >>
>> > > > > > >> > Could you clarify what release cadence you're thinking of?
>> > > There's
>> > > > > quite
>> > > > > > >> > a big range that fits "more frequent than Flink"
>> (per-commit,
>> > > > daily,
>> > > > > > >> > weekly, bi-weekly, monthly, even bi-monthly).
>> > > > > > >> >
>> > > > > > >> > On 19/10/2021 14:15, Martijn Visser wrote:
>> > > > > > >> > > Hi all,
>> > > > > > >> > >
>> > > > > > >> > > I think it would be a huge benefit if we can achieve more
>> > > > frequent
>> > > > > > >> > releases
>> > > > > > >> > > of connectors, which are not bound to the release cycle
>> of
>> > > Flink
>> > > > > > >> itself.
>> > > > > > >> > I
>> > > > > > >> > > agree that in order to get there, we need to have stable
>> > > > > interfaces
>> > > > > > >> which
>> > > > > > >> > > are trustworthy and reliable, so they can be safely used
>> by
>> > > > those
>> > > > > > >> > > connectors. I do think that work still needs to be done
>> on
>> > > those
>> > > > > > >> > > interfaces, but I am confident that we can get there
>> from a
>> > > > Flink
>> > > > > > >> > > perspective.
>> > > > > > >> > >
>> > > > > > >> > > I am worried that we would not be able to achieve those
>> > > frequent
>> > > > > > >> releases
>> > > > > > >> > > of connectors if we are putting these connectors under
>> the
>> > > > Apache
>> > > > > > >> > umbrella,
>> > > > > > >> > > because that means that for each connector release we
>> have
>> > to
>> > > > > follow
>> > > > > > >> the
>> > > > > > >> > > Apache release creation process. This requires a lot of
>> > manual
>> > > > > steps
>> > > > > > >> and
>> > > > > > >> > > prohibits automation and I think it would be hard to
>> scale
>> > out
>> > > > > > >> frequent
>> > > > > > >> > > releases of connectors. I'm curious how others think this
>> > > > > challenge
>> > > > > > >> could
>> > > > > > >> > > be solved.
>> > > > > > >> > >
>> > > > > > >> > > Best regards,
>> > > > > > >> > >
>> > > > > > >> > > Martijn
>> > > > > > >> > >
>> > > > > > >> > > On Mon, 18 Oct 2021 at 22:22, Thomas Weise <
>> t...@apache.org>
>> > > > > wrote:
>> > > > > > >> > >
>> > > > > > >> > >> Thanks for initiating this discussion.
>> > > > > > >> > >>
>> > > > > > >> > >> There are definitely a few things that are not optimal
>> with
>> > > our
>> > > > > > >> > >> current management of connectors. I would not
>> necessarily
>> > > > > > >> characterize
>> > > > > > >> > >> it as a "mess" though. As the points raised so far
>> show, it
>> > > > isn't
>> > > > > > >> easy
>> > > > > > >> > >> to find a solution that balances competing requirements
>> and
>> > > > > leads to
>> > > > > > >> a
>> > > > > > >> > >> net improvement.
>> > > > > > >> > >>
>> > > > > > >> > >> It would be great if we can find a setup that allows for
>> > > > > connectors
>> > > > > > >> to
>> > > > > > >> > >> be released independently of core Flink and that each
>> > > connector
>> > > > > can
>> > > > > > >> be
>> > > > > > >> > >> released separately. Flink already has separate releases
>> > > > > > >> > >> (flink-shaded), so that by itself isn't a new thing.
>> > > > > Per-connector
>> > > > > > >> > >> releases would need to allow for more frequent releases
>> > > > (without
>> > > > > the
>> > > > > > >> > >> baggage that a full Flink release comes with).
>> > > > > > >> > >>
>> > > > > > >> > >> Separate releases would only make sense if the core
>> Flink
>> > > > > surface is
>> > > > > > >> > >> fairly stable though. As evident from Iceberg (and also
>> > > Beam),
>> > > > > that's
>> > > > > > >> > >> not the case currently. We should probably focus on
>> > > addressing
>> > > > > the
>> > > > > > >> > >> stability first, before splitting code. A success
>> criteria
>> > > > could
>> > > > > be
>> > > > > > >> > >> that we are able to build Iceberg and Beam against
>> multiple
>> > > > Flink
>> > > > > > >> > >> versions w/o the need to change code. The goal would be
>> > that
>> > > no
>> > > > > > >> > >> connector breaks when we make changes to Flink core.
>> Until
>> > > > > that's the
>> > > > > > >> > >> case, code separation creates a setup where 1+1 or N+1
>> > > > > repositories
>> > > > > > >> > >> need to move lock step.
>> > > > > > >> > >>
>> > > > > > >> > >> Regarding some connectors being more important for Flink
>> > than
>> > > > > others:
>> > > > > > >> > >> That's a fact. Flink w/o Kafka connector (and few
>> others)
>> > > isn't
>> > > > > > >> > >> viable. Testability of Flink was already brought up,
>> can we
>> > > > > really
>> > > > > > >> > >> certify a Flink core release without Kafka connector?
>> Maybe
>> > > > those
>> > > > > > >> > >> connectors that are used in Flink e2e tests to validate
>> > > > > functionality
>> > > > > > >> > >> of core Flink should not be broken out?
>> > > > > > >> > >>
>> > > > > > >> > >> Finally, I think that the connectors that move into
>> > separate
>> > > > > repos
>> > > > > > >> > >> should remain part of the Apache Flink project. Larger
>> > > > > organizations
>> > > > > > >> > >> tend to approve the use of and contribution to open
>> source
>> > at
>> > > > the
>> > > > > > >> > >> project level. Sometimes it is everything ASF. More
>> often
>> > it
>> > > is
>> > > > > > >> > >> "Apache Foo". It would be fatal to end up with a
>> patchwork
>> > of
>> > > > > > >> projects
>> > > > > > >> > >> with potentially different licenses and governance to
>> > arrive
>> > > > at a
>> > > > > > >> > >> working Flink setup. This may mean we prioritize
>> usability
>> > > over
>> > > > > > >> > >> developer convenience, if that's in the best interest of
>> > > Flink
>> > > > > as a
>> > > > > > >> > >> whole.
>> > > > > > >> > >>
>> > > > > > >> > >> Thanks,
>> > > > > > >> > >> Thomas
>> > > > > > >> > >>
>> > > > > > >> > >>
>> > > > > > >> > >>
>> > > > > > >> > >> On Mon, Oct 18, 2021 at 6:59 AM Chesnay Schepler <
>> > > > > ches...@apache.org
>> > > > > > >> >
>> > > > > > >> > >> wrote:
>> > > > > > >> > >>> Generally, the issues are reproducibility and control.
>> > > > > > >> > >>>
>> > > > > > >> > >>> Stuffs completely broken on the Flink side for a week?
>> > Well
>> > > > > then so
>> > > > > > >> are
>> > > > > > >> > >>> the connector repos.
>> > > > > > >> > >>> (As-is) You can't go back to a previous version of the
>> > > > snapshot.
>> > > > > > >> Which
>> > > > > > >> > >>> also means that checking out older commits can be
>> > > problematic
>> > > > > > >> because
>> > > > > > >> > >>> you'd still work against the latest snapshots, and they
>> > not
>> > > be
>> > > > > > >> > >>> compatible with each other.
>> > > > > > >> > >>>
>> > > > > > >> > >>>
>> > > > > > >> > >>> On 18/10/2021 15:22, Arvid Heise wrote:
>> > > > > > >> > >>>> I was actually betting on snapshots versions. What are
>> > the
>> > > > > limits?
>> > > > > > >> > >>>> Obviously, we can only do a release of a 1.15
>> connector
>> > > after
>> > > > > 1.15
>> > > > > > >> is
>> > > > > > >> > >>>> release.
>> > > > > > >> > >>>
>> > > > > > >> >
>> > > > > > >> >
>> > > > > > >>
>> > > > > > >
>> > > > > > >
>> > > > > > > --
>> > > > > > >
>> > > > > > > Konstantin Knauf
>> > > > > > >
>> > > > > > > https://twitter.com/snntrable
>> > > > > > >
>> > > > > > > https://github.com/knaufk
>> > > > > > >
>> > > > >
>> > > >
>> > >
>> >
>>
>

Re: [DISCUSS] Creating an external connector repository

Reply via email to