Hi folks,

a few questions came up, and I'd like to address the question of timing.
> Could you clarify what release cadence you're thinking of? There's quite
> a big range that fits "more frequent than Flink" (per-commit, daily,
> weekly, bi-weekly, monthly, even bi-monthly).

The short answer is: as often as needed.
- If there is a CVE in a dependency and we need to bump it: release
immediately.
- If there is a new feature merged: release soonish. We may collect a few
successive features before a release.
- If there is a bugfix: release immediately or soonish, depending on the
severity and on whether workarounds are available.

We should not limit ourselves; the whole idea of independent releases is
exactly that you release as needed. There is no release planning or
anything else needed, you just go with a release as if it were an external
artifact.

(1) is the connector API already stable?
> From another discussion thread [1], the connector API is far from
> stable. Currently, it's hard to build connectors against multiple Flink
> versions. There are breaking API changes both in 1.12 -> 1.13 and
> 1.13 -> 1.14, and maybe also in future versions, because Table-related
> APIs are still @PublicEvolving and the new Sink API is still
> @Experimental.

The question is: what is stable in an evolving system? We recently
discovered that the old SourceFunction needed to be refined such that
cancellation works correctly [1]. That interface has been in Flink for 7
years and is heavily used outside of Flink as well, and we still had to
change the contract in a way that I'd expect any implementer to recheck
their implementation. It might not be necessary to change anything, and
you can probably use the same code for all Flink versions, but still, the
interface was not stable in the strictest sense.

If we focus just on API changes to the unified interfaces, then we expect
one more change to the Sink API to support compaction. For the Table API,
there will most likely also be some changes in 1.15. So we could wait for
1.15.
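As a side note on that cancellation contract: the pattern at stake is roughly "cancel() is invoked from a different thread than run(), and run() must observe it promptly". Below is a minimal, Flink-free sketch of that pattern; the class and method names are illustrative stand-ins, not the actual SourceFunction interface, and the synchronized block merely mimics the role of the checkpoint lock.

```java
// Hedged sketch of a run/cancel contract similar to SourceFunction's.
// All names here are hypothetical; this is NOT the real Flink interface.
import java.util.ArrayList;
import java.util.List;

public class CancellableSource {
    // volatile: cancel() runs on another thread, and the emit loop must
    // see the updated value on its next iteration.
    private volatile boolean running = true;
    private final List<Long> emitted = new ArrayList<>();

    /** Emits up to `max` records unless cancelled from another thread. */
    public void run(long max) throws InterruptedException {
        long next = 0;
        while (running && next < max) {
            synchronized (emitted) { // stand-in for the checkpoint lock
                emitted.add(next++);
            }
            Thread.sleep(1); // simulate per-record work
        }
    }

    /** Called from a different thread; must make run() return promptly. */
    public void cancel() {
        running = false;
    }

    public int emittedCount() {
        synchronized (emitted) {
            return emitted.size();
        }
    }

    public static void main(String[] args) throws Exception {
        CancellableSource source = new CancellableSource();
        Thread worker = new Thread(() -> {
            try {
                source.run(Long.MAX_VALUE); // effectively unbounded
            } catch (InterruptedException ignored) {
            }
        });
        worker.start();
        Thread.sleep(50);   // let it emit a few records
        source.cancel();    // request cancellation from the main thread
        worker.join(1000);  // run() should now return promptly
        System.out.println("alive=" + worker.isAlive());
        System.out.println("emitted>0=" + (source.emittedCount() > 0));
    }
}
```

The essential detail is the volatile flag: without it, the emitting loop may never observe the cancellation issued by the other thread, which is exactly the kind of subtlety that makes "stable" a moving target even for a 7-year-old interface.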
But I'm questioning whether that's really necessary, because we will add
more functionality beyond 1.15 without breaking the API. For example, we
may add more unified connector metrics. If you want to use them in your
connector, you have to support multiple Flink versions anyhow.

So rather than focusing the discussion on "when is stuff stable", I'd
rather focus on "how can we support building connectors against multiple
Flink versions" and make it as painless as possible. Chesnay pointed out
that we could use different branches for different Flink versions, which
sounds like a good suggestion. With a mono-repo, we can't use branches
differently anyway (there is no way to have release branches per
connector without chaos).

In these branches, we could provide shims that simulate future features
in older Flink versions, such that, code-wise, the source code of a
specific connector does not diverge (much). For example, to register
unified connector metrics, we could simulate the current approach also in
some utility package of the mono-repo.

> I see the stable core Flink API as a prerequisite for modularity. And
> for connectors it is not just the source and sink API (source being
> stable as of 1.14), but everything that is required to build and
> maintain a connector downstream, such as the test utilities and
> infrastructure.

That is a very fair point. I'm actually surprised to see that
MiniClusterWithClientResource is not public. I see it being used in all
connectors, especially outside of Flink. I fear that, as long as we do
not have connectors outside, we will not properly annotate and maintain
these utilities - a classic chicken-and-egg problem. I will outline an
idea at the end.

> the connectors need to be adopted and require at least one release per
> Flink minor release.
> However, this will make the releases of connectors slower, e.g. maintain
> features for multiple branches and release multiple branches.
> I think the main purpose of having an external connector repository is
> in order to have "faster releases of connectors"?
>
> Imagine a project with a complex set of dependencies. Let's say Flink
> version A plus Flink-reliant dependencies released by other projects
> (Flink-external connectors, Beam, Iceberg, Hudi, ..). We don't want a
> situation where we bump the core Flink version to B and things fall
> apart (interface changes, utilities that were useful but not public,
> transitive dependencies etc.).

Yes, that's why I wanted to automate the processes more, which is not
that easy under the ASF. Maybe we automate the source provisioning across
supported versions and have one vote thread for all versions of a
connector?

> From the perspective of CDC connector maintainers, the biggest advantage
> of maintaining it outside of the Flink project is that:
> 1) we can have a more flexible and faster release cycle
> 2) we can be more liberal with committership for connector maintainers
> which can also attract more committers to help the release.
>
> Personally, I think maintaining one connector repository under the ASF
> may not have the above benefits.

Yes, I also feel that the ASF is too restrictive for our needs. But it
feels like there are too many that see it differently.

(2) Flink testability without connectors.
> This is a very good question. How can we guarantee the new Source and
> Sink API are stable with only test implementations?

We can't and shouldn't. Since the connector repo is managed by Flink, a
Flink release manager needs to check whether the Flink connectors
actually work prior to creating an RC. That's similar to how flink-shaded
and Flink core are related.

So here is one idea that I had to get things rolling. We would address
the external repo iteratively, without compromising what we already have:

Phase 1: add new contributions to the external repo. We use that time to
set up the infrastructure accordingly and optimize the release processes.
We will identify test utilities that are not yet public/stable and fix
that.

Phase 2: add ports of the existing connectors to the new unified
interfaces. That requires a previous Flink release that makes the
utilities stable. Keep the old interfaces in flink-core.

Phase 3: remove the old interfaces in flink-core for some connectors (tbd
at a later point).

Phase 4: optionally move all remaining connectors (tbd at a later point).

I'd envision ~3 months between starting the different phases. WDYT?

[1] https://issues.apache.org/jira/browse/FLINK-23527

On Thu, Oct 21, 2021 at 7:12 AM Kyle Bendickson <k...@tabular.io> wrote:

> Hi all,
>
> My name is Kyle and I’m an open source developer primarily focused on
> Apache Iceberg.
>
> I’m happy to help clarify or elaborate on any aspect of our experience
> working on a relatively decoupled connector that is downstream and
> pretty popular.
>
> I’d also love to be able to contribute or assist in any way I can.
>
> I don’t mean to thread jack, but are there any meetings or community
> sync-ups, specifically around the connector APIs, that I might join /
> be invited to?
>
> I did want to add that even though I’ve experienced some of the pain
> points of integrating with an evolving system / API (catalog support is
> generally speaking pretty new everywhere in this space), I also agree
> personally that you shouldn’t slow down development velocity too much
> for the sake of external connectors. Getting to a performant and stable
> place should be the primary goal, and slowing that down to support
> stragglers will (in my personal opinion) always be a losing game. Some
> folks will simply stay behind on versions regardless until they have to
> upgrade.
> I am working on ensuring that the Iceberg community stays within 1-2
> versions of Flink, so that we can help provide more feedback or
> contribute things that might improve our ability to support multiple
> Flink runtimes / versions with one project / codebase and minimal to no
> reflection (our desired goal).
>
> If there’s anything I can do or any way I can be of assistance, please
> don’t hesitate to reach out. Or find me on ASF slack 😀
>
> I greatly appreciate your general concern for the needs of downstream
> connector integrators!
>
> Cheers
> Kyle Bendickson (GitHub: kbendick)
> Open Source Developer
> kyle [at] tabular [dot] io
>
> On Wed, Oct 20, 2021 at 11:35 AM Thomas Weise <t...@apache.org> wrote:
>
> > Hi,
> >
> > I see the stable core Flink API as a prerequisite for modularity. And
> > for connectors it is not just the source and sink API (source being
> > stable as of 1.14), but everything that is required to build and
> > maintain a connector downstream, such as the test utilities and
> > infrastructure.
> >
> > Without the stable surface of core Flink, changes will leak into
> > downstream dependencies and force lock-step updates. Refactoring
> > across N repos is more painful than in a single repo. Those with
> > experience developing downstream of Flink will know the pain, and that
> > isn't limited to connectors. I don't remember a Flink "minor version"
> > update that was just a dependency version change and did not force
> > other downstream changes.
> >
> > Imagine a project with a complex set of dependencies. Let's say Flink
> > version A plus Flink-reliant dependencies released by other projects
> > (Flink-external connectors, Beam, Iceberg, Hudi, ..). We don't want a
> > situation where we bump the core Flink version to B and things fall
> > apart (interface changes, utilities that were useful but not public,
> > transitive dependencies etc.).
> >
> > The discussion here also highlights the benefits of keeping certain
> > connectors outside Flink. Whether that is due to difference in
> > developer community, maturity of the connectors, their
> > specialized/limited usage etc. I would like to see that as a sign of a
> > growing ecosystem and most of the ideas that Arvid has put forward
> > would benefit further growth of the connector ecosystem.
> >
> > As for keeping connectors within Apache Flink: I prefer that as the
> > path forward for "essential" connectors like FileSource, KafkaSource,
> > ... And we can still achieve a more flexible and faster release cycle.
> >
> > Thanks,
> > Thomas
> >
> > On Wed, Oct 20, 2021 at 3:32 AM Jark Wu <imj...@gmail.com> wrote:
> > >
> > > Hi Konstantin,
> > >
> > > > the connectors need to be adopted and require at least one release
> > > > per Flink minor release.
> > > However, this will make the releases of connectors slower, e.g.
> > > maintain features for multiple branches and release multiple
> > > branches.
> > > I think the main purpose of having an external connector repository
> > > is in order to have "faster releases of connectors"?
> > >
> > > From the perspective of CDC connector maintainers, the biggest
> > > advantage of maintaining it outside of the Flink project is that:
> > > 1) we can have a more flexible and faster release cycle
> > > 2) we can be more liberal with committership for connector
> > > maintainers which can also attract more committers to help the
> > > release.
> > >
> > > Personally, I think maintaining one connector repository under the
> > > ASF may not have the above benefits.
> > >
> > > Best,
> > > Jark
> > >
> > > On Wed, 20 Oct 2021 at 15:14, Konstantin Knauf <kna...@apache.org>
> > > wrote:
> > >
> > > > Hi everyone,
> > > >
> > > > regarding the stability of the APIs.
> > > > I think everyone agrees that connector APIs which are stable
> > > > across minor versions (1.13 -> 1.14) are the mid-term goal. But:
> > > >
> > > > a) These APIs are still quite young, and we shouldn't make them
> > > > @Public prematurely either.
> > > >
> > > > b) Isn't this *mostly* orthogonal to where the connector code
> > > > lives? Yes, as long as there are breaking changes, the connectors
> > > > need to be adopted and require at least one release per Flink
> > > > minor release. Documentation-wise this can be addressed via a
> > > > compatibility matrix for each connector, as Arvid suggested. IMO
> > > > we shouldn't block this effort on the stability of the APIs.
> > > >
> > > > Cheers,
> > > >
> > > > Konstantin
> > > >
> > > > On Wed, Oct 20, 2021 at 8:56 AM Jark Wu <imj...@gmail.com> wrote:
> > > >
> > > >> Hi,
> > > >>
> > > >> I think Thomas raised very good questions and would like to know
> > > >> your opinions if we want to move connectors out of Flink in this
> > > >> version.
> > > >>
> > > >> (1) is the connector API already stable?
> > > >> > Separate releases would only make sense if the core Flink
> > > >> > surface is fairly stable though. As evident from Iceberg (and
> > > >> > also Beam), that's not the case currently. We should probably
> > > >> > focus on addressing the stability first, before splitting code.
> > > >> > A success criterion could be that we are able to build Iceberg
> > > >> > and Beam against multiple Flink versions w/o the need to change
> > > >> > code. The goal would be that no connector breaks when we make
> > > >> > changes to Flink core. Until that's the case, code separation
> > > >> > creates a setup where 1+1 or N+1 repositories need to move in
> > > >> > lock step.
> > > >>
> > > >> From another discussion thread [1], the connector API is far from
> > > >> stable. Currently, it's hard to build connectors against multiple
> > > >> Flink versions.
> > > >> There are breaking API changes both in 1.12 -> 1.13 and 1.13 ->
> > > >> 1.14, and maybe also in future versions, because Table-related
> > > >> APIs are still @PublicEvolving and the new Sink API is still
> > > >> @Experimental.
> > > >>
> > > >> (2) Flink testability without connectors.
> > > >> > Flink w/o Kafka connector (and few others) isn't
> > > >> > viable. Testability of Flink was already brought up, can we
> > > >> > really certify a Flink core release without Kafka connector?
> > > >> > Maybe those connectors that are used in Flink e2e tests to
> > > >> > validate functionality of core Flink should not be broken out?
> > > >>
> > > >> This is a very good question. How can we guarantee the new Source
> > > >> and Sink APIs are stable with only test implementations?
> > > >>
> > > >> Best,
> > > >> Jark
> > > >>
> > > >> On Tue, 19 Oct 2021 at 23:56, Chesnay Schepler
> > > >> <ches...@apache.org> wrote:
> > > >>
> > > >> > Could you clarify what release cadence you're thinking of?
> > > >> > There's quite a big range that fits "more frequent than Flink"
> > > >> > (per-commit, daily, weekly, bi-weekly, monthly, even
> > > >> > bi-monthly).
> > > >> >
> > > >> > On 19/10/2021 14:15, Martijn Visser wrote:
> > > >> > > Hi all,
> > > >> > >
> > > >> > > I think it would be a huge benefit if we can achieve more
> > > >> > > frequent releases of connectors, which are not bound to the
> > > >> > > release cycle of Flink itself. I agree that in order to get
> > > >> > > there, we need to have stable interfaces which are
> > > >> > > trustworthy and reliable, so they can be safely used by those
> > > >> > > connectors. I do think that work still needs to be done on
> > > >> > > those interfaces, but I am confident that we can get there
> > > >> > > from a Flink perspective.
> > > >> > >
> > > >> > > I am worried that we would not be able to achieve those
> > > >> > > frequent releases of connectors if we are putting these
> > > >> > > connectors under the Apache umbrella, because that means that
> > > >> > > for each connector release we have to follow the Apache
> > > >> > > release creation process. This requires a lot of manual steps
> > > >> > > and prohibits automation and I think it would be hard to
> > > >> > > scale out frequent releases of connectors. I'm curious how
> > > >> > > others think this challenge could be solved.
> > > >> > >
> > > >> > > Best regards,
> > > >> > >
> > > >> > > Martijn
> > > >> > >
> > > >> > > On Mon, 18 Oct 2021 at 22:22, Thomas Weise <t...@apache.org>
> > > >> > > wrote:
> > > >> > >
> > > >> > >> Thanks for initiating this discussion.
> > > >> > >>
> > > >> > >> There are definitely a few things that are not optimal with
> > > >> > >> our current management of connectors. I would not
> > > >> > >> necessarily characterize it as a "mess" though. As the
> > > >> > >> points raised so far show, it isn't easy to find a solution
> > > >> > >> that balances competing requirements and leads to a net
> > > >> > >> improvement.
> > > >> > >>
> > > >> > >> It would be great if we can find a setup that allows for
> > > >> > >> connectors to be released independently of core Flink and
> > > >> > >> that each connector can be released separately. Flink
> > > >> > >> already has separate releases (flink-shaded), so that by
> > > >> > >> itself isn't a new thing. Per-connector releases would need
> > > >> > >> to allow for more frequent releases (without the baggage
> > > >> > >> that a full Flink release comes with).
> > > >> > >>
> > > >> > >> Separate releases would only make sense if the core Flink
> > > >> > >> surface is fairly stable though.
> > > >> > >> As evident from Iceberg (and also Beam), that's not the
> > > >> > >> case currently. We should probably focus on addressing the
> > > >> > >> stability first, before splitting code. A success criterion
> > > >> > >> could be that we are able to build Iceberg and Beam against
> > > >> > >> multiple Flink versions w/o the need to change code. The
> > > >> > >> goal would be that no connector breaks when we make changes
> > > >> > >> to Flink core. Until that's the case, code separation
> > > >> > >> creates a setup where 1+1 or N+1 repositories need to move
> > > >> > >> in lock step.
> > > >> > >>
> > > >> > >> Regarding some connectors being more important for Flink
> > > >> > >> than others: That's a fact. Flink w/o Kafka connector (and
> > > >> > >> a few others) isn't viable. Testability of Flink was already
> > > >> > >> brought up; can we really certify a Flink core release
> > > >> > >> without the Kafka connector? Maybe those connectors that are
> > > >> > >> used in Flink e2e tests to validate functionality of core
> > > >> > >> Flink should not be broken out?
> > > >> > >>
> > > >> > >> Finally, I think that the connectors that move into separate
> > > >> > >> repos should remain part of the Apache Flink project. Larger
> > > >> > >> organizations tend to approve the use of and contribution to
> > > >> > >> open source at the project level. Sometimes it is everything
> > > >> > >> ASF. More often it is "Apache Foo". It would be fatal to end
> > > >> > >> up with a patchwork of projects with potentially different
> > > >> > >> licenses and governance to arrive at a working Flink setup.
> > > >> > >> This may mean we prioritize usability over developer
> > > >> > >> convenience, if that's in the best interest of Flink as a
> > > >> > >> whole.
> > > >> > >>
> > > >> > >> Thanks,
> > > >> > >> Thomas
> > > >> > >>
> > > >> > >> On Mon, Oct 18, 2021 at 6:59 AM Chesnay Schepler
> > > >> > >> <ches...@apache.org> wrote:
> > > >> > >>> Generally, the issues are reproducibility and control.
> > > >> > >>>
> > > >> > >>> Stuff's completely broken on the Flink side for a week?
> > > >> > >>> Well, then so are the connector repos.
> > > >> > >>> (As-is) You can't go back to a previous version of the
> > > >> > >>> snapshot. Which also means that checking out older commits
> > > >> > >>> can be problematic, because you'd still work against the
> > > >> > >>> latest snapshots, and they may not be compatible with each
> > > >> > >>> other.
> > > >> > >>>
> > > >> > >>> On 18/10/2021 15:22, Arvid Heise wrote:
> > > >> > >>>> I was actually betting on snapshot versions. What are the
> > > >> > >>>> limits? Obviously, we can only do a release of a 1.15
> > > >> > >>>> connector after 1.15 is released.
> > > >
> > > > --
> > > >
> > > > Konstantin Knauf
> > > >
> > > > https://twitter.com/snntrable
> > > >
> > > > https://github.com/knaufk