Hi Danny,

Thanks for bringing this up. I haven't driven a connector release myself, but I echo the pain of the delays in releases when adding support for a new Flink version. I am not really in favor of the mono-repo approach, for the following reasons:
1- We would lose the flexibility we currently have for connectors (we even had a major 4.x release for the AWS connectors, IIRC).
2- We would be undoing the gains from CI decoupling; the frequency of contributions to connectors like Kafka and AWS is unmatched by others like GCP and RabbitMQ, and with new connectors being added and major feature work ongoing, I see unnecessary CI cost and delays in a mono-repo.
I believe the only benefit of this approach (over the other one proposed) would be forcing the adoption of common connector changes across all connectors at once, instead of relying on divided efforts to port each change; however, for the reasons mentioned above, I still wouldn't vote for this approach.

I am in favor of dropping the version coupling, though. I understand it might be confusing for users, but as Sergey mentioned, many times the changes in a new Flink version don't introduce compatibility issues for connectors, so I believe it might be an easier task than it sounds initially. A question would be: what do you think is the best approach when we do introduce backward-compatible changes to the connector APIs, as in this PR [1]? In that case, existing connectors would still work with the newly released Flink version, but would accumulate technical debt, and removing it would be an ad-hoc task for maintainers. I believe that is an accepted tradeoff, but I would love to hear feedback.

1- https://github.com/apache/flink/pull/24180/files#diff-2ffade463560e5941912b91b12a07c888313a4cc7e39ca8700398ed6975b8e90

Best Regards
Ahmed Hamdy

On Tue, 11 Jun 2024 at 08:50, Sergey Nuyanzin <snuyan...@gmail.com> wrote:
> Thanks for starting this discussion Danny
>
> I will put my 5 cents here
>
> From one side, yes, supporting a new Flink release takes time, as was
> mentioned above.
> However, from another side, most of the connectors (main/master branches)
> supported Flink 1.19 even before it was released, and the same for 1.20,
> since they test against master and the supported version branches.
> There are already nightly/weekly jobs (depending on the connector)
> running against the latest Flink SNAPSHOTs, and this has already helped to
> catch some blocker issues like [1], [2].
> In fact there are more; I would need to spend time retrieving all of them.
>
> I would also not vote for a connector mono-repo release, since we recently
> just split it.
>
> The thing I would suggest:
> since we already have nightly/weekly jobs for connector testing against
> the Flink main repo's master branch,
> we could add a requirement, before the release of Flink itself, that these
> job results are also green.
>
> [1] https://issues.apache.org/jira/browse/FLINK-34941
> [2] https://issues.apache.org/jira/browse/FLINK-32978#comment-17804459
>
> On Tue, Jun 11, 2024 at 8:24 AM Xintong Song <tonysong...@gmail.com>
> wrote:
>
> > Thanks for bringing this up, Danny. This is indeed an important issue
> > that the community needs to improve on.
> >
> > Personally, I think a mono-repo might not be a bad idea, if we apply
> > different rules for the connector releases. To be specific:
> > - flink-connectors 1.19.x contains all connectors that are compatible
> > with Flink 1.19.x.
> > - allow not only bug-fixes, but also new features, in a third-digit
> > release (e.g., flink-connectors 1.19.1)
> >
> > This would allow us to immediately release flink-connectors 1.19.0 right
> > after flink 1.19.0 is out, excluding connectors that are no longer
> > compatible with Flink 1.19. Then we can have a couple of flink-connectors
> > 1.19.x releases, gradually adding the missing connectors back. In the
> > worst case, this would result in as many releases as having separate
> > connector repos. The benefit comes from 1) there are chances to combine
> > the releases of multiple connectors into one release of the mono-repo (if
> > they are ready around the same time), and 2) there is no need to maintain
> > a compatibility matrix and worry about it being out-of-sync with the code
> > base.
> >
> > However, one thing I don't like about this approach is that it requires
> > combining all the repos we just separated from the main repo into another
> > mono-repo. That back-and-forth is annoying. So I'm just sharing my
> > ideas, and would not strongly insist on this.
> >
> > And a big +1 for compatibility tools and CI checks.
> >
> > Best,
> >
> > Xintong
> >
> >
> >
> > On Tue, Jun 11, 2024 at 2:38 AM David Radley <david_rad...@uk.ibm.com>
> > wrote:
> >
> > > Hi Danny,
> > > I think your proposal is a good one. This is the approach that we took
> > > with the Egeria project: first taking the connectors out of the main
> > > repo, then having the connectors version themselves organically
> > > rather than staying tied to the core release.
> > >
> > > Blue-sky thinking - I wonder if we could:
> > > - have a wizard / utility so the user inputs which Flink level they
> > > want and which connectors; the utility knows the compatibility matrix
> > > and downloads the appropriate bundles.
> > > - have the docs interrogate the core and connector repos to check the
> > > poms for the Flink levels and the PR builds, to have "live" docs
> > > showing the supported Flink levels. PyTorch does something like this
> > > for its docs.
> > >
> > > Kind regards, David.
> > >
> > >
> > >
> > > From: Danny Cranmer <dannycran...@apache.org>
> > > Date: Monday, 10 June 2024 at 17:26
> > > To: dev <dev@flink.apache.org>
> > > Subject: [EXTERNAL] [DISCUSS] Connector Externalization Retrospective
> > > Hello Flink community,
> > >
> > > It has been over 2 years [1] since we started externalizing the Flink
> > > connectors to dedicated repositories from the main Flink code base. The
> > > past discussions can be found here [2]. The community decided to
> > > externalize the connectors primarily to 1/ improve the stability and
> > > speed of the CI, and 2/ decouple version and release lifecycles to
> > > allow the projects to evolve independently. The outcome is that each
> > > connector requires a dedicated release per Flink minor version, which
> > > is a burden on the community. Flink 1.19.0 was released on 2024-03-18
> > > [3]; the first supported connector followed roughly 2.5 months later on
> > > 2024-06-06 [4] (MongoDB).
> > > There are still 5 connectors that do not support Flink 1.19 [5].
> > >
> > > Two decisions contribute to the high lag between releases: 1/ creating
> > > one repository per connector instead of a single flink-connector
> > > mono-repo, and 2/ coupling the Flink version to the connector version
> > > [6]. A single connector repository would reduce the number of connector
> > > releases from N to 1, but would couple the connectors' CI and reduce
> > > release flexibility. Decoupling the connector versions from Flink would
> > > eliminate the need to release each connector for each new Flink minor
> > > version, but we would need a new compatibility mechanism.
> > >
> > > I propose that, from the next release of each connector, we drop the
> > > coupling on the Flink version. For example, instead of 3.4.0-1.20
> > > (<connector>-<flink>) we would release 3.4.0 (<connector>). We can
> > > model a compatibility matrix within the Flink docs to help users pick
> > > the correct versions. This would mean we would usually not need to
> > > release a new connector version per Flink version, assuming there are
> > > no breaking changes. Worst case, if breaking changes impact all
> > > connectors, we would still need to release all connectors. However, for
> > > Flink 1.17 and 1.18 there were only a handful of issues (breaking
> > > changes), mostly impacting tests. We could decide to align this with
> > > Flink 2.0; however, I see no compelling reason to do so. This was
> > > discussed previously [2] as a long-term goal once the connector APIs
> > > are stable, but I think the current compatibility rules support this
> > > change now.
> > >
> > > I would prefer not to create a connector mono-repo. Separate repos give
> > > each connector more flexibility to evolve independently, and removing
> > > unnecessary releases will significantly reduce the release effort.
> > >
> > > I would like to hear opinions and ideas from the community.
> > > In particular, are there any other issues you have observed that we
> > > should consider addressing?
> > >
> > > Thanks,
> > > Danny.
> > >
> > > [1] https://github.com/apache/flink-connector-elasticsearch/commit/3ca2e625e3149e8864a4ad478773ab4a82720241
> > > [2] https://lists.apache.org/thread/8k1xonqt7hn0xldbky1cxfx3fzh6sj7h
> > > [3] https://flink.apache.org/2024/03/18/announcing-the-release-of-apache-flink-1.19/
> > > [4] https://flink.apache.org/downloads/#apache-flink-connectors-1
> > > [5] https://issues.apache.org/jira/browse/FLINK-35131
> > > [6] https://cwiki.apache.org/confluence/display/FLINK/Externalized+Connector+development#ExternalizedConnectordevelopment-Examples
> > >
> > > Unless otherwise stated above:
> > >
> > > IBM United Kingdom Limited
> > > Registered in England and Wales with number 741598
> > > Registered office: PO Box 41, North Harbour, Portsmouth, Hants. PO6 3AU
> >
>
> --
> Best regards,
> Sergey
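[Editor's note] The compatibility-matrix mechanism proposed in the thread (a docs matrix mapping connector versions to supported Flink minors, plus David's wizard idea that picks bundles from it) could be prototyped along these lines. This is a minimal sketch; the matrix contents and function name below are purely illustrative, not the real support matrix or any existing Flink tooling:

```python
# Hypothetical compatibility-matrix lookup a docs wizard could use.
# The matrix data here is made up for illustration only.

# connector -> {connector_version: set of supported Flink minor versions}
MATRIX = {
    "flink-connector-kafka": {
        "3.1.0": {"1.17", "1.18"},
        "3.2.0": {"1.18", "1.19"},
    },
    "flink-connector-mongodb": {
        "1.2.0": {"1.18", "1.19"},
    },
}


def latest_compatible(connector, flink_minor):
    """Return the newest connector version supporting the given Flink minor,
    or None if the matrix lists no compatible release."""
    versions = MATRIX.get(connector, {})
    compatible = [
        v for v, flink_minors in versions.items() if flink_minor in flink_minors
    ]
    if not compatible:
        return None
    # Sort numerically on the dotted version components, newest wins.
    return max(compatible, key=lambda v: tuple(map(int, v.split("."))))


if __name__ == "__main__":
    print(latest_compatible("flink-connector-kafka", "1.18"))    # -> 3.2.0
    print(latest_compatible("flink-connector-mongodb", "1.17"))  # -> None
```

With decoupled versions, the same data could drive both the "live" docs table and a release-time CI check that flags connectors with no compatible entry for a newly released Flink minor.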