Hi Ran Tao,

What is the current status? @Dian, there were many options. Which one do you
prefer as the most feasible?

Best regards,
Jing

On Fri, Jul 7, 2023 at 2:37 PM Mason Chen <mas.chen6...@gmail.com> wrote:

> Hi all,
>
> I also agree with what's been said above.
>
> +1, I think the Table API delegation is a good suggestion--it essentially
> allows a connector to get Python support for free. We've seen that
> Table/SQL and Python APIs complement each other well and are ideal for data
> scientists. With respect to unaligned functionalities, I think that also
> holds true between other APIs, e.g. DataStream and Table/SQL, since there is
> functionality that is not natural to represent as configuration/SQL.
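>
> As a rough illustration, here is a minimal sketch of what the delegation
> could look like in PyFlink, assuming the Kafka SQL connector jar is on the
> classpath (the topic, servers, and schema below are placeholders):
>
> from pyflink.datastream import StreamExecutionEnvironment
> from pyflink.table import StreamTableEnvironment
>
> env = StreamExecutionEnvironment.get_execution_environment()
> t_env = StreamTableEnvironment.create(env)
>
> # Declare the source once through the Table/SQL connector options...
> t_env.execute_sql("""
>     CREATE TABLE orders (
>         order_id STRING,
>         amount DOUBLE
>     ) WITH (
>         'connector' = 'kafka',
>         'topic' = 'orders',
>         'properties.bootstrap.servers' = 'localhost:9092',
>         'format' = 'json',
>         'scan.startup.mode' = 'earliest-offset'
>     )
> """)
>
> # ...and hand it back to the DataStream API. A Python "connector" could be
> # a thin wrapper around exactly this pattern.
> ds = t_env.to_data_stream(t_env.from_path("orders"))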
>
> Best,
> Mason
>
> On Wed, Jul 5, 2023 at 10:14 PM Dian Fu <dian0511...@gmail.com> wrote:
>
> > Hi Chesnay,
> >
> > >> The wrapping of connectors is a bit of a maintenance nightmare and
> > >> doesn't really work with external/custom connectors.
> >
> > I cannot agree with you more.
> >
> > >> Have there ever been any thoughts about changing flink-python's
> > >> connector setup to use the Table API connectors underneath?
> >
> > This may be a good idea, though I'm still not sure if it is feasible for
> > all connectors. One concern is that DataStream API connector
> > functionalities may be unaligned between the Java and Python connectors.
> > Besides, there are still a few connectors which only have DataStream API
> > implementations, e.g. Google PubSub, RabbitMQ, Cassandra, Pulsar, Hybrid
> > Source, etc. Also, PyFlink already supports Table API connectors, so if
> > we take this route, maybe we could just tell users to use the Table API
> > connectors directly.
> >
> > Another option I had in mind earlier is to provide an API that allows
> > configuring connector behavior via key/value pairs in both the Java and
> > Python DataStream API connectors.
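> >
> > Purely as a hypothetical sketch of that option (the `Connectors` builder
> > below does not exist in PyFlink today; it is invented for illustration):
> >
> > from pyflink.common.watermark_strategy import WatermarkStrategy
> > from pyflink.datastream import StreamExecutionEnvironment
> >
> > env = StreamExecutionEnvironment.get_execution_environment()
> >
> > # Hypothetical key/value-based connector API; the Java DataStream API
> > # would accept the same string options, keeping both sides aligned.
> > source = (Connectors.source("kafka")
> >           .option("topic", "orders")
> >           .option("properties.bootstrap.servers", "localhost:9092")
> >           .option("format", "json")
> >           .build())
> > ds = env.from_source(source, WatermarkStrategy.no_watermarks(),
> >                      "kafka-source")
> >
> > Because the options are plain strings, new connector settings would not
> > require new wrapper code in PyFlink.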
> >
> > Regards,
> > Dian
> >
> > On Wed, Jul 5, 2023 at 6:34 PM Chesnay Schepler <ches...@apache.org> wrote:
> > >
> > > Have there ever been any thoughts about changing flink-python's
> > > connector setup to use the Table API connectors underneath?
> > >
> > > The wrapping of connectors is a bit of a maintenance nightmare and
> > > doesn't really work with external/custom connectors.
> > >
> > > On 04/07/2023 13:35, Dian Fu wrote:
> > > > Thanks Ran Tao for starting this discussion and Martijn for sharing
> > > > your thoughts.
> > > >
> > > >>   While flink-python now fails the CI, it shouldn't actually depend
> > > >> on the externalized connectors. I'm not sure what PyFlink does with
> > > >> it, but if it belongs to the connector code,
> > > >
> > > > For each DataStream connector, there is a corresponding Python
> > > > wrapper and also some test cases in PyFlink. In theory, we should move
> > > > that wrapper into each connector repository. We did not do that when
> > > > externalizing the connectors since it would introduce some release
> > > > burden: it means that we would have to publish each connector to PyPI
> > > > separately.
> > > >
> > > > To resolve this problem, I guess we can move the connector support in
> > > > PyFlink into the external connector repository.
> > > >
> > > > Regards,
> > > > Dian
> > > >
> > > >
> > > > On Mon, Jul 3, 2023 at 11:08 PM Ran Tao <chucheng...@gmail.com> wrote:
> > > >> @Martijn, thanks for the clear explanation.
> > > >>
> > > >> If we follow the line you specified (connectors shouldn't rely on
> > > >> dependencies that may or may not be available in Flink itself), it
> > > >> seems that we should explicitly add any dependency we need (such as
> > > >> commons-io or commons-collections) to the connector pom, and bundle
> > > >> it in the sql-connector uber jar.
> > > >>
> > > >> Then there is only one thing left: we need to make the flink-python
> > > >> tests not depend on the released flink connectors. Maybe we should
> > > >> check that out and decouple it as you suggested.
> > > >>
> > > >> Best Regards,
> > > >> Ran Tao
> > > >> https://github.com/chucheng92
> > > >>
> > > >>
> > > >> On Mon, Jul 3, 2023 at 22:06, Martijn Visser <martijnvis...@apache.org> wrote:
> > > >>
> > > >>> Hi Ran Tao,
> > > >>>
> > > >>> Thanks for opening this topic. I think there are a couple of things
> > > >>> at hand:
> > > >>> 1. Connectors shouldn't rely on dependencies that may or may not be
> > > >>> available in Flink itself, like we've seen with flink-shaded. That
> > > >>> avoids a tight coupling between Flink and connectors, which is
> > > >>> exactly what we try to avoid.
> > > >>> 2. When following that line, that would also be applicable for
> > > >>> things like commons-collections and commons-io. If a connector wants
> > > >>> to use them, it should make sure that it bundles those artifacts
> > > >>> itself.
> > > >>> 3. While flink-python now fails the CI, it shouldn't actually
> > > >>> depend on the externalized connectors. I'm not sure what PyFlink
> > > >>> does with it, but if it belongs to the connector code, that code
> > > >>> should also be moved to the individual connector repo. If it's just
> > > >>> a generic test, we could consider creating a generic test against
> > > >>> released connector versions to determine compatibility.
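> > > >>>
> > > >>> As a hedged sketch of such a generic test (the jar path, version,
> > > >>> and topic below are placeholders), it could pin a released connector
> > > >>> artifact instead of inheriting one from the Flink build:
> > > >>>
> > > >>> from pyflink.common.serialization import SimpleStringSchema
> > > >>> from pyflink.datastream import StreamExecutionEnvironment
> > > >>> from pyflink.datastream.connectors.kafka import KafkaSource
> > > >>>
> > > >>> def test_released_kafka_connector_compat():
> > > >>>     env = StreamExecutionEnvironment.get_execution_environment()
> > > >>>     # Pin a released artifact rather than a jar from the Flink build.
> > > >>>     env.add_jars("file:///path/to/flink-sql-connector-kafka-1.17.1.jar")
> > > >>>     source = KafkaSource.builder() \
> > > >>>         .set_bootstrap_servers("localhost:9092") \
> > > >>>         .set_topics("test-topic") \
> > > >>>         .set_value_only_deserializer(SimpleStringSchema()) \
> > > >>>         .build()
> > > >>>     # Building the source exercises the Java classes behind the
> > > >>>     # Python wrapper, so an incompatible release fails fast here.
> > > >>>     assert source is not None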
> > > >>>
> > > >>> I'm curious about the opinions of others as well.
> > > >>>
> > > >>> Best regards,
> > > >>>
> > > >>> Martijn
> > > >>>
> > > >>> On Mon, Jul 3, 2023 at 3:37 PM Ran Tao <chucheng...@gmail.com> wrote:
> > > >>>
> > > >>>> I have an issue that requires upgrading commons-collections[1]
> > > >>>> (this is just an example), but the PR CI fails because flink-python
> > > >>>> test cases depend on flink-sql-connector-kafka, and
> > > >>>> flink-sql-connector-kafka is a small jar that does not include this
> > > >>>> dependency, so the Flink CI throws an exception[2]. My current
> > > >>>> solution is [3], but even once that PR is done, the upgrade in
> > > >>>> Flink still requires a kafka-connector release.
> > > >>>>
> > > >>>> This issue leads to a deeper problem. Although the connectors have
> > > >>>> been externalized, many flink-python UTs still depend on these
> > > >>>> connectors, and a basic agreement for externalized connectors is
> > > >>>> that other dependencies cannot be introduced explicitly, which
> > > >>>> means the externalized connectors use dependencies inherited from
> > > >>>> Flink. As a result, when the Flink main branch upgrades some
> > > >>>> dependencies, the flink-python test cases easily fail, because
> > > >>>> Flink no longer has the class and the connector does not contain
> > > >>>> it. It's a circular problem.
> > > >>>>
> > > >>>> The alternative is for each connector to self-consistently include
> > > >>>> all of its dependencies, which is hard to control (only a few
> > > >>>> connectors include all jars in the shade phase).
> > > >>>>
> > > >>>> In short, the flink-python module's current dependencies on the
> > > >>>> connectors leave the externalization and decoupling process
> > > >>>> incomplete, which leads to circular dependencies whenever Flink
> > > >>>> upgrades or changes some dependencies.
> > > >>>>
> > > >>>> I don't know if I have made this clear. I hope to get everyone's
> > > >>>> opinions on what better solutions we should adopt for similar
> > > >>>> problems in the future.
> > > >>>>
> > > >>>> [1] https://issues.apache.org/jira/browse/FLINK-30274
> > > >>>> [2]
> > > >>>> https://user-images.githubusercontent.com/11287509/250120404-d12b60f4-7ff3-457e-a2c4-8cd415bb5ca2.png
> > > >>>> https://user-images.githubusercontent.com/11287509/250120522-6b096a4f-83f0-4287-b7ad-d46b9371de4c.png
> > > >>>> [3] https://github.com/apache/flink-connector-kafka/pull/38
> > > >>>>
> > > >>>> Best Regards,
> > > >>>> Ran Tao
> > > >>>> https://github.com/chucheng92
> > > >>>>
> > >
> >
>
