Hi Chesnay,

>> The wrapping of connectors is a bit of a maintenance nightmare and
doesn't really work with external/custom connectors.

I couldn't agree with you more.

>> Has there ever been thoughts about changing flink-pythons connector
setup to use the table api connectors underneath?

I'm still not sure if this is feasible for all connectors; however,
it may be a good idea. One concern is that the functionality of the
DataStream API connectors may not be aligned between the Java and
Python connectors. Besides, there are still a few connectors which
only exist as DataStream API connectors, e.g. Google PubSub,
RabbitMQ, Cassandra, Pulsar, Hybrid Source, etc. Moreover, PyFlink
already supports Table API connectors, so if we take this approach,
maybe we could simply tell users to use the Table API connectors
directly.
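For reference, the Table API path is already purely option-driven: a
connector is configured entirely through key/value pairs in DDL, which
PyFlink can execute as-is. A minimal sketch (the option keys below are
from the Kafka SQL connector; table/topic names are made up):

```sql
-- Hypothetical table, runnable from PyFlink via
-- TableEnvironment.execute_sql(...); no Python wrapper code needed.
CREATE TABLE orders (
  order_id STRING,
  price DOUBLE
) WITH (
  'connector' = 'kafka',
  'topic' = 'orders',
  'properties.bootstrap.servers' = 'localhost:9092',
  'scan.startup.mode' = 'earliest-offset',
  'format' = 'json'
);
```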

Another option I had in mind is to provide an API which allows
configuring the connector behavior via key/value pairs in both the
Java and Python DataStream API connectors.
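To make that idea concrete, here is a minimal sketch of what such a
key/value based API could look like on the Python side. All names here
are hypothetical (nothing like this exists in Flink today); the point
is that the wrapper would forward opaque options to the Java connector
instead of mirroring every builder method:

```python
# Hypothetical sketch only: illustrates a Python wrapper that collects
# connector options as plain key/value pairs rather than re-implementing
# each Java builder method.

class ConnectorDescriptor:
    """Collects connector options as plain key/value pairs."""

    def __init__(self, identifier):
        self._identifier = identifier
        self._options = {}

    def set_option(self, key, value):
        # All values are stringified, mirroring how SQL connector
        # options are passed around.
        self._options[key] = str(value)
        return self  # allow chaining, like the Java builders

    def to_options(self):
        # In a real implementation this dict would be handed to the
        # Java connector factory (e.g. via Py4J) instead of returned.
        return dict(self._options, connector=self._identifier)


source = (
    ConnectorDescriptor("kafka")
    .set_option("topic", "orders")
    .set_option("properties.bootstrap.servers", "localhost:9092")
)
print(source.to_options())
```

With such an API, adding a new connector option would not require any
change on the Python side, which is exactly the maintenance burden
being discussed.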

Regards,
Dian

On Wed, Jul 5, 2023 at 6:34 PM Chesnay Schepler <ches...@apache.org> wrote:
>
> Has there ever been thoughts about changing flink-pythons connector
> setup to use the table api connectors underneath?
>
> The wrapping of connectors is a bit of a maintenance nightmare and
> doesn't really work with external/custom connectors.
>
> On 04/07/2023 13:35, Dian Fu wrote:
> > Thanks Ran Tao for proposing this discussion and Martijn for sharing
> > the thought.
> >
> >> While flink-python now fails the CI, it shouldn't actually depend on
> >> the externalized connectors. I'm not sure what PyFlink does with it,
> >> but if it belongs to the connector code,
> >
> > For each DataStream connector, there is a corresponding Python wrapper
> > and also some test cases in PyFlink. In theory, we should move that
> > wrapper into each connector repository. We did not do that when
> > externalizing the connectors since it may introduce some release
> > burden: it would mean that we have to publish each connector to PyPI
> > separately.
> >
> > To resolve this problem, I guess we can move the connector support in
> > PyFlink into the external connector repository.
> >
> > Regards,
> > Dian
> >
> >
> > On Mon, Jul 3, 2023 at 11:08 PM Ran Tao <chucheng...@gmail.com> wrote:
> >> @Martijn
> >> thanks for the clear explanation.
> >>
> >> If we follow the line you specified (connectors shouldn't rely on
> >> dependencies that may or may not be available in Flink itself),
> >> it seems that we should add any dependency we need (such as
> >> commons-io, commons-collections) to the connector pom explicitly,
> >> and bundle it in the sql-connector uber jar.
> >>
> >> Then there is only one thing left: we need to make the flink-python
> >> tests not depend on the released flink-connector artifacts.
> >> Maybe we should check that out and decouple it as you suggested.
> >>
> >> Best Regards,
> >> Ran Tao
> >> https://github.com/chucheng92
> >>
> >>
> >> Martijn Visser <martijnvis...@apache.org> 于2023年7月3日周一 22:06写道:
> >>
> >>> Hi Ran Tao,
> >>>
> >>> Thanks for opening this topic. I think there's a couple of things at hand:
> >>> 1. Connectors shouldn't rely on dependencies that may or may not be
> >>> available in Flink itself, like we've seen with flink-shaded. That
> >>> avoids a tight coupling between Flink and connectors, which is
> >>> exactly what we try to avoid.
> >>> 2. When following that line, that would also be applicable for things like
> >>> commons-collections and commons-io. If a connector wants to use them, it
> >>> should make sure that it bundles those artifacts itself.
> >>> 3. While flink-python now fails the CI, it shouldn't actually depend
> >>> on the externalized connectors. I'm not sure what PyFlink does with
> >>> it, but if it belongs to the connector code, that code should also be
> >>> moved to the individual connector repo. If it's just a generic test,
> >>> we could consider creating a generic test against released connector
> >>> versions to determine compatibility.
> >>>
> >>> I'm curious about the opinions of others as well.
> >>>
> >>> Best regards,
> >>>
> >>> Martijn
> >>>
> >>> On Mon, Jul 3, 2023 at 3:37 PM Ran Tao <chucheng...@gmail.com> wrote:
> >>>
> >>>> I have an issue here that needs to upgrade commons-collections[1]
> >>>> (this is an example), but the PR CI fails because the flink-python
> >>>> test cases depend on flink-sql-connector-kafka, and the Kafka SQL
> >>>> connector is a thin jar that does not include this dependency, so
> >>>> the Flink CI throws an exception[2]. My current solution is [3].
> >>>> But even if this PR is done, the upgrade in Flink still requires a
> >>>> kafka-connector release.
> >>>>
> >>>> This issue points to a deeper problem. Although the connectors have
> >>>> been externalized, many UTs of flink-python depend on these
> >>>> connectors, and a basic agreement of the externalized connectors is
> >>>> that extra dependencies should not be introduced explicitly, which
> >>>> means the externalized connectors inherit their dependencies from
> >>>> Flink. As a result, when Flink upgrades some dependencies, the
> >>>> flink-python test cases can easily fail, because Flink no longer
> >>>> ships the class and the connector does not contain it either. It's
> >>>> a circular problem.
> >>>>
> >>>> Unless the connector self-consistently bundles all of its
> >>>> dependencies, which is uncontrollable (only a few connectors bundle
> >>>> all jars in the shade phase).
> >>>>
> >>>> In short, the flink-python module's current dependency on the
> >>>> connectors leaves the externalization and decoupling incomplete,
> >>>> which leads to circular dependencies when Flink upgrades or changes
> >>>> some dependencies.
> >>>>
> >>>> I don't know if I have made it clear. I hope to get everyone's
> >>>> opinions on what better solutions we should adopt for similar
> >>>> problems in the future.
> >>>>
> >>>> [1] https://issues.apache.org/jira/browse/FLINK-30274
> >>>> [2]
> >>>>
> >>>>
> >>> https://user-images.githubusercontent.com/11287509/250120404-d12b60f4-7ff3-457e-a2c4-8cd415bb5ca2.png
> >>>>
> >>>>
> >>> https://user-images.githubusercontent.com/11287509/250120522-6b096a4f-83f0-4287-b7ad-d46b9371de4c.png
> >>>> [3] https://github.com/apache/flink-connector-kafka/pull/38
> >>>>
> >>>> Best Regards,
> >>>> Ran Tao
> >>>> https://github.com/chucheng92
> >>>>
>
