Hi all,

I also agree with what's been said above.
+1, I think the Table API delegation is a good suggestion: it essentially
allows a connector to get Python support for free. We've seen that the
Table/SQL and Python APIs complement each other well and are ideal for
data scientists; a rough sketch of what the delegation could look like
is below.

With respect to unaligned functionality, I think that concern also holds
between other APIs, e.g. DataStream and Table/SQL, since some
functionality is not natural to represent as configuration/SQL.
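For concreteness, a minimal sketch of that delegation as it can already
be written today (the Kafka table options are only an example and assume
the flink-sql-connector-kafka jar is on the classpath):

    from pyflink.datastream import StreamExecutionEnvironment
    from pyflink.table import StreamTableEnvironment

    env = StreamExecutionEnvironment.get_execution_environment()
    t_env = StreamTableEnvironment.create(env)

    # The connector is declared through the Table API; only key/value
    # options cross the language boundary, not a hand-written wrapper.
    t_env.execute_sql("""
        CREATE TABLE orders (
            order_id STRING,
            amount DOUBLE
        ) WITH (
            'connector' = 'kafka',
            'topic' = 'orders',
            'properties.bootstrap.servers' = 'localhost:9092',
            'properties.group.id' = 'demo',
            'scan.startup.mode' = 'earliest-offset',
            'format' = 'json'
        )
    """)

    # Bridge into the Python DataStream API where DataStream semantics
    # are needed.
    ds = t_env.to_data_stream(t_env.from_path("orders"))
    ds.print()
    env.execute("table-api-delegation-sketch")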
Best,
Mason

On Wed, Jul 5, 2023 at 10:14 PM Dian Fu <dian0511...@gmail.com> wrote:

> Hi Chesnay,
>
> >> The wrapping of connectors is a bit of a maintenance nightmare and
> >> doesn't really work with external/custom connectors.
>
> I couldn't agree with you more.
>
> >> Has there ever been any thought about changing flink-python's
> >> connector setup to use the Table API connectors underneath?
>
> I'm still not sure whether this is feasible for all connectors;
> however, it may be a good idea. The concern is that the DataStream API
> connector functionality may be unaligned between the Java and Python
> connectors. Besides, there are still a few connectors which only have
> DataStream API connectors, e.g. Google PubSub, RabbitMQ, Cassandra,
> Pulsar, Hybrid Source, etc. Moreover, PyFlink already supports the
> Table API connectors, so if we take this route, maybe we could just
> tell users to use the Table API connectors directly.
>
> Another option I had in mind is to provide an API that allows
> configuring the behavior via key/value pairs in both the Java and
> Python DataStream API connectors.
>
> Regards,
> Dian
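For concreteness, Dian's key/value option might look roughly like this on
the Python side (a sketch for discussion only: KeyValueSource and its
option keys are invented here, not an existing PyFlink API):

    from pyflink.common import WatermarkStrategy
    from pyflink.datastream import StreamExecutionEnvironment

    env = StreamExecutionEnvironment.get_execution_environment()

    # Hypothetical: the connector is selected and configured purely by
    # string options, so the same key/value map could drive both the
    # Java and Python DataStream APIs without a per-connector wrapper.
    source = KeyValueSource(  # invented class, for illustration only
        identifier="kafka",
        options={
            "topic": "orders",
            "properties.bootstrap.servers": "localhost:9092",
            "scan.startup.mode": "earliest-offset",
            "value.format": "json",
        },
    )
    ds = env.from_source(source, WatermarkStrategy.no_watermarks(), "orders")

The appeal is that one option map could drive both languages; the cost is
giving up the typed builder methods of today's Java connectors.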
> On Wed, Jul 5, 2023 at 6:34 PM Chesnay Schepler <ches...@apache.org>
> wrote:
> >
> > Has there ever been any thought about changing flink-python's
> > connector setup to use the Table API connectors underneath?
> >
> > The wrapping of connectors is a bit of a maintenance nightmare and
> > doesn't really work with external/custom connectors.
> >
> > On 04/07/2023 13:35, Dian Fu wrote:
> > > Thanks Ran Tao for proposing this discussion and Martijn for sharing
> > > your thoughts.
> > >
> > >> While flink-python now fails the CI, it shouldn't actually depend
> > >> on the externalized connectors. I'm not sure what PyFlink does with
> > >> it, but if it belongs to the connector code,
> > >
> > > For each DataStream connector, there is a corresponding Python
> > > wrapper and also some test cases in PyFlink. In theory, we should
> > > move each wrapper into its connector repository. We did not do that
> > > when externalizing the connectors because it would add release
> > > overhead: we would have to publish each connector to PyPI
> > > separately.
> > >
> > > To resolve this problem, I guess we can move the connector support
> > > in PyFlink into the external connector repositories.
> > >
> > > Regards,
> > > Dian
> > >
> > >
> > > On Mon, Jul 3, 2023 at 11:08 PM Ran Tao <chucheng...@gmail.com> wrote:
> > >> @Martijn
> > >> Thanks for the clear explanation.
> > >>
> > >> If we follow the line you specified (connectors shouldn't rely on
> > >> dependencies that may or may not be available in Flink itself), it
> > >> seems that we should declare any dependency we need (such as
> > >> commons-io or commons-collections) explicitly in the connector pom,
> > >> and bundle it in the sql-connector uber jar.
> > >>
> > >> Then only one thing is left: we need to make the flink-python tests
> > >> not depend on the released connectors. Maybe we should check that
> > >> out and decouple it as you suggested.
> > >>
> > >> Best Regards,
> > >> Ran Tao
> > >> https://github.com/chucheng92
> > >>
> > >>
> > >> On Mon, Jul 3, 2023 at 10:06 PM Martijn Visser
> > >> <martijnvis...@apache.org> wrote:
> > >>
> > >>> Hi Ran Tao,
> > >>>
> > >>> Thanks for opening this topic. I think there are a couple of
> > >>> things at hand:
> > >>> 1. Connectors shouldn't rely on dependencies that may or may not
> > >>> be available in Flink itself, as we've seen with flink-shaded;
> > >>> otherwise we get a tight coupling between Flink and the
> > >>> connectors, which is exactly what we try to avoid.
> > >>> 2. Following that line, the same applies to things like
> > >>> commons-collections and commons-io: if a connector wants to use
> > >>> them, it should make sure that it bundles those artifacts itself.
> > >>> 3. While flink-python now fails the CI, it shouldn't actually
> > >>> depend on the externalized connectors. I'm not sure what PyFlink
> > >>> does with it, but if it belongs to the connector code, that code
> > >>> should also be moved to the individual connector repo. If it's
> > >>> just a generic test, we could consider creating a generic test
> > >>> against released connector versions to determine compatibility.
> > >>>
> > >>> I'm curious about the opinions of others as well.
> > >>>
> > >>> Best regards,
> > >>>
> > >>> Martijn
> > >>>
> > >>> On Mon, Jul 3, 2023 at 3:37 PM Ran Tao <chucheng...@gmail.com>
> > >>> wrote:
> > >>>
> > >>>> I have an issue that requires upgrading commons-collections[1]
> > >>>> (as one example), but the PR CI fails because the flink-python
> > >>>> test cases depend on flink-sql-connector-kafka, and the Kafka SQL
> > >>>> connector is a thin jar that does not include this dependency, so
> > >>>> the Flink CI throws an exception[2]. My current solution is [3].
> > >>>> But even once this PR is done, upgrading Flink still requires a
> > >>>> new kafka-connector release.
> > >>>>
> > >>>> This issue points to a deeper problem. Although the connectors
> > >>>> have been externalized, many UTs of flink-python depend on these
> > >>>> connectors, and a basic agreement for externalized connectors is
> > >>>> that extra dependencies must not be introduced explicitly, which
> > >>>> means the externalized connectors use the dependencies inherited
> > >>>> from Flink. As a result, when the Flink main repo upgrades some
> > >>>> dependencies, the flink-python test cases easily fail, because
> > >>>> Flink no longer ships the class and the connector does not
> > >>>> contain it either. It's a circular problem.
> > >>>>
> > >>>> Unless, that is, the connector bundles all of its dependencies
> > >>>> itself, which is hard to control (only a few connectors include
> > >>>> all jars in the shade phase).
> > >>>>
> > >>>> In short, the flink-python module's current dependencies on the
> > >>>> connectors mean the externalization and decoupling are
> > >>>> incomplete, and this leads to circular dependencies whenever
> > >>>> Flink upgrades or changes some dependencies.
> > >>>>
> > >>>> I don't know if I've made this clear. I hope to get everyone's
> > >>>> opinions on what better solutions we should adopt for similar
> > >>>> problems in the future.
> > >>>>
> > >>>> [1] https://issues.apache.org/jira/browse/FLINK-30274
> > >>>> [2]
> > >>>> https://user-images.githubusercontent.com/11287509/250120404-d12b60f4-7ff3-457e-a2c4-8cd415bb5ca2.png
> > >>>> https://user-images.githubusercontent.com/11287509/250120522-6b096a4f-83f0-4287-b7ad-d46b9371de4c.png
> > >>>> [3] https://github.com/apache/flink-connector-kafka/pull/38
> > >>>>
> > >>>> Best Regards,
> > >>>> Ran Tao
> > >>>> https://github.com/chucheng92
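As background for the maintenance concern Chesnay raises above: each
DataStream connector exposed in flink-python today is wrapped by hand,
roughly in the shape sketched below (the package and class names here
are invented; the real wrappers live under pyflink.datastream.connectors).
Every change to the Java builder has to be mirrored manually, which is
also why external/custom connectors are hard to support this way:

    from pyflink.java_gateway import get_gateway

    class ExampleSourceBuilder:
        """Illustrative only: the typical shape of a PyFlink DataStream
        connector wrapper, mirroring a Java builder method by method."""

        def __init__(self):
            # Obtain the Java builder through the Py4J gateway.
            jvm = get_gateway().jvm
            self._j_builder = jvm.org.example.connector.ExampleSource.builder()

        def set_topic(self, topic: str) -> "ExampleSourceBuilder":
            # Each Java setter needs a hand-written Python twin; any
            # change or addition on the Java side must be replicated here.
            self._j_builder = self._j_builder.setTopic(topic)
            return self

        def build(self):
            # Returns the underlying Java source object, which the Python
            # DataStream API hands back to the JVM at job submission.
            return self._j_builder.build()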