Hi Shaoxuan,

FWIW, Kenn Knowles (Beam PMC Chair) recently gave a talk at the Bay Area
Apache Beam Meetup [1] which included a vision for how Beam could better
leverage runner-specific optimizations -- for example, Beam SQL taking
advantage of Flink-specific SQL optimizations (which addresses your point).
So that is part of the eventual roadmap for Beam, and it suggests that
concrete optimization work in the Runner/SDK-Harness would likely yield the
desired result: cross-language support by leveraging Beam, with the focus
devoted to optimizing that processing on Flink.
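For a concrete picture of where that boundary sits, here is a minimal
sketch (assuming a recent Beam Python SDK; the Flink master address is a
placeholder, and the flag name may vary by Beam version). The pipeline
definition stays runner-agnostic; the Flink-specific translation and any
runner-specific optimization happen entirely on the runner side:

    # Minimal, runner-agnostic Beam Python pipeline submitted to Flink.
    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    options = PipelineOptions([
        "--runner=FlinkRunner",
        "--flink_master=localhost:8081",  # placeholder address
    ])
    with beam.Pipeline(options=options) as p:
        (p
         | "Create" >> beam.Create(["a", "b", "a"])
         | "Pair" >> beam.Map(lambda w: (w, 1))
         | "Count" >> beam.CombinePerKey(sum))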
Cheers,
Austin

[1] https://www.meetup.com/San-Francisco-Apache-Beam/events/256348972/ --
I can post/share the videos once available, should someone desire.

On Fri, Dec 14, 2018 at 6:03 AM Maximilian Michels <m...@apache.org> wrote:

> Hi Xianda, hi Shaoxuan,
>
> I'd be in favor of option (1). There is great potential in Beam and Flink
> joining forces on this one. Here's why:
>
> The Beam project spent at least a year developing a portability layer,
> with a reasonable number of people working on it. Developing a new
> portability layer from scratch would probably take about the same amount
> of time and resources.
>
> Concerning option (2): There is already a Python API for Flink, but an
> API is only one part of the portability story. In Beam, portability is
> structured into three components:
>
> - SDK (the API, its Protobuf serialization, and the interaction with the
>   SDK Harness)
> - Runner (translation from the Protobuf pipeline to a Flink job)
> - SDK Harness (UDF execution, interaction with the SDK and the execution
>   engine)
>
> I could imagine the Flink Python API becoming another SDK, which could
> have its own API but would reuse the code for the interaction with the
> SDK Harness.
>
> We would then be able to focus on the optimizations instead of rebuilding
> a portability layer from scratch.
>
> Thanks,
> Max
>
> On 13.12.18 11:52, Shaoxuan Wang wrote:
> > RE: Stephan's options (
> > http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/SURVEY-Usage-of-flink-python-and-flink-streaming-python-td25793.html
> > )
> > * Option (1): Language portability via Apache Beam
> > * Option (2): Implement own Python API
> > * Option (3): Implement own portability layer
> >
> > Hi Stephan,
> > Eventually, I think we should support both option 1 and option 3. IMO,
> > these two options are orthogonal. I agree with you that we can leverage
> > the existing work and ecosystem in Beam by supporting option 1. But the
> > problem with Beam is that it skips (to the best of my knowledge) the
> > native table/SQL optimization framework provided by Flink. We should
> > spend all the needed effort to support option 1 (as it is a better
> > alternative to the current Flink Python API), but we cannot bet solely
> > on it. Option 3 is the ideal choice for Flink to support all non-JVM
> > languages, and we should plan to achieve it. We have done some
> > preliminary prototypes for options 2 and 3, and they do not seem overly
> > complex or difficult to accomplish.
> >
> > Regards,
> > Shaoxuan
> >
> >
> > On Thu, Dec 13, 2018 at 4:58 PM Xianda Ke <kexia...@gmail.com> wrote:
> >
> >> Currently there is an ongoing survey about Python usage in Flink [1].
> >> Some discussion was also brought up there regarding non-JVM language
> >> support strategy in general. To avoid polluting the survey thread, we
> >> are starting this discussion thread and would like to move the
> >> discussions here.
> >>
> >> In the interest of facilitating the discussion, we would like to first
> >> share the following design doc, which describes what we have done at
> >> Alibaba for a Python API for Flink. It could serve as a good reference
> >> for the discussion.
> >>
> >> [DISCUSS] Flink Python API
> >> <https://docs.google.com/document/d/1JNGWdLwbo_btq9RVrc1PjWJV3lYUgPvK0uEWDIfVNJI/edit?usp=drive_web>
> >>
> >> As of now, we've implemented and delivered Python UDF for SQL for the
> >> internal users at Alibaba. We are starting to implement the Python API.
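> >> For illustration, a hypothetical sketch of what such a Python UDF for
> >> SQL could look like (the decorator name, type constant, and
> >> registration call here are illustrative assumptions, not the actual
> >> API from the design doc):
> >>
> >>     # Hypothetical API, for illustration only -- names are not final.
> >>     @udf(result_type=Types.BIGINT)
> >>     def add_one(x):
> >>         return x + 1
> >>
> >>     # Register the UDF, then call it from SQL like a built-in:
> >>     table_env.register_function("add_one", add_one)
> >>     table_env.sql_query("SELECT add_one(a) FROM my_source")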
> >>
> >> To recap and continue the discussion from the survey thread, I agree
> >> with @Stephan that we should figure out in which general direction
> >> Python support should go. Stephan also listed three options there:
> >> * Option (1): Language portability via Apache Beam
> >> * Option (2): Implement own Python API
> >> * Option (3): Implement own portability layer
> >>
> >> From my perspective:
> >> (1). Flink language APIs and Beam's language support are not mutually
> >> exclusive.
> >> It is nice that Beam has Python/NodeJS/Go APIs and supports Flink as
> >> the runner.
> >> Flink's own Python (or NodeJS/Go) APIs will benefit Flink's ecosystem.
> >>
> >> (2). Python API / portability layer
> >> To support non-JVM languages in Flink:
> >> * on the client side, Flink would provide language interfaces, which
> >> translate the user's application into a Flink StreamGraph.
> >> * on the server side, Flink would execute the user's UDF code at
> >> runtime.
> >> The non-JVM languages communicate with the JVM via RPC (or low-level
> >> sockets, an embedded interpreter, and so on). What the portability
> >> layer can do, perhaps, is abstract away the RPC layer. Even when the
> >> portability layer is ready, there is still a lot of work to do for a
> >> specific language. Take Python: we may still have to write the
> >> interface classes by hand for users, because generated code without
> >> detailed documentation is unacceptable to users, and we have to handle
> >> the serialization of lambdas/closures, which is not a built-in feature
> >> in Python (see the sketch below).
> >> Maybe we can start with a Python API, then extend to other languages
> >> and abstract the common logic into the portability layer.
> >>
> >> ---
> >> References:
> >> [1] [SURVEY] Usage of flink-python and flink-streaming-python
> >>
> >> Regards,
> >> Xianda
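A minimal demonstration of the lambda/closure serialization gap Xianda
mentions, assuming the third-party cloudpickle library (which projects
such as PySpark use to ship closures by value; whether a Flink portability
layer would use it is an open design choice):

    import pickle

    import cloudpickle  # third-party: pip install cloudpickle

    f = lambda x: x + 1

    # The standard library pickles functions by reference (module plus
    # qualified name), so a lambda bound to a different name fails:
    try:
        pickle.dumps(f)
    except Exception as e:
        print("stdlib pickle failed:", e)

    # cloudpickle serializes the code object and any captured closure by
    # value, which is what shipping a UDF to a remote process requires:
    payload = cloudpickle.dumps(f)
    g = pickle.loads(payload)  # plain pickle can load the payload
    print(g(41))  # prints 42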