Hi Shaoxuan,

FWIW, Kenn Knowles (Beam PMC Chair) recently gave a talk at the Bay Area
Apache Beam Meetup [1] which included a vision for how Beam could better
leverage runner-specific optimizations -- for example, Beam SQL taking
advantage of Flink-specific SQL optimizations (which addresses your point).
So that is part of the eventual roadmap for Beam, and it suggests that
concrete optimization work in the Runner/SDK-Harness would likely yield the
desired result: cross-language support by leveraging Beam, with the focus
devoted to optimizing that processing on Flink.
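For a concrete picture of where that boundary sits, here is a minimal
sketch (assuming a recent Beam Python SDK; the Flink master address is a
placeholder, and the flag name may vary by Beam version). The pipeline
definition stays runner-agnostic; the Flink-specific translation and any
runner-specific optimization happen entirely on the runner side:

    # Minimal, runner-agnostic Beam Python pipeline submitted to Flink.
    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    options = PipelineOptions([
        "--runner=FlinkRunner",
        "--flink_master=localhost:8081",  # placeholder address
    ])
    with beam.Pipeline(options=options) as p:
        (p
         | "Create" >> beam.Create(["a", "b", "a"])
         | "Pair" >> beam.Map(lambda w: (w, 1))
         | "Count" >> beam.CombinePerKey(sum))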
Cheers,
Austin

[1] https://www.meetup.com/San-Francisco-Apache-Beam/events/256348972/ --
I can post/share the videos once available, should someone desire.

On Fri, Dec 14, 2018 at 6:03 AM Maximilian Michels <m...@apache.org> wrote:

> Hi Xianda, hi Shaoxuan,
>
> I'd be in favor of option (1). There is great potential in Beam and Flink
> joining forces on this one. Here's why:
>
> The Beam project spent at least a year developing a portability layer,
> with a reasonable number of people working on it. Developing a new
> portability layer from scratch would probably take about the same amount
> of time and resources.
>
> Concerning option (2): There is already a Python API for Flink, but an
> API is only one part of the portability story. In Beam, portability is
> structured into three components:
>
> - SDK (the API, its Protobuf serialization, and the interaction with the
>   SDK Harness)
> - Runner (translation from the Protobuf pipeline to a Flink job)
> - SDK Harness (UDF execution, interaction with the SDK and the execution
>   engine)
>
> I could imagine the Flink Python API becoming another SDK, which could
> have its own API but would reuse the code for the interaction with the
> SDK Harness.
>
> We would then be able to focus on the optimizations instead of rebuilding
> a portability layer from scratch.
>
> Thanks,
> Max
>
> On 13.12.18 11:52, Shaoxuan Wang wrote:
> > RE: Stephan's options (
> > http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/SURVEY-Usage-of-flink-python-and-flink-streaming-python-td25793.html
> > )
> > * Option (1): Language portability via Apache Beam
> > * Option (2): Implement own Python API
> > * Option (3): Implement own portability layer
> >
> > Hi Stephan,
> > Eventually, I think we should support both option 1 and option 3. IMO,
> > these two options are orthogonal. I agree with you that we can leverage
> > the existing work and ecosystem in Beam by supporting option 1. But the
> > problem with Beam is that it skips (to the best of my knowledge) the
> > native table/SQL optimization framework provided by Flink. We should
> > spend all the needed effort to support option 1 (as it is a better
> > alternative to the current Flink Python API), but we cannot bet solely
> > on it. Option 3 is the ideal choice for Flink to support all non-JVM
> > languages, and we should plan to achieve it. We have done some
> > preliminary prototypes for options 2 and 3, and they do not seem overly
> > complex or difficult to accomplish.
> >
> > Regards,
> > Shaoxuan
> >
> >
> > On Thu, Dec 13, 2018 at 4:58 PM Xianda Ke <kexia...@gmail.com> wrote:
> >
> >> Currently there is an ongoing survey about Python usage in Flink [1].
> >> Some discussion was also brought up there regarding non-JVM language
> >> support strategy in general. To avoid polluting the survey thread, we
> >> are starting this discussion thread and would like to move the
> >> discussions here.
> >>
> >> In the interest of facilitating the discussion, we would like to first
> >> share the following design doc, which describes what we have done at
> >> Alibaba for a Python API for Flink. It could serve as a good reference
> >> for the discussion.
> >>
> >> [DISCUSS] Flink Python API
> >> <https://docs.google.com/document/d/1JNGWdLwbo_btq9RVrc1PjWJV3lYUgPvK0uEWDIfVNJI/edit?usp=drive_web>
> >>
> >> As of now, we've implemented and delivered Python UDF for SQL for the
> >> internal users at Alibaba. We are starting to implement the Python API.
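> >> For illustration, a hypothetical sketch of what such a Python UDF for
> >> SQL could look like (the decorator name, type constant, and
> >> registration call here are illustrative assumptions, not the actual
> >> API from the design doc):
> >>
> >>     # Hypothetical API, for illustration only -- names are not final.
> >>     @udf(result_type=Types.BIGINT)
> >>     def add_one(x):
> >>         return x + 1
> >>
> >>     # Register the UDF, then call it from SQL like a built-in:
> >>     table_env.register_function("add_one", add_one)
> >>     table_env.sql_query("SELECT add_one(a) FROM my_source")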
> >>
> >> To recap and continue the discussion from the survey thread, I agree
> >> with @Stephan that we should figure out in which general direction
> >> Python support should go. Stephan also listed three options there:
> >> * Option (1): Language portability via Apache Beam
> >> * Option (2): Implement own Python API
> >> * Option (3): Implement own portability layer
> >>
> >> From my perspective:
> >> (1). Flink language APIs and Beam's language support are not mutually
> >> exclusive.
> >> It is nice that Beam has Python/NodeJS/Go APIs and supports Flink as
> >> the runner.
> >> Flink's own Python (or NodeJS/Go) APIs will benefit Flink's ecosystem.
> >>
> >> (2). Python API / portability layer
> >> To support non-JVM languages in Flink:
> >> * on the client side, Flink would provide language interfaces, which
> >> translate the user's application into a Flink StreamGraph.
> >> * on the server side, Flink would execute the user's UDF code at
> >> runtime.
> >> The non-JVM languages communicate with the JVM via RPC (or low-level
> >> sockets, an embedded interpreter, and so on). What the portability
> >> layer can do, perhaps, is abstract away the RPC layer. Even when the
> >> portability layer is ready, there is still a lot of work to do for a
> >> specific language. Take Python: we may still have to write the
> >> interface classes by hand for users, because generated code without
> >> detailed documentation is unacceptable to users, and we have to handle
> >> the serialization of lambdas/closures, which is not a built-in feature
> >> in Python (see the sketch below).
> >> Maybe we can start with a Python API, then extend to other languages
> >> and abstract the common logic into the portability layer.
> >>
> >> ---
> >> References:
> >> [1] [SURVEY] Usage of flink-python and flink-streaming-python
> >>
> >> Regards,
> >> Xianda
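A minimal demonstration of the lambda/closure serialization gap Xianda
mentions, assuming the third-party cloudpickle library (which projects
such as PySpark use to ship closures by value; whether a Flink portability
layer would use it is an open design choice):

    import pickle

    import cloudpickle  # third-party: pip install cloudpickle

    f = lambda x: x + 1

    # The standard library pickles functions by reference (module plus
    # qualified name), so a lambda bound to a different name fails:
    try:
        pickle.dumps(f)
    except Exception as e:
        print("stdlib pickle failed:", e)

    # cloudpickle serializes the code object and any captured closure by
    # value, which is what shipping a UDF to a remote process requires:
    payload = cloudpickle.dumps(f)
    g = pickle.loads(payload)  # plain pickle can load the payload
    print(g(41))  # prints 42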