RE: Stephan's options (
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/SURVEY-Usage-of-flink-python-and-flink-streaming-python-td25793.html
)
* Option (1): Language portability via Apache Beam
* Option (2): Implement own Python API
* Option (3): Implement own portability layer

Hi Stephan,
Eventually, I think we should support both option 1 and option 3. IMO, these
two options are orthogonal. I agree with you that by supporting option 1 we
can leverage the existing work and ecosystem in Beam. The problem with Beam,
however, is that (to the best of my knowledge) it bypasses the native
table/SQL optimization framework provided by Flink. We should spend all the
needed effort to support option 1 (as it is a better alternative to the
current Flink Python API), but we cannot bet solely on it. Option 3 is the
ideal choice for Flink to support all non-JVM languages, and we should plan
for it. We have done some preliminary prototypes for option 2/option 3, and
they do not seem overly complex or difficult to accomplish.

Regards,
Shaoxuan


On Thu, Dec 13, 2018 at 4:58 PM Xianda Ke <kexia...@gmail.com> wrote:

> Currently there is an ongoing survey about Python usage of Flink [1]. Some
> discussion was also brought up there regarding non-jvm language support
> strategy in general. To avoid polluting the survey thread, we are starting
> this discussion thread and would like to move the discussions here.
>
> In the interest of facilitating the discussion, we would like to first
> share the following design doc which describes what we have done at Alibaba
> about Python API for Flink. It could serve as a good reference to the
> discussion.
>
>  [DISCUSS] Flink Python API
> <
> https://docs.google.com/document/d/1JNGWdLwbo_btq9RVrc1PjWJV3lYUgPvK0uEWDIfVNJI/edit?usp=drive_web
> >
>
> As of now, we've implemented and delivered Python UDF for SQL for the
> internal users at Alibaba.
> We are starting to implement Python API.
>
> To recap and continue the discussion from the survey thread, I agree with
> @Stephan that we should figure out in which general direction Python
> support should go. Stephan also listed three options there:
> * Option (1): Language portability via Apache Beam
> * Option (2): Implement own Python API
> * Option (3): Implement own portability layer
>
> From my perspective,
> (1). Flink's own language APIs and Beam's language support are not
> mutually exclusive.
> It is nice that Beam has Python/NodeJS/Go APIs and supports Flink as a
> runner.
> Flink's own Python (or NodeJS/Go) APIs will benefit Flink's ecosystem.
>
> (2). Python API / portability layer
> To support non-JVM languages in Flink:
> * at the client side, Flink would provide language interfaces, which
> translate the user's application to a Flink StreamGraph;
> * at the server side, Flink would execute the user's UDF code at runtime.
> The non-JVM languages communicate with the JVM via RPC (or a low-level
> socket, an embedded interpreter, and so on). What the portability layer
> can do, perhaps, is abstract this RPC layer. Even when the portability
> layer is ready, there is still a lot of work to do for each specific
> language. Take Python: we may still have to write the interface classes
> for users by hand, because generated code without detailed documentation
> is unacceptable to users, and we have to handle the serialization of
> lambdas/closures, which is not built into Python. Maybe we can start with
> a Python API, then extend to other languages and abstract the common
> logic as the portability layer.
>
> ---
> References:
> [1] [SURVEY] Usage of flink-python and flink-streaming-python
>
> Regards,
> Xianda
>

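A concrete illustration of the lambda/closure serialization issue Xianda
mentions: Python's stdlib pickle serializes functions by qualified name, so
a named module-level function round-trips, but a lambda does not. A minimal
sketch (libraries such as cloudpickle, used by Beam and PySpark, serialize
the code object by value to work around this):

```python
import pickle

# A named, module-level function pickles by reference to its
# qualified name, so the round trip succeeds.
def double(x):
    return x * 2

restored = pickle.loads(pickle.dumps(double))
print(restored(21))  # 42

# A lambda has no importable qualified name, so stdlib pickle
# rejects it; this is the gap a Python API layer has to fill
# for user-supplied UDFs.
try:
    pickle.dumps(lambda x: x * 2)
    lambda_ok = True
except (pickle.PicklingError, AttributeError):
    lambda_ok = False
print(lambda_ok)  # False
```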