Currently there is an ongoing survey about Python usage of Flink [1]. Some
discussion was also brought up there regarding non-jvm language support
strategy in general. To avoid polluting the survey thread, we are starting
this discussion thread and would like to move the discussions here.

In the interest of facilitating the discussion, we would like to first
share the following design doc which describes what we have done at Alibaba
about Python API for Flink. It could serve as a good reference to the
discussion.

 [DISCUSS] Flink Python API
<https://docs.google.com/document/d/1JNGWdLwbo_btq9RVrc1PjWJV3lYUgPvK0uEWDIfVNJI/edit?usp=drive_web>

As of now, we've implemented and delivered Python UDF for SQL for the
internal users at Alibaba.
We are starting to implement Python API.

To recap and continue the discussion from the survey thread, I agree with
@Stephan that we should figure out in which general direction Python
support should go. Stephan also list three options there:
* Option (1): Language portability via Apache Beam
* Option (2): Implement own Python API
* Option (3): Implement own portability layer

>From my perspective,
(1). Flink language APIs and Beam's languages support are not mutually
exclusive.
It is nice that Beam has Python/NodeJS/Go APIs, and support Flink as the
runner.
Flink's own Python(or NodeJS/Go) APIs will benefit Flink's ecosystem.

(2). Python API / portability layer
To support non-JVM languages in Flink,
 * at client side, Flink would provide language interfaces, which will
translate user's application to Flink StreamGraph.
* at server side, Flink would execute user's UDF code at runtime
The non-JVM languages communicate with JVM via RPC(or low-level socket,
embedded interpreter and so on). What the portability layer can do maybe is
abstracting the RPC layer. When the portability layer is ready, still there
are lots of stuff to do for a specified language. Say, Python, we may still
have to write the interface classes by hand for the users because generated
code without detailed documentation is unacceptable for users, or handle
the serialization issue of lambda/closure which is not a built-in feature
in Python.  Maybe, we can start with Python API, then extend to other
languages and abstract the logic in common as the portability layer.

---
References:
[1] [SURVEY] Usage of flink-python and flink-streaming-python

Regards,
Xianda

Reply via email to