There is currently an ongoing survey about Python usage of Flink [1]. Some discussion also came up there regarding the non-JVM language support strategy in general. To avoid polluting the survey thread, we are starting this discussion thread and would like to move the discussion here.

In the interest of facilitating the discussion, we would like to first share the following design doc, which describes what we have done at Alibaba on a Python API for Flink. It could serve as a good reference for the discussion.

[DISCUSS] Flink Python API
<https://docs.google.com/document/d/1JNGWdLwbo_btq9RVrc1PjWJV3lYUgPvK0uEWDIfVNJI/edit?usp=drive_web>

As of now, we have implemented and delivered Python UDFs for SQL to internal users at Alibaba. We are starting to implement the Python API.

To recap and continue the discussion from the survey thread, I agree with @Stephan that we should figure out in which general direction Python support should go. Stephan also listed three options there:

* Option (1): Language portability via Apache Beam
* Option (2): Implement own Python API
* Option (3): Implement own portability layer

From my perspective,

(1). Flink language APIs and Beam's language support are not mutually exclusive. It is nice that Beam has Python/NodeJS/Go APIs and supports Flink as a runner. Flink's own Python (or NodeJS/Go) APIs will benefit Flink's ecosystem.

(2). Python API / portability layer

To support non-JVM languages in Flink,

* on the client side, Flink would provide language interfaces, which translate the user's application into a Flink StreamGraph.
* on the server side, Flink would execute the user's UDF code at runtime.

The non-JVM languages communicate with the JVM via RPC (or a low-level socket, an embedded interpreter, and so on). What the portability layer can do is, perhaps, abstract the RPC layer. Even when the portability layer is ready, there is still a lot to do for each specific language. For Python, say, we may still have to write the interface classes by hand, because generated code without detailed documentation is unacceptable for users, and we have to handle the serialization of lambdas/closures, which Python does not support out of the box.

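To make the lambda/closure point concrete, here is a minimal sketch (not taken from the design doc; dump_function and load_function are hypothetical helper names used only for illustration). It shows that the standard pickle module refuses to serialize a closure, and one possible workaround: shipping the function by value as marshalled bytecode plus the captured variables. It assumes Python 3.8+ on both sides, the same Python version for sender and receiver (marshal output is version specific), and picklable captured values. It ignores keyword-only defaults and references to module globals, so it is far from production ready. Libraries such as cloudpickle handle these cases much more thoroughly.

```python
import builtins
import marshal
import pickle
import types


def can_pickle(func):
    """Return True if the standard pickle module can serialize func."""
    try:
        pickle.dumps(func)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False


def dump_function(func):
    """Hypothetical helper: serialize a function by value.

    Ships the bytecode (via marshal) together with the positional
    defaults and the values captured in the closure.
    """
    captured = [cell.cell_contents for cell in (func.__closure__ or ())]
    payload = {
        "code": marshal.dumps(func.__code__),
        "name": func.__name__,
        "defaults": func.__defaults__,
        "closure": captured,
    }
    return pickle.dumps(payload)


def load_function(data):
    """Hypothetical helper: rebuild a function produced by dump_function."""
    payload = pickle.loads(data)
    code = marshal.loads(payload["code"])
    # Re-create one closure cell per captured value, in the original order.
    cells = tuple(types.CellType(value) for value in payload["closure"]) or None
    # Assumption: the function only needs builtins, not the sender's globals.
    global_ns = {"__builtins__": builtins}
    return types.FunctionType(
        code, global_ns, payload["name"], payload["defaults"], cells
    )


def make_adder(n):
    # The returned lambda captures n, so it is a closure.
    return lambda x: x + n


if __name__ == "__main__":
    add5 = make_adder(5)
    print("picklable with plain pickle:", can_pickle(add5))
    restored = load_function(dump_function(add5))
    print("restored(10) =", restored(10))
```

Whatever mechanism is chosen, the Python API has to do something along these lines on the client side before a UDF can be sent to the workers, and that logic is language specific rather than something a generic portability layer provides.
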
Maybe we can start with the Python API, then extend to other languages and abstract the common logic into the portability layer.

---
References:
[1] [SURVEY] Usage of flink-python and flink-streaming-python

Regards,
Xianda