Did you take a look at Apache Beam? It already provides a comprehensive Python SDK and can be used with Flink: https://beam.apache.org/roadmap/portability/#python-on-flink
We are using it at Lyft for Python streaming pipelines.

Thomas

On Tue, Dec 11, 2018 at 5:54 AM Xianda Ke <kexia...@gmail.com> wrote:

> Hi Till,
>
> 1. As far as I know, most of the users at Alibaba are using SQL. Some
> users at Alibaba want to integrate Python libraries with Flink for
> stream processing, and Jython is unusable for that.
>
> 2. Python UDFs for SQL:
> * Declare the Python UDF using Alibaba's internal DDL syntax.
> * Start a Python process in open().
> * Communicate with the JVM process via a socket.
> * Yes, it supports Python libraries; users can upload a
>   virtualenv/conda Python runtime.
>
> 3. We've drafted a design doc for the Python API:
> [DISCUSS] Flink Python API
> <https://docs.google.com/document/d/1JNGWdLwbo_btq9RVrc1PjWJV3lYUgPvK0uEWDIfVNJI/edit?usp=drive_web>
>
> Python UDF for SQL is not discussed in this document; we'll create a
> new proposal when the SQL DDL is ready.
>
> On Mon, Dec 10, 2018 at 9:52 PM Till Rohrmann <trohrm...@apache.org>
> wrote:
>
> > Hi Xianda,
> >
> > thanks for sharing this detailed feedback. Do I understand you
> > correctly that flink-python and flink-streaming-python are not usable
> > for the use cases at Alibaba atm?
> >
> > Could you share a bit more detail about the Python UDFs for SQL? How
> > do you execute the Python code? Will it work with any Python library?
> > If you are about to publish the design document then feel free to
> > refer me to this document.
> >
> > Cheers,
> > Till
> >
> > On Mon, Dec 10, 2018 at 3:08 AM Xianda Ke <kexia...@gmail.com> wrote:
> >
> > > After communicating with some of the internal users at Alibaba, my
> > > impression is that:
> > >
> > > - Most of them need C extension support: they want to integrate
> > >   their algorithms with stream processing, but Jython is
> > >   unacceptable for them.
> > > - For some users, who are only familiar with SQL/Python, developing
> > >   a Java API application/UDF is too complex. Writing a Python UDF
> > >   and declaring it in SQL is preferred.
> > > - Machine Learning users need richer Python APIs, such as Table API
> > >   Python support.
> > >
> > > From my point of view, Python support in Flink currently has a few
> > > caveats:
> > >
> > > - For batch, there is only the DataSet Python API.
> > > - For streaming, where Flink really shines, only Jython is
> > >   supported, but Jython has lots of limitations.
> > > - For most big data users, the SQL/Table API is more friendly, but
> > >   Python users have no such APIs right now.
> > > - An interactive Python shell is very user-friendly. It can be used
> > >   to test interactively and is a simple way to learn the API.
> > >   However, there is no such interactive Python shell in Flink now.
> > >
> > > At Alibaba, Python UDF for SQL has been developed and delivered to
> > > internal users. We have now started developing the Python API, and
> > > we've drafted a design document that we will publish to the
> > > community soon for discussion.
> > >
> > > Regards,
> > > Xianda
> > >
> > > On Fri, Dec 7, 2018 at 11:30 PM Till Rohrmann <trohrm...@apache.org>
> > > wrote:
> > >
> > > > Dear Flink community,
> > > >
> > > > in order to better understand the needs of our users and to plan
> > > > for the future, I wanted to reach out to you and ask how much you
> > > > use Flink's Python API, namely flink-python and
> > > > flink-streaming-python.
> > > >
> > > > In order to gather feedback, I would like to ask all Python users
> > > > to respond to this thread and quickly outline how you use Python
> > > > in combination with Flink. Thanks a lot for your help!
> > > >
> > > > Cheers,
> > > > Till
> > >
> > > --
> > > Ke, Xianda
>
> --
> Ke, Xianda
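
For readers following the thread, here is a minimal sketch of the socket-based Python UDF idea from point 2 of Xianda's reply (start a Python worker, then exchange records with it over a socket). It is not Alibaba's implementation, which is not shown in this thread. Everything in it is an assumption made for illustration: the newline-delimited JSON wire format, the names `upper_udf`, `serve_udf`, `call_udf` and `run_demo`, and the use of a thread in place of a separate Python process. The client side stands in for the JVM operator.

```python
import json
import socket
import threading


def upper_udf(value):
    """Hypothetical user-defined function: upper-cases a string."""
    return value.upper()


def serve_udf(server_sock, udf):
    """Worker side: accept one connection and answer one JSON request per line."""
    conn, _ = server_sock.accept()
    with conn, \
            conn.makefile("r", encoding="utf-8") as reader, \
            conn.makefile("w", encoding="utf-8") as writer:
        for line in reader:  # loop ends when the caller closes the socket
            args = json.loads(line)["args"]
            writer.write(json.dumps({"result": udf(*args)}) + "\n")
            writer.flush()


def call_udf(reader, writer, *args):
    """Caller side (stand-in for the JVM operator): send args, read one result."""
    writer.write(json.dumps({"args": list(args)}) + "\n")
    writer.flush()
    return json.loads(reader.readline())["result"]


def run_demo(rows):
    """Start the worker, push every row through the UDF, return the results."""
    server_sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server_sock.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    server_sock.listen(1)
    port = server_sock.getsockname()[1]

    # Assumption: a thread stands in for the Python process started in open().
    worker = threading.Thread(
        target=serve_udf, args=(server_sock, upper_udf), daemon=True
    )
    worker.start()

    with socket.create_connection(("127.0.0.1", port)) as client, \
            client.makefile("r", encoding="utf-8") as reader, \
            client.makefile("w", encoding="utf-8") as writer:
        results = [call_udf(reader, writer, row) for row in rows]

    worker.join(timeout=5)
    server_sock.close()
    return results


if __name__ == "__main__":
    print(run_demo(["flink", "beam"]))
```

Because the worker is an ordinary CPython interpreter, the UDF could import C-extension libraries, which is the limitation of Jython raised in the thread. A real setup would also need process lifecycle management, batching, and error handling, none of which are sketched here.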