Hi Till,

1. As far as I know, most users at Alibaba use SQL. Some users at Alibaba want to integrate Python libraries with Flink for stream processing, and Jython is unusable for them.
2. Python UDFs for SQL:
   * Python UDFs are declared using Alibaba's internal DDL syntax.
   * A Python process is started in open().
   * The Python process communicates with the JVM process via a socket.
   * Yes, it supports Python libraries; users can upload a virtualenv/conda Python runtime.

3. We've drafted a design doc for the Python API: [DISCUSS] Flink Python API
   <https://docs.google.com/document/d/1JNGWdLwbo_btq9RVrc1PjWJV3lYUgPvK0uEWDIfVNJI/edit?usp=drive_web>
   Python UDFs for SQL are not discussed in that document; we'll create a new proposal when the SQL DDL is ready.

On Mon, Dec 10, 2018 at 9:52 PM Till Rohrmann <trohrm...@apache.org> wrote:

> Hi Xianda,
>
> thanks for sharing this detailed feedback. Do I understand you correctly
> that flink-python and flink-streaming-python are not usable for the use
> cases at Alibaba atm?
>
> Could you share a bit more details about the Python UDFs for SQL? How do
> you execute the Python code? Will it work with any Python library? If you
> are about to publish the design document then feel free to refer me to this
> document.
>
> Cheers,
> Till
>
> On Mon, Dec 10, 2018 at 3:08 AM Xianda Ke <kexia...@gmail.com> wrote:
>
> > After communicating with some of the internal users at Alibaba, my
> > impression is that:
> >
> > - Most of them need C extension support; they want to integrate their
> >   algorithms with stream processing, but Jython is unacceptable for them.
> > - For some users, who are only familiar with SQL/Python, developing a
> >   Java API application/UDF is too complex. Writing a Python UDF and
> >   declaring it in SQL is preferred.
> > - Machine Learning users need richer Python APIs, such as Table API
> >   Python support.
> >
> > From my point of view, currently Python support has a few caveats in Flink.
> >
> > - For batch, there is only the DataSet Python API.
> > - For streaming, where Flink really shines, only Jython is supported,
> >   but Jython has lots of limitations.
> > - For most of the big data users, the SQL/Table API is more friendly, but
> >   Python users have no such APIs right now.
> > - An interactive Python shell is very user-friendly. It can be used to
> >   test interactively and is a simple way to learn the API. However, there
> >   is no such interactive Python shell in Flink now.
> >
> > At Alibaba, Python UDF for SQL has been developed and has been delivered to
> > internal users. We have now started developing the Python API, and we've
> > drafted a design document and will publish it to the community soon
> > for discussion.
> >
> > Regards,
> > Xianda
> >
> > On Fri, Dec 7, 2018 at 11:30 PM Till Rohrmann <trohrm...@apache.org>
> > wrote:
> >
> > > Dear Flink community,
> > >
> > > in order to better understand the needs of our users and to plan for the
> > > future, I wanted to reach out to you and ask how much you use Flink's
> > > Python API, namely flink-python and flink-streaming-python.
> > >
> > > In order to gather feedback, I would like to ask all Python users to
> > > respond to this thread and quickly outline how you use Python in
> > > combination with Flink. Thanks a lot for your help!
> > >
> > > Cheers,
> > > Till
> >
> > --
> > Ke, Xianda

--
Ke, Xianda
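
P.S. To illustrate the general pattern from point 2 (a Python process that serves UDF calls to the JVM over a socket), here is a minimal sketch of what the Python side could look like. It is not Alibaba's implementation: the length-prefixed JSON framing and the names send_msg, recv_msg, serve_udf, to_upper and demo are all assumptions made up for this example, and socket.socketpair() merely stands in for the real JVM-to-Python connection so that the sketch runs in a single process.

```python
import json
import socket
import struct
import threading


def send_msg(sock, obj):
    """Send one JSON message prefixed with its 4-byte big-endian length."""
    payload = json.dumps(obj).encode("utf-8")
    sock.sendall(struct.pack(">I", len(payload)) + payload)


def recv_exact(sock, n):
    """Read exactly n bytes, or return None if the peer closed the socket."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            return None
        buf += chunk
    return buf


def recv_msg(sock):
    """Receive one length-prefixed JSON message, or None at end of stream."""
    header = recv_exact(sock, 4)
    if header is None:
        return None
    (length,) = struct.unpack(">I", header)
    body = recv_exact(sock, length)
    if body is None:
        return None
    return json.loads(body.decode("utf-8"))


def serve_udf(sock, udf):
    """Worker loop: apply udf to each argument list until the peer disconnects."""
    while True:
        args = recv_msg(sock)
        if args is None:
            break
        send_msg(sock, udf(*args))
    sock.close()


def to_upper(s):
    """Hypothetical scalar UDF used only for this demonstration."""
    return s.upper()


def demo():
    # socketpair() plays the role of the JVM <-> Python connection here;
    # a real deployment would connect over a TCP or Unix domain socket.
    jvm_side, python_side = socket.socketpair()
    worker = threading.Thread(target=serve_udf, args=(python_side, to_upper))
    worker.start()
    results = []
    for row in (["flink"], ["python udf"]):
        send_msg(jvm_side, row)            # the "JVM" sends the UDF arguments
        results.append(recv_msg(jvm_side)) # and waits for the result
    jvm_side.close()                       # closing ends the worker loop
    worker.join()
    return results
```

Because the UDF runs in a regular CPython process, it can import any library available in the uploaded virtualenv/conda runtime, including ones with C extensions, which is the main reason this pattern is attractive compared to Jython.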