Thanks a lot for the feedback on this survey. I will close it now since six days have passed without new activity.
To me it seems that we currently don't have many users of flink-python or
flink-streaming-python because of their limitations (mentioned in the survey
by Xianda). This information might be useful when discussing Flink's future
Python strategy and whether to continue supporting flink-python and
flink-streaming-python.

Cheers,
Till

On Thu, Dec 13, 2018 at 10:50 AM Stephan Ewen <se...@apache.org> wrote:

> You are right. Let's refocus this on the python user survey and spin out
> another thread.
>
> On Thu, Dec 13, 2018 at 9:56 AM Xianda Ke <kexia...@gmail.com> wrote:
>
> > Hi Folks,
> > To avoid polluting the survey thread with discussions, we started a
> > separate thread and maybe we can continue the discussion over there.
> >
> > Regards,
> > Xianda
> >
> > On Wed, Dec 12, 2018 at 3:34 AM Stephan Ewen <se...@apache.org> wrote:
> >
> > > I like that we are having a general discussion about how to use Python
> > > and Flink together in the future.
> > > The current python support has some shortcomings that were mentioned
> > > before, so we clearly need something better.
> > >
> > > Parts of the community have worked together with the Apache Beam
> > > project, which is pretty far in adding a portability layer to support
> > > Python.
> > > Before we dive deep into a design proposal for a new Python API in
> > > Flink, I think we should figure out in which general direction Python
> > > support should go.
> > >
> > > *Option (1): Language portability via Apache Beam*
> > >
> > > Pro:
> > > - already exists to a large extent and already has users
> > > - portability layer offers other languages in addition to python. Go
> > >   is in the making, NodeJS has been speculated, etc.
> > > - collaboration with another project / community which means more
> > >   manpower and exposure. Beam currently has a strong focus on Flink as
> > >   a runner for Python.
> > > - Python API is used for existing ML libraries from the TensorFlow
> > >   ecosystem
> > >
> > > Con:
> > > - Not Flink's API. Python users need to learn the syntax of another
> > >   API (Python API is inherently different, but even more different
> > >   here).
> > >
> > > *Option (2): Implement own Python API*
> > >
> > > Pro:
> > > - Python API will be closer to Flink Java / Scala APIs
> > >
> > > Con:
> > > - We will only have Python.
> > > - Need to rebuild the Python language bridge (significant work to get
> > >   stable)
> > > - might lose tight collaboration with Beam and the other parties in
> > >   Beam
> > > - not benefiting from Beam's ecosystem
> > >
> > > *Option (3): Implement own portability layer*
> > >
> > > Pro:
> > > - Flexibility to align APIs across languages within Flink ecosystem
> > >
> > > Con:
> > > - A lot of work (for context, to get this feature complete, Beam has
> > >   worked on that for a year now)
> > > - Replicating work that already exists
> > > - good chance to lose tight collaboration with Beam and parties in
> > >   that project
> > > - not benefiting from Beam's ecosystem
> > >
> > > Best,
> > > Stephan
> > >
> > >
> > > On Tue, Dec 11, 2018 at 3:38 PM Thomas Weise <t...@apache.org> wrote:
> > >
> > > > Did you take a look at Apache Beam? It already provides a
> > > > comprehensive Python SDK and can be used with Flink:
> > > > https://beam.apache.org/roadmap/portability/#python-on-flink
> > > >
> > > > We are using it at Lyft for Python streaming pipelines.
> > > >
> > > > Thomas
> > > >
> > > > On Tue, Dec 11, 2018 at 5:54 AM Xianda Ke <kexia...@gmail.com> wrote:
> > > >
> > > > > Hi Till,
> > > > >
> > > > > 1. So far as I know, most of the users at Alibaba are using SQL.
> > > > > Some users at Alibaba want to integrate Python libraries with
> > > > > Flink for stream processing, and Jython is unusable.
> > > > >
> > > > > 2. Python UDFs for SQL:
> > > > > * declaring a Python UDF based on Alibaba's internal DDL syntax.
> > > > > * start a Python process in open()
> > > > > * communicate with the JVM process via socket.
> > > > > * Yes, it supports Python libraries; users can upload a
> > > > >   virtualenv/conda Python runtime
> > > > >
> > > > > 3. We've drafted a design doc for the Python API:
> > > > > [DISCUSS] Flink Python API
> > > > > <https://docs.google.com/document/d/1JNGWdLwbo_btq9RVrc1PjWJV3lYUgPvK0uEWDIfVNJI/edit?usp=drive_web>
> > > > >
> > > > > Python UDFs for SQL are not discussed in this document; we'll
> > > > > create a new proposal when the SQL DDL is ready.
> > > > >
> > > > > On Mon, Dec 10, 2018 at 9:52 PM Till Rohrmann <trohrm...@apache.org> wrote:
> > > > >
> > > > > > Hi Xianda,
> > > > > >
> > > > > > thanks for sharing this detailed feedback. Do I understand you
> > > > > > correctly that flink-python and flink-streaming-python are not
> > > > > > usable for the use cases at Alibaba atm?
> > > > > >
> > > > > > Could you share a bit more details about the Python UDFs for SQL?
> > > > > > How do you execute the Python code? Will it work with any Python
> > > > > > library? If you are about to publish the design document then
> > > > > > feel free to refer me to this document.
> > > > > >
> > > > > > Cheers,
> > > > > > Till
> > > > > >
> > > > > > On Mon, Dec 10, 2018 at 3:08 AM Xianda Ke <kexia...@gmail.com> wrote:
> > > > > >
> > > > > > > After communicating with some of the internal users at Alibaba,
> > > > > > > my impression is that:
> > > > > > >
> > > > > > > - Most of them need C extension support; they want to integrate
> > > > > > >   their algorithms with stream processing, but Jython is
> > > > > > >   unacceptable for them.
> > > > > > > - For some users, who are only familiar with SQL/Python,
> > > > > > >   developing a Java API application/UDF is too complex. Writing
> > > > > > >   a Python UDF and declaring it in SQL is preferred.
> > > > > > > - Machine Learning users need richer Python APIs, such as Table
> > > > > > >   API Python support.
> > > > > > >
> > > > > > >
> > > > > > > From my point of view, currently Python support has a few
> > > > > > > caveats in Flink.
> > > > > > >
> > > > > > > - For batch, there is only the DataSet Python API.
> > > > > > > - For streaming, where Flink really shines, only Jython is
> > > > > > >   supported, but Jython has lots of limitations.
> > > > > > > - For most of the big data users, SQL/Table API is more
> > > > > > >   friendly, but Python users have no such APIs right now.
> > > > > > > - An interactive Python shell is very user-friendly. It can be
> > > > > > >   used to test interactively and is a simple way to learn the
> > > > > > >   API. However, there is no such interactive Python shell in
> > > > > > >   Flink now.
> > > > > > >
> > > > > > >
> > > > > > > At Alibaba, Python UDF for SQL has been developed and has been
> > > > > > > delivered to internal users.
> > > > > > > Currently, we are starting to develop the Python API; we've
> > > > > > > drafted a design document and will publish it to the community
> > > > > > > soon for discussion.
> > > > > > >
> > > > > > > Regards,
> > > > > > > Xianda
> > > > > > >
> > > > > > > On Fri, Dec 7, 2018 at 11:30 PM Till Rohrmann <trohrm...@apache.org> wrote:
> > > > > > >
> > > > > > > > Dear Flink community,
> > > > > > > >
> > > > > > > > in order to better understand the needs of our users and to
> > > > > > > > plan for the future, I wanted to reach out to you and ask how
> > > > > > > > much you use Flink's Python API, namely flink-python and
> > > > > > > > flink-streaming-python.
> > > > > > > >
> > > > > > > > In order to gather feedback, I would like to ask all Python
> > > > > > > > users to respond to this thread and quickly outline how you
> > > > > > > > use Python in combination with Flink. Thanks a lot for your
> > > > > > > > help!
> > > > > > > >
> > > > > > > > Cheers,
> > > > > > > > Till
> > > > > > >
> > > > > > > --
> > > > > > > Ke, Xianda
> > > > >
> > > > > --
> > > > > Ke, Xianda
> >
> > --
> > Ke, Xianda