Did you take a look at Apache Beam? It already provides a comprehensive
Python SDK and can be used with Flink:
https://beam.apache.org/roadmap/portability/#python-on-flink

We are using it at Lyft for Python streaming pipelines.

Thomas

On Tue, Dec 11, 2018 at 5:54 AM Xianda Ke <kexia...@gmail.com> wrote:

> Hi Till,
>
> 1. As far as I know, most users at Alibaba use SQL. Some users at
> Alibaba want to integrate Python libraries with Flink for stream
> processing, and Jython is unusable for them.
>
> 2. Python UDFs for SQL:
> * declare the Python UDF using Alibaba's internal DDL syntax.
> * start a Python process in open().
> * communicate with the JVM process via a socket.
> * Yes, it supports Python libraries; users can upload a virtualenv/conda
> Python runtime.
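
A minimal sketch of the mechanism listed in item 2, under stated assumptions: the thread says only that a Python process is started in open() and talks to the JVM over a socket. The wire format (one JSON object per line), the names `serve_udf_calls`, `call_udf`, `to_upper` and `UDFS`, and the use of `socket.socketpair()` with a thread standing in for the JVM side are all hypothetical and are not Alibaba's implementation.

```python
import json
import socket
import threading


def to_upper(value):
    """Hypothetical user-defined function; stands in for any Python UDF."""
    return value.upper()


# Hypothetical registry mapping UDF names (as declared in SQL) to callables.
UDFS = {"to_upper": to_upper}


def serve_udf_calls(conn):
    """Worker loop: read one JSON request per line, write one JSON reply per line."""
    with conn, conn.makefile("r", encoding="utf-8") as reader, \
            conn.makefile("w", encoding="utf-8") as writer:
        for line in reader:  # the loop ends when the caller closes its side
            request = json.loads(line)
            result = UDFS[request["udf"]](*request["args"])
            writer.write(json.dumps({"result": result}) + "\n")
            writer.flush()


def call_udf(reader, writer, name, *args):
    """Caller side (the JVM operator in the real setup): send a request, await the reply."""
    writer.write(json.dumps({"udf": name, "args": list(args)}) + "\n")
    writer.flush()
    return json.loads(reader.readline())["result"]


def demo():
    # socketpair() replaces the real JVM-to-Python connection for illustration.
    caller_sock, worker_sock = socket.socketpair()
    worker = threading.Thread(target=serve_udf_calls, args=(worker_sock,), daemon=True)
    worker.start()
    with caller_sock, caller_sock.makefile("r", encoding="utf-8") as reader, \
            caller_sock.makefile("w", encoding="utf-8") as writer:
        results = [call_udf(reader, writer, "to_upper", word)
                   for word in ("flink", "python")]
    worker.join(timeout=5)
    return results


if __name__ == "__main__":
    print(demo())
```

Because the worker is an ordinary CPython process in this sketch, it could import C-extension libraries, which is the limitation of Jython raised elsewhere in this thread.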
>
> 3. We've drafted a design doc for the Python API:
>  [DISCUSS] Flink Python API
> <https://docs.google.com/document/d/1JNGWdLwbo_btq9RVrc1PjWJV3lYUgPvK0uEWDIfVNJI/edit?usp=drive_web>
>
> Python UDFs for SQL are not discussed in this document; we'll create a
> new proposal when the SQL DDL is ready.
>
> On Mon, Dec 10, 2018 at 9:52 PM Till Rohrmann <trohrm...@apache.org>
> wrote:
>
> > Hi Xianda,
> >
> > thanks for sharing this detailed feedback. Do I understand you correctly
> > that flink-python and flink-streaming-python are not usable for the use
> > cases at Alibaba atm?
> >
> > Could you share a bit more details about the Python UDFs for SQL? How do
> > you execute the Python code? Will it work with any Python library? If you
> > are about to publish the design document then feel free to refer me to
> > this document.
> >
> > Cheers,
> > Till
> >
> > On Mon, Dec 10, 2018 at 3:08 AM Xianda Ke <kexia...@gmail.com> wrote:
> >
> > > After communicating with some of the internal users at Alibaba, my
> > > impression is that:
> > >
> > >    - Most of them need C extension support. They want to integrate
> > >    their algorithms with stream processing, but Jython is unacceptable
> > >    for them.
> > >    - For some users who are only familiar with SQL/Python, developing
> > >    a Java API application/UDF is too complex. Writing a Python UDF and
> > >    declaring it in SQL is preferred.
> > >    - Machine learning users need richer Python APIs, such as Table API
> > >    Python support.
> > >
> > >
> > > From my point of view, Python support in Flink currently has a few
> > > caveats.
> > >
> > >    - For batch, there is only the DataSet Python API.
> > >    - For streaming, where Flink really shines, only Jython is
> > >    supported, but Jython has lots of limitations.
> > >    - For most big data users, the SQL/Table API is more friendly, but
> > >    Python users have no such APIs right now.
> > >    - An interactive Python shell is very user-friendly. It can be used
> > >    to test interactively and is a simple way to learn the API. However,
> > >    there is no such interactive Python shell in Flink now.
> > >
> > >
> > > At Alibaba, Python UDFs for SQL have been developed and delivered to
> > > internal users. We have now started to develop the Python API; we've
> > > drafted a design document and will publish it to the community soon
> > > for discussion.
> > >
> > > Regards,
> > > Xianda
> > >
> > > On Fri, Dec 7, 2018 at 11:30 PM Till Rohrmann <trohrm...@apache.org>
> > > wrote:
> > >
> > > > Dear Flink community,
> > > >
> > > > in order to better understand the needs of our users and to plan for
> > > > the future, I wanted to reach out to you and ask how much you use
> > > > Flink's Python API, namely flink-python and flink-streaming-python.
> > > >
> > > > In order to gather feedback, I would like to ask all Python users to
> > > > respond to this thread and quickly outline how you use Python in
> > > > combination with Flink. Thanks a lot for your help!
> > > >
> > > > Cheers,
> > > > Till
> > > >
> > >
> > >
> > > --
> > > Ke, Xianda
> > >
> >
>
>
> --
> Ke, Xianda
>
