Re: [DISCUSS] FLIP-XXX Improving PyFlink - The Zen of Flink

Zander Matheson Thu, 03 Jul 2025 11:03:10 -0700

Moving some of the discussion from the doc back to this thread.
Dian Fu commented:

> I guess we could group the working items into different FLIPs according to
> the categories, the efforts, etc:
> 1) API changes which make the API more pythonic, will be referenced as
> "FLIP-1"
> 2) Make the Python UDX more pythonic, will be referenced as "FLIP-2"
> 3) Improve the error message & stack trace, will be referenced as "FLIP-3"
> 4) Improve the Jupyter and data exploring support, will be referenced as
> "FLIP-4"
> 5) Pandas UDTF support, will be referenced as "FLIP-5"
> 6) Inference UDF, will be referenced as "FLIP-6"
> 7) Numpy types support, will be referenced as "FLIP-7"
> 8) Chunking/embedding/enrichment, will be referenced as "FLIP-8"



My comment in the doc:

> Happy to go this route. It does feel like a lot of discussions, but that
> might be what we need :).



> Another alternative I thought of, since your earlier comment, was that we
> could move the items that are big, need more discussion, or might not be
> directly part of the effort to make PyFlink more Pythonic to a "phase 2"
> and mark this as "phase 1".



> That would mean we only include the items that don't need further
> discussion or are smaller or are very core to the Zen of Flink effort (and
> we can discuss in this thread).



> If we were to go this route, things like what you have marked "FLIP-1" and
> "FLIP-2" (and maybe FLIP-3?) could stay in alongside issues and items that
> don't warrant further discussion.


That would mean we move the FLIP-3 through FLIP-8 items out and move forward
with discussion on the remaining scope. I have highlighted those items in red
for visibility for now.

The items we would discuss and align on for this FLIP in the option I
outlined would be the following:

Reference table columns as attributes

Allow pandas-like table.<my-col> references in addition to col("<my-col>")
for all Table API arguments
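
To make the idea concrete, here is a minimal sketch of how attribute access
could fall back to a column lookup. Table, Col and col below are toy
stand-ins written for this email, not the PyFlink classes, and the lookup
rule is an assumption about how this might work.

```python
class Col:
    """Toy stand-in for a column expression (hypothetical, not PyFlink)."""

    def __init__(self, name):
        self.name = name

    def __eq__(self, other):
        return isinstance(other, Col) and self.name == other.name

    def __repr__(self):
        return f"col({self.name!r})"


def col(name):
    """Toy equivalent of the existing col("<my-col>") style."""
    return Col(name)


class Table:
    """Toy table that resolves unknown attributes to columns."""

    def __init__(self, column_names):
        self._column_names = list(column_names)

    def __getattr__(self, name):
        # __getattr__ only runs when normal attribute lookup fails, so real
        # methods and attributes on the class always win over column names.
        if name.startswith("_") or name not in self._column_names:
            raise AttributeError(name)
        return col(name)


orders = Table(["user_id", "amount"])
amount_by_attribute = orders.amount  # same expression as col("amount")
```

Column names that are not valid Python identifiers, or that collide with an
existing method name, would still need the col("...") form.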

Kwargs aliasing

Allow polars-like table.agg(a_sum=<expr>) in addition to
table.select(<expr>.alias("a_sum")) for providing named aliases via kwargs
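
A rough sketch of the kwargs form, assuming each keyword argument is simply
rewritten to the existing alias call. Expr and select here are hypothetical
toy versions, not the PyFlink API.

```python
class Expr:
    """Toy expression that remembers an optional alias (hypothetical)."""

    def __init__(self, text, alias_name=None):
        self.text = text
        self.alias_name = alias_name

    def alias(self, name):
        # Return a new expression so the original stays unchanged.
        return Expr(self.text, name)


def select(*exprs, **named_exprs):
    """Accept positional expressions plus name=<expr> keyword aliases."""
    # Keyword arguments keep their call order, so output order is stable.
    aliased = [expr.alias(name) for name, expr in named_exprs.items()]
    return list(exprs) + aliased


# Both spellings produce an expression aliased as "a_sum".
via_alias = select(Expr("sum(a)").alias("a_sum"))
via_kwargs = select(a_sum=Expr("sum(a)"))
```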

Move getter methods to attributes where possible

Convert getter methods to properties where possible for more Pythonic
access.
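
One possible migration shape, sketched on a made-up ToyConfig class: expose
the value as a property and keep the old getter as a thin wrapper that warns.
Whether old getters would be kept or deprecated is an assumption on my part.

```python
import warnings


class ToyConfig:
    """Hypothetical config object used only to illustrate the pattern."""

    def __init__(self, local_timezone="UTC"):
        self._local_timezone = local_timezone

    @property
    def local_timezone(self):
        # New, attribute-style access: config.local_timezone
        return self._local_timezone

    def get_local_timezone(self):
        # Old getter kept for compatibility; it delegates to the property.
        warnings.warn(
            "get_local_timezone() is deprecated, use .local_timezone",
            DeprecationWarning,
            stacklevel=2,
        )
        return self.local_timezone


config = ToyConfig("Europe/Berlin")
timezone = config.local_timezone
```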

Using Python types as well as or instead of DataTypes

Allow users to specify Python types in function signatures, which are
converted into Flink Types.
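
As a sketch of what the conversion could look like, the snippet below reads
the type hints of a plain function and maps them through a lookup table. The
mapping in PY_TO_SQL_TYPE and the infer_signature helper are assumptions for
illustration only; the real mapping is one of the things to agree on.

```python
from typing import get_type_hints

# Assumed mapping from Python types to SQL-style type names (illustrative).
PY_TO_SQL_TYPE = {
    int: "BIGINT",
    float: "DOUBLE",
    str: "STRING",
    bool: "BOOLEAN",
}


def infer_signature(func):
    """Return (input type names, result type name) from a function's hints."""
    hints = get_type_hints(func)
    # The return annotation is stored under the key "return".
    result_type = PY_TO_SQL_TYPE[hints.pop("return")]
    # Remaining entries are the parameters, in declaration order.
    input_types = [PY_TO_SQL_TYPE[hint] for hint in hints.values()]
    return input_types, result_type


def add_tax(amount: float, rate: float) -> float:
    return amount * (1 + rate)


input_types, result_type = infer_signature(add_tax)
```

A function with a missing or unmapped annotation raises KeyError in this
sketch; a real implementation would need a clearer error or a fallback to
explicit DataTypes.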

Move from the Builder pattern to Python-friendly patterns

Where possible move from the builder pattern that leaks from Java to
Python-native patterns like dataclasses, constructors, factory functions
and context/configuration patterns.
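
For example, a chain of builder calls could become a single constructor call
with keyword arguments. SourceOptions and its fields are invented for this
sketch and do not correspond to an existing connector class.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SourceOptions:
    """Hypothetical options object replacing a builder chain such as
    builder().set_topic(...).set_bootstrap_servers(...).build()."""

    topic: str
    bootstrap_servers: str = "localhost:9092"
    # default_factory gives every instance its own dict.
    properties: dict = field(default_factory=dict)


# One call with keyword arguments instead of a chain of setters.
options = SourceOptions(topic="orders", properties={"group.id": "demo"})
```

Required fields are enforced by the constructor signature, and defaults are
visible in one place instead of being spread across setter methods.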

String methods

Add a string class to expressions with methods similar to Python and pandas.
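
A small sketch of a pandas-style accessor, assuming the string methods hang
off a .str property and build up a SQL-like expression. Expr, StringMethods
and the generated text are toy constructs for illustration, not a proposed
final design.

```python
class StringMethods:
    """Hypothetical namespace for string operations on an expression."""

    def __init__(self, expr):
        self._expr = expr

    def upper(self):
        return Expr(f"UPPER({self._expr.sql})")

    def strip(self):
        return Expr(f"TRIM({self._expr.sql})")


class Expr:
    """Toy expression that carries the SQL-like text it represents."""

    def __init__(self, sql):
        self.sql = sql

    @property
    def str(self):
        # Mirrors the pandas Series.str accessor naming.
        return StringMethods(self)


cleaned = Expr("name").str.strip().str.upper()
```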

Unraveling/Truncating Tracebacks

Capture and simplify JVM stack traces, showing only relevant information to
the Python user.
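
As a rough illustration, the filter below drops JVM frames from a stack
trace and reports how many were hidden. The prefix list, the function name
and the sample trace are all made up for this sketch; which frames count as
"relevant" is still open for discussion.

```python
# Assumed list of package prefixes whose frames are noise to Python users.
HIDDEN_PACKAGES = ("org.apache.flink.", "java.", "sun.")


def simplify_jvm_trace(trace, hidden_packages=HIDDEN_PACKAGES):
    """Keep message lines, drop 'at <package>...' frames from hidden packages."""
    frame_prefixes = tuple("at " + package for package in hidden_packages)
    kept = []
    hidden = 0
    for line in trace.splitlines():
        if line.strip().startswith(frame_prefixes):
            hidden += 1
        else:
            kept.append(line)
    if hidden:
        kept.append(f"... {hidden} JVM frames hidden")
    return "\n".join(kept)


SAMPLE_TRACE = (
    "java.lang.RuntimeException: boom\n"
    "\tat org.apache.flink.Foo.run(Foo.java:1)\n"
    "\tat java.lang.Thread.run(Thread.java:2)\n"
    "Caused by: ValueError: bad input"
)

simplified = simplify_jvm_trace(SAMPLE_TRACE)
```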

Best,

Zander


On Wed, Jul 2, 2025 at 11:55 PM Dian Fu <dian0511...@gmail.com> wrote:

> Hi Zander,
>
> Thanks for the reply. It's clear for me now.
>
> > Regarding follow on FLIPs, there are not currently any, but this seems
> > logical. We could make FLIPs for some of the items that should be
> > discussed separately and those that don't could be made into issues and
> > tracked that way? I am open to suggestions and this is my first FLIP so I
> > don't know the best path forward :).
>
> +1 to this approach.
>
> > I will add you to the document Dian, and could you
> > maybe mark some of the items that you think should have their own
> > discussions?
>
> Sure, great!
>
> Regards,
> Dian
>
> On Wed, Jul 2, 2025 at 5:42 AM Zander Matheson <a.w.mathe...@gmail.com>
> wrote:
> >
> > On the subject of Beam. I don't think this is a requirement for this
> > initiative, but ideally, long term, it would be nice to not have to rely
> > on Beam *where possible*, but I do understand the size of that effort is
> > quite large and we could move it out of this FLIP. The current work item
> > was really only around investigation of the alternatives, but we can push
> > it out.
> >
> > Regarding the Inference UDF. There are mechanisms, like caching and
> > sharing a model across processes, different compute types etc. that are
> > unique to model inference. This would help teams adopt Flink for more
> > ML/AI use cases.
> >
> > On Numpy types, specifically it would be adding support for ndarray in a
> > similar way to how a pandas series is available in a vectorized UDF.
> >
> > Regarding follow on FLIPs, there are not currently any, but this seems
> > logical. We could make FLIPs for some of the items that should be
> > discussed separately and those that don't could be made into issues and
> > tracked that way? I am open to suggestions and this is my first FLIP so I
> > don't know the best path forward :). I will add you to the document Dian,
> > and could you maybe mark some of the items that you think should have
> > their own discussions?
> >
> > Best,
> >
> > Zander
> >
> > On Mon, Jun 30, 2025 at 8:22 PM Guowei Ma <guowei....@gmail.com> wrote:
> >
> > > Hi,  Zander
> > >
> > > Thank you for bringing this topic up.
> > >
> > > Could you give more input about the following improvement? Do you have
> > > some specific scenario that needs this improvement?
> > >
> > > Replace beam local execution
> > >
> > > Evaluate alternatives for the local execution engine.
> > >
> > > Best,
> > > Guowei
> > >
> > >
> > > On Sat, Jun 28, 2025 at 5:24 AM Alexander Matheson
> > > <a.w.mathe...@gmail.com> wrote:
> > >
> > > > Hi devs,
> > > >
> > > > I would like to start a discussion about a new FLIP for a rather large
> > > > umbrella of work concerning PyFlink that Dian Fu, Xingbo Huang, myself
> > > > and others have been coordinating around.
> > > >
> > > > As PyFlink continues to grow in adoption (downloads are up 10x YoY on
> > > > PyPI!!!) it is overdue for additional investment to bring it inline
> > > > with the expectations of the Python community. Given the increase in AI
> > > > workloads shifting to real-time, these improvements will also help to
> > > > support the net-new Flink users coming from that space.
> > > >
> > > > The Project is called The Zen of Flink as an ode to the driving
> > > > principles of Python called the Zen of Python and is broadly about
> > > > making Flink more Pythonic. The work falls into six categories across
> > > > API design, documentation, debuggability, local development,
> > > > integration with the ecosystem and general usability. Not all of the
> > > > work is concretely scoped yet and is not planned to be as more
> > > > improvements will arise as we work on this effort.
> > > >
> > > > The details of the FLIP can be found in the google doc linked below.
> > > >
> > > > https://docs.google.com/document/d/18_u1XA9C_zdY_fu1OtQDwYyIk_TwjUfzAUOzhGxbN6w/edit?usp=sharing
> > > >
> > > > Looking forward to the discussion.
> > > >
> > > > Best,
> > > >
> > > > Zander
> > > >
> > >
>
