One more question: what's the motivation, and what do you want to do, in the part `replace beam local execution`? I'm not sure if the goal is to improve the debugging experience. PyFlink already supports loopback mode [1], which allows debugging Python UDFs in the IDE without any setup (just set a breakpoint and run the job).
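For reference, a minimal sketch of that workflow; the `add_one` UDF and the sample data are just illustrative, and it assumes the job is launched directly from the IDE against a local mini-cluster, where recent PyFlink versions use loopback mode by default:

from pyflink.table import DataTypes, EnvironmentSettings, TableEnvironment
from pyflink.table.expressions import col
from pyflink.table.udf import udf

@udf(result_type=DataTypes.BIGINT())
def add_one(x):
    # Illustrative UDF: a breakpoint set on the next line is hit in the IDE
    # when the job runs locally in loopback mode.
    return x + 1

# Local (mini-cluster) execution, e.g. started via "Run"/"Debug" in the IDE.
t_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())
t = t_env.from_elements([(1,), (2,), (3,)], ['x'])
t.select(add_one(col('x'))).execute().print()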
Regards,
Dian

[1] https://issues.apache.org/jira/browse/FLINK-21222

On Tue, Jul 1, 2025 at 10:25 AM Dian Fu <dian0511...@gmail.com> wrote:
>
> Hi Zander,
>
> Thanks for the reply. Makes sense to me!
>
> Some follow-up questions:
> 1) Are there follow-up sub-FLIP discussions? I'm asking this because this doc seems more like an umbrella which shapes the whole picture of what we want to do. For example, it includes things we could just do without voting, e.g. async scalar function and table function support, window TVF support, etc. It also contains things which seem like a big story and deserve a whole design doc, e.g. Data Exploration and EDA support, the inference UDF, the Numpy types, etc.
>
> 2) What do you mean by the inference UDF and Numpy types parts? Could you explain a bit more about them?
>
> Regards,
> Dian
>
> On Tue, Jul 1, 2025 at 6:37 AM Zander Matheson <a.w.mathe...@gmail.com> wrote:
> >
> > Thanks Dian Fu,
> >
> > On 1) This is more in reference to how, if we modify things like the builder pattern, we could end up changing certain configuration patterns. I will reframe this to say: currently there are no interfaces that are expected to be removed, but as the work evolves there may be some required changes to areas determined to be non-Pythonic, and the best effort will be made to mirror them or maintain an escape hatch.
> >
> > 2) I understand the desire to limit the scope here because we don't want to go down the Pandas parity rabbit hole. Would it suffice to limit the scope to foundational operations (Creation, Inspection and I/O) and Core manipulation (Selection, Indexing and Filtering)? These could be further outlined in issues. Maybe the following would suffice for this FLIP:
> >
> > Dataframe Methods
> >
> > Add friendly dataframe methods for creation, inspection and I/O that exist in other data libraries, like .read_json(), .read_csv(), .head(), .show() and .display().
> >
> > Reference table columns as attributes
> >
> > Allow pandas-like table.<my-col> references in addition to col("<my-col>") for all Table API arguments.
> >
> > Kwargs aliasing
> >
> > Allow polars-like table.agg(a_sum=<expr>) in addition to table.select(<expr>.alias("a_sum")) for providing named aliases via kwargs.
> >
> > 3) I am ok with removing this for now, although I do wish there was an easier way to include some of the most common connector interfaces.
> >
> > - Zander
> >
> > On Sun, Jun 29, 2025 at 11:13 PM Dian Fu <dian0511...@gmail.com> wrote:
> > >
> > > Hi Zander,
> > >
> > > Thanks for driving this effort! Big +1 overall. This will be a good improvement for Python users.
> > >
> > > Some quick questions about this FLIP:
> > >
> > > 1) Low-Level Knobs: Certain low-level, non-Pythonic configuration options may be deprecated or hidden to simplify the API surface.
> > >
> > > Could you give some examples of which configuration options you mean?
> > >
> > > 2) Introduction of user-friendly methods on the Table object for data preview, such as .show() and .display(), similar to those in other data-frame libraries.
> > >
> > > Dataframe-style APIs are widely adopted in the Python world. However, there are many convenient APIs in a DataFrame, so I guess it deserves a separate FLIP to discuss which kinds of API we want to borrow from it.
> > >
> > > 3) Package top connectors (Kafka, Parquet, S3) to reduce friction from manual JAR downloads.
> > >
> > > I'm not sure if this is feasible since the connector implementations have been moved to separate repos. However, I agree that the experience should be improved. Maybe we could provide some guides to improve the experience.
> > >
> > > Regards,
> > > Dian
> > >
> > > On Sat, Jun 28, 2025 at 5:24 AM Alexander Matheson <a.w.mathe...@gmail.com> wrote:
> > > >
> > > > Hi devs,
> > > >
> > > > I would like to start a discussion about a new FLIP for a rather large umbrella of work concerning PyFlink that Dian Fu, Xingbo Huang, myself and others have been coordinating around.
> > > >
> > > > As PyFlink continues to grow in adoption (downloads are up 10x YoY on PyPI!!!), it is overdue for additional investment to bring it in line with the expectations of the Python community. Given the increase in AI workloads shifting to real-time, these improvements will also help to support the net-new Flink users coming from that space.
> > > >
> > > > The project is called The Zen of Flink, as an ode to the driving principles of Python called the Zen of Python, and is broadly about making Flink more Pythonic. The work falls into six categories: API design, documentation, debuggability, local development, integration with the ecosystem and general usability. Not all of the work is concretely scoped yet, nor is it planned to be, as more improvements will arise as we work on this effort.
> > > >
> > > > The details of the FLIP can be found in the Google doc linked below.
> > > >
> > > > https://docs.google.com/document/d/18_u1XA9C_zdY_fu1OtQDwYyIk_TwjUfzAUOzhGxbN6w/edit?usp=sharing
> > > >
> > > > Looking forward to the discussion.
> > > >
> > > > Best,
> > > >
> > > > Zander
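To make the API ergonomics proposed in the thread concrete, below is a rough sketch contrasting today's Table API with the proposed style. The `orders` table and its columns are made up for illustration, and the attribute-style column references, kwargs aliasing, and .head()/.show() helpers in the commented-out portion are proposals from the discussion, not existing PyFlink APIs:

from pyflink.table import EnvironmentSettings, TableEnvironment
from pyflink.table.expressions import col

t_env = TableEnvironment.create(EnvironmentSettings.in_batch_mode())
orders = t_env.from_elements(
    [('a', 10), ('b', 25), ('a', 5)], ['user', 'amount'])

# Today: explicit col() references and .alias() for naming.
current = (orders
           .group_by(col('user'))
           .select(col('user'), col('amount').sum.alias('total')))
current.execute().print()

# Proposed (per the thread; not yet part of PyFlink):
# - attribute-style column references: orders.amount instead of col('amount')
# - kwargs aliasing: total=<expr> instead of <expr>.alias('total')
# - inspection helpers such as orders.head(5) / orders.show()
# proposed = (orders
#             .group_by(orders.user)
#             .select(orders.user, total=orders.amount.sum))
# proposed.show()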