Re: [DISCUSS] FLIP-565: Improve ProcessTableFunctions for late data handling and state access

Gustavo de Morais Thu, 05 Mar 2026 01:03:24 -0800

Hi Timo,

Thank you for proposing these improvements. All of them address real pain
points, so +1. It's especially good to see BROADCAST_SEMANTIC_TABLE. This
unlocks a set of use cases involving small lookup tables that can be
considerably optimized. I'm also +1 on supporting ORDER BY instead of an
additional argument trait.
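
As a rough illustration of the lookup-table case, here is a minimal sketch
in plain Java. It deliberately uses no Flink API and is not the interface
proposed in the FLIP: BroadcastEnricher, onBroadcastRow, and onKeyedRow are
hypothetical names chosen only for this example. The assumption is that every
parallel instance receives all rows of the small table and keeps a local copy.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of broadcast-style enrichment; not the FLIP-565 API.
class BroadcastEnricher {
    // Each parallel instance holds its own full copy of the small lookup table.
    private final Map<String, String> lookup = new HashMap<>();

    // Assumed to be invoked for every row of the broadcast (lookup) table.
    void onBroadcastRow(String code, String name) {
        lookup.put(code, name);
    }

    // Assumed to be invoked for rows of the main table. Enrichment is a local
    // map read, so the main table needs no shuffle for a join.
    String onKeyedRow(String orderId, String code) {
        String name = lookup.getOrDefault(code, "UNKNOWN");
        return orderId + " -> " + name;
    }

    public static void main(String[] args) {
        BroadcastEnricher enricher = new BroadcastEnricher();
        enricher.onBroadcastRow("US", "United States");
        System.out.println(enricher.onKeyedRow("order-1", "US"));
    }
}
```

The optimization comes from replacing a regular join, which repartitions both
inputs, with a local read against a table that is small enough to replicate.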


Thanks for continuing to push PTFs forward - they are becoming really
powerful.

Kind regards,
Gustavo

On Wed, 4 Mar 2026 at 16:40, Ryan van Huuksloot via dev <
[email protected]> wrote:

> That makes sense to me. First make it work; then, make it easy.
>
> Otherwise the FLIP looks good to me. Some great improvements! Thanks for
> putting this together.
>
> Ryan van Huuksloot
> Staff Engineer, Infrastructure | Streaming Platform
>
>
> On Wed, Mar 4, 2026 at 9:22 AM Timo Walther <[email protected]> wrote:
>
> > Hi Ryan,
> >
> > thanks for the great feedback. I agree that some parts might still be
> > too complex; usability is definitely a continuous effort. For now, the
> > main goal of PTFs was to unblock people when something cannot be
> > expressed with SQL or would lead to very inefficient query plans. Also,
> > they target a developer persona, usually a platform team that
> > develops PTFs for SQL personas. In the mid-term, I hope that AI will
> > implement most of the PTFs. So exposing engine primitives / building
> > blocks for AI is crucial.
> >
> > Maybe we can also offer a SimpleProcessFunction at some point, once we
> > know better why and how people use PTFs. Also having more built-in PTFs
> > that address the most frequent tasks can be very helpful.
> >
> > Please continue sharing your experiences: What are frequent tasks? What
> > do users want to achieve with PTFs?
> >
> > Cheers,
> > Timo
> >
> > On 03.03.26 21:09, Ryan van Huuksloot via dev wrote:
> > > Hi Timo,
> > >
> > > Thanks for the FLIP.
> > >
> > > > Internally, we've started using PTFs and are still figuring out how
> > > > to best leverage them.
> > > The improvements you proposed in your FLIP are great.
> > > > I wanted to mention the priority order for the 3 improvements you've
> > > > recommended. I would prioritize them in the order you stated, based on
> > > > our usage. So far I haven't had any broadcast requests but I'm sure
> > > > they're coming. The late arriving data will be very helpful.
> > >
> > > > My primary concern with PTFs and large state is generally the
> > > > complexity of the state decisions. Most of our SQL developers won't
> > > > understand when to use a "[Map][List][Value]View" with a PTF.
> > > > Specifically this area in the documentation:
> > > >
> > > > https://nightlies.apache.org/flink/flink-docs-release-2.2/docs/dev/table/functions/ptfs/#large-state
> > > > You really need to understand Java concepts to grasp the intricacies of
> > > > your decisions when choosing a state mechanism. I wonder if we can
> > > > simplify this decision for engineers who may not be Flink and Java
> > > > experts. It may not be possible.
> > >
> > > Ryan van Huuksloot
> > > Staff Engineer, Infrastructure | Streaming Platform
> > >
> > >
> > > > On Tue, Mar 3, 2026 at 3:47 AM Timo Walther <[email protected]> wrote:
> > >
> > >> Hi everyone,
> > >>
> > > >> Just bumping this thread again and happy to gather any feedback you
> > > >> have.
> > >>
> > >> Thanks,
> > >> Timo
> > >>
> > >> On 16.02.26 09:35, Timo Walther wrote:
> > >>> Hi everyone,
> > >>>
> > > >>> the ProcessTableFunction (PTF) feature has been well received by the
> > > >>> Flink community and its adoption is increasing. Since FLIP-440 [1]
> > > >>> introduced a lot of new API and new concepts, some design decisions
> > > >>> need smaller adjustments around late data handling and lazy state
> > > >>> access.
> > >>>
> > > >>> Also, talking to community members at Current and Flink Forward
> > > >>> conferences has shown that broadcast state is crucial to bridge the
> > > >>> gap to DataStream API applications for broadcast joining and
> > > >>> rule-based logic.
> > >>>
> > > >>> I would like to propose "FLIP-565: Improve ProcessTableFunctions for
> > > >>> late data handling and state access" [2].
> > >>>
> > >>> This FLIP proposes 3 important PTF improvements:
> > >>>
> > > >>> 1) Don’t drop late data in ProcessFunction as data-loss is usually
> > > >>> not intended; similar to DataStream API’s ProcessFunction
> > > >>>
> > > >>> 2) Introduce ValueView to enable a “supplier”-pattern for state
> > > >>> access; similar to MapView and ListView
> > > >>>
> > > >>> 3) Introduce BROADCAST_SEMANTIC_TABLE as a new kind of argument to
> > > >>> PTFs
> > >>>
> > >>> Regarding forward compatibility, all proposed items can be made
> > >>> available in batch mode eventually for a unified experience. From my
> > > >>> point of view, these remaining adjustments should make PTF fully
> > > >>> production ready; I don't expect any major additions in the mid-term.
> > >>>
> > >>> Looking forward to your feedback.
> > >>>
> > >>> Thanks,
> > >>> Timo
> > >>>
> > >>> [1] https://cwiki.apache.org/confluence/x/pQnPEQ
> > >>> [2] https://cwiki.apache.org/confluence/x/qIo8G
> > >>>
> > >>
> > >>
> > >
> >
> >
>
