Re: [DISCUSS] FLIP-565: Improve ProcessTableFunctions for late data handling and state access

Timo Walther Wed, 04 Mar 2026 06:22:55 -0800

Hi Ryan,

thanks for the great feedback. I agree that some parts might still be too complex; usability is definitely a continuous effort. For now, the main goal of PTFs was to unblock people when something cannot be expressed with SQL or would lead to very inefficient query plans. Also, they rather target a developer persona, usually a platform team that develops PTFs for SQL personas. In the mid-term, I hope that AI will implement most of the PTFs. So exposing engine primitives / building blocks for AI is crucial.

Maybe we can also offer a SimpleProcessFunction at some point, once we know better why and how people use PTFs. Also, having more built-in PTFs that address the most frequent tasks can be very helpful.

Please continue sharing your experiences: What are frequent tasks? What do users want to achieve with PTFs?


Cheers,
Timo

On 03.03.26 21:09, Ryan van Huuksloot via dev wrote:

Hi Timo,

Thanks for the FLIP.

Internally, we've started using PTFs and are still figuring out how to best
leverage them.
The improvements you proposed in your FLIP are great.
I wanted to mention the priority order for the 3 improvements you've
recommended. I would prioritize them in the order you stated, based on our
usage. So far I haven't had any broadcast requests but I'm sure they're
coming. The late arriving data will be very helpful.

My primary concern with PTFs and large state is generally the complexity of
the state decisions. Most of our SQL developers won't understand when to
use a "[Map][List][Value]View" with a PTF. Specifically this area in the
documentation:
https://nightlies.apache.org/flink/flink-docs-release-2.2/docs/dev/table/functions/ptfs/#large-state
You really need to understand Java concepts to grasp the intricacies of
your decisions when choosing a state mechanism. I wonder if we can simplify
this decision for engineers who may not be Flink and Java experts. It may
not be possible.

Ryan van Huuksloot
Staff Engineer, Infrastructure | Streaming Platform


On Tue, Mar 3, 2026 at 3:47 AM Timo Walther <[email protected]> wrote:

Hi everyone,

Just bumping this thread again and happy to gather any feedback you have.

Thanks,
Timo

On 16.02.26 09:35, Timo Walther wrote:

Hi everyone,

the ProcessTableFunction (PTF) feature has been well received by the
Flink community and its adoption is increasing. Since FLIP-440 [1]
introduced a lot of new API and new concepts, some design decisions need
smaller adjustments around late data handling and lazy state access.

Also, talking to community members at Current and Flink Forward
conferences has shown that broadcast state is crucial to bridge the gap
to DataStream API applications for broadcast joining and rule-based
logic.

I would like to propose "FLIP-565: Improve ProcessTableFunctions for late
data handling and state access" [2].

This FLIP proposes 3 important PTF improvements:

1) Don’t drop late data in ProcessFunction as data-loss is usually not
intended; similar to DataStream API’s ProcessFunction

2) Introduce ValueView to enable a “supplier”-pattern for state access;
similar to MapView and ListView

3) Introduce BROADCAST_SEMANTIC_TABLE as a new kind of argument to PTFs
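
As a rough illustration of the "supplier" pattern mentioned in item 2, here is a minimal plain-Java sketch. It is not the API proposed in the FLIP and does not use any Flink classes: `LazyValue`, `process`, and the loader lambda are hypothetical names made up for this example. The idea it shows is only that the function receives a handle instead of the value, so state is loaded when (and if) the function asks for it.

```java
import java.util.function.Supplier;

public class LazyStateSketch {

    /**
     * Hypothetical stand-in for a lazily loaded state handle.
     * This is NOT Flink's ValueView; it only mimics the supplier idea.
     */
    public static final class LazyValue<T> {
        private final Supplier<T> loader; // assumed to be the expensive state read
        private T cached;
        private boolean loaded;

        public LazyValue(Supplier<T> loader) {
            this.loader = loader;
        }

        /** Loads the value on first access and reuses it afterwards. */
        public T get() {
            if (!loaded) {
                cached = loader.get();
                loaded = true;
            }
            return cached;
        }

        public boolean isLoaded() {
            return loaded;
        }
    }

    /** Hypothetical per-row logic: empty rows never touch the state. */
    public static String process(String row, LazyValue<Long> counter) {
        if (row.isEmpty()) {
            return "skipped"; // no state access on this path
        }
        return row + ":" + counter.get();
    }

    public static void main(String[] args) {
        LazyValue<Long> counter = new LazyValue<>(() -> 7L);
        System.out.println(process("", counter));
        System.out.println("loaded after empty row: " + counter.isLoaded());
        System.out.println(process("a", counter));
        System.out.println("loaded after non-empty row: " + counter.isLoaded());
    }
}
```

With an eagerly passed value, the state read would happen for every row, including rows that never use it. The handle defers that cost to the first `get()` call.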

Regarding forward compatibility, all proposed items can be made
available in batch mode eventually for a unified experience. From my
point of view, these remaining adjustments should make PTF fully
production ready. I don't expect any major additions in the mid-term.

Looking forward to your feedback.

Thanks,
Timo

[1] https://cwiki.apache.org/confluence/x/pQnPEQ
[2] https://cwiki.apache.org/confluence/x/qIo8G
