Re: [DISCUSS] FLIP-565: Improve ProcessTableFunctions for late data handling and state access

Timo Walther Fri, 06 Mar 2026 01:12:56 -0800

Hi everyone,

If there are no objections, I would like to start a VOTE on Monday.


Thanks,
Timo

On 05.03.26 10:02, Gustavo de Morais wrote:

Hi Timo,

Thank you for proposing these improvements. All address real pain points,
so +1. It's especially good to see BROADCAST_SEMANTIC_TABLE. This unlocks
a set of use cases involving small lookup tables that can be considerably
optimized. I'm also +1 on supporting ORDER BY instead of an additional
argument trait.

Thanks for continuing to push PTFs forward - they are becoming really
powerful.

Kind regards,
Gustavo

On Wed, 4 Mar 2026 at 16:40, Ryan van Huuksloot via dev <[email protected]> wrote:

That makes sense to me. First make it work; then, make it easy.

Otherwise the FLIP looks good to me. Some great improvements! Thanks for
putting this together.

Ryan van Huuksloot
Staff Engineer, Infrastructure | Streaming Platform


On Wed, Mar 4, 2026 at 9:22 AM Timo Walther <[email protected]> wrote:

Hi Ryan,

Thanks for the great feedback. I agree that some parts might still be
too complex; usability is definitely a continuous effort. For now, the
main goal of PTFs was to unblock people when something cannot be
expressed with SQL or would lead to very inefficient query plans. Also,
they target a developer persona, usually a platform team that develops
PTFs for SQL personas. In the mid-term, I hope that AI will implement
most of the PTFs, so exposing engine primitives / building blocks for
AI is crucial.

Maybe we can also offer a SimpleProcessFunction at some point, once we
know better why and how people use PTFs. Also having more built-in PTFs
that address the most frequent tasks can be very helpful.

Please continue sharing your experiences: What are frequent tasks? What
do users want to achieve with PTFs?

Cheers,
Timo

On 03.03.26 21:09, Ryan van Huuksloot via dev wrote:

Hi Timo,

Thanks for the FLIP.

Internally, we've started using PTFs and are still figuring out how to
best leverage them.
The improvements you proposed in your FLIP are great.
I wanted to mention the priority order for the 3 improvements you've
recommended. I would prioritize them in the order you stated, based on
our usage. So far I haven't had any broadcast requests but I'm sure
they're coming. The late-arriving data handling will be very helpful.

My primary concern with PTFs and large state is generally the complexity
of the state decisions. Most of our SQL developers won't understand when
to use a "[Map][List][Value]View" with a PTF. Specifically this area in
the documentation:

https://nightlies.apache.org/flink/flink-docs-release-2.2/docs/dev/table/functions/ptfs/#large-state

You really need to understand Java concepts to grasp the intricacies of
your decisions when choosing a state mechanism. I wonder if we can
simplify this decision for engineers who may not be Flink and Java
experts. It may not be possible.

Ryan van Huuksloot
Staff Engineer, Infrastructure | Streaming Platform

On Tue, Mar 3, 2026 at 3:47 AM Timo Walther <[email protected]> wrote:

Hi everyone,

Just bumping this thread again and happy to gather any feedback you have.


Thanks,
Timo

On 16.02.26 09:35, Timo Walther wrote:

Hi everyone,

the ProcessTableFunction (PTF) feature has been well received by the
Flink community and its adoption is increasing. Since FLIP-440 [1]
introduced a lot of new API and new concepts, some design decisions need
smaller adjustments around late data handling and lazy state access.

Also, talking to community members at Current and Flink Forward
conferences has shown that broadcast state is crucial to bridge the gap
to DataStream API applications for broadcast joining and rule-based
logic.

I would like to propose "FLIP-565: Improve ProcessTableFunctions for
late data handling and state access" [2].

This FLIP proposes 3 important PTF improvements:

1) Don’t drop late data in ProcessFunction as data-loss is usually not
intended; similar to DataStream API’s ProcessFunction

2) Introduce ValueView to enable a “supplier”-pattern for state access;
similar to MapView and ListView

3) Introduce BROADCAST_SEMANTIC_TABLE as a new kind of argument to PTFs
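
To illustrate what a "supplier"-pattern for state access (item 2) means in
general, here is a minimal sketch in plain Java. It does not use or
reproduce the Flink API: LazyStateSketch, LazyValue, readFromBackend and
process are hypothetical names chosen only for this illustration, and the
actual ValueView interface is defined by the FLIP, not by this snippet. The
idea shown is simply that the state value is handed over as a deferred
lookup, so the (potentially expensive) read only happens if the function
actually asks for it.

```java
import java.util.function.Supplier;

/** Hypothetical illustration of lazy state access; not Flink's ValueView. */
public class LazyStateSketch {

    /** Wraps a loader and invokes it at most once, on the first get(). */
    static final class LazyValue<T> {
        private final Supplier<T> loader;
        private T value;
        private boolean loaded;

        LazyValue(Supplier<T> loader) {
            this.loader = loader;
        }

        T get() {
            if (!loaded) {
                value = loader.get();
                loaded = true;
            }
            return value;
        }

        boolean isLoaded() {
            return loaded;
        }
    }

    /** Counts how often the simulated state backend is read. */
    static int backendReads = 0;

    /** Stand-in for an expensive state lookup (assumption for this sketch). */
    static long readFromBackend() {
        backendReads++;
        return 42L;
    }

    /** Touches state only for rows that need it. */
    static long process(String row, LazyValue<Long> state) {
        if (row.isEmpty()) {
            return -1L; // early exit: the state is never loaded
        }
        return state.get() + row.length();
    }

    public static void main(String[] args) {
        LazyValue<Long> skipped = new LazyValue<>(LazyStateSketch::readFromBackend);
        System.out.println(process("", skipped) + " loaded=" + skipped.isLoaded());

        LazyValue<Long> used = new LazyValue<>(LazyStateSketch::readFromBackend);
        System.out.println(process("abc", used) + " loaded=" + used.isLoaded());
    }
}
```

With eager access, the lookup would be paid for every input row; with the
deferred form, rows that exit early never trigger it.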


Regarding forward compatibility, all proposed items can be made
available in batch mode eventually for a unified experience. From my
point of view, these remaining adjustments should make PTFs fully
production ready; I don't expect any major additions in the mid-term.

Looking forward to your feedback.

Thanks,
Timo

[1] https://cwiki.apache.org/confluence/x/pQnPEQ
[2] https://cwiki.apache.org/confluence/x/qIo8G
