Hi Ryan,
thanks for the great feedback. I agree that some parts might still be
too complex; usability is definitely a continuous effort. For now, the
main goal of PTFs is to unblock people when something cannot be
expressed in SQL or would lead to very inefficient query plans. Also,
they target a developer persona, usually a platform team that
develops PTFs for SQL personas. In the mid-term, I hope that AI will
implement most of the PTFs, so exposing engine primitives / building
blocks for AI is crucial.
Maybe we can also offer a SimpleProcessFunction at some point, once we
know better why and how people use PTFs. Also, having more built-in PTFs
that address the most frequent tasks could be very helpful.
Please continue sharing your experiences: What are frequent tasks? What
do users want to achieve with PTFs?
Cheers,
Timo
On 03.03.26 21:09, Ryan van Huuksloot via dev wrote:
Hi Timo,
Thanks for the FLIP.
Internally, we've started using PTFs and are still figuring out how to best
leverage them.
The improvements you proposed in your FLIP are great.
I wanted to mention the priority order for the 3 improvements you've
recommended. I would prioritize them in the order you stated, based on our
usage. So far I haven't had any broadcast requests, but I'm sure they're
coming. The late data handling will be very helpful.
My primary concern with PTFs and large state is generally the complexity of
the state decisions. Most of our SQL developers won't understand when to
use a "[Map][List][Value]View" with a PTF. Specifically this area in the
documentation:
https://nightlies.apache.org/flink/flink-docs-release-2.2/docs/dev/table/functions/ptfs/#large-state
You really need to understand Java concepts to grasp the intricacies of
your decisions when choosing a state mechanism. I wonder if we can simplify
this decision for engineers who may not be Flink and Java experts. It may
not be possible.
Ryan van Huuksloot
Staff Engineer, Infrastructure | Streaming Platform
On Tue, Mar 3, 2026 at 3:47 AM Timo Walther wrote:
Hi everyone,
Just bumping this thread again and happy to gather any feedback you have.
Thanks,
Timo
On 16.02.26 09:35, Timo Walther wrote:
Hi everyone,
the ProcessTableFunction (PTF) feature has been well received by the
Flink community and its adoption is increasing. Since FLIP-440 [1]
introduced a lot of new APIs and new concepts, some design decisions need
smaller adjustments around late data handling and lazy state access.
Also, talking to community members at Current and Flink Forward
conferences has shown that broadcast state is crucial to bridge the gap
to DataStream API applications for broadcast joining and rule-based
logic.
I would like to propose "FLIP-565: Improve ProcessTableFunctions for late
data handling and state access" [2].
This FLIP proposes 3 important PTF improvements:
1) Don’t drop late data in ProcessFunction as data-loss is usually not
intended; similar to DataStream API’s ProcessFunction
2) Introduce ValueView to enable a “supplier”-pattern for state access;
similar to MapView and ListView
3) Introduce BROADCAST_SEMANTIC_TABLE as a new kind of argument to PTFs
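To make the "supplier" pattern in item 2 more concrete, here is a minimal
plain-Java sketch of the idea: state is only read when the function asks
for it. This is not the Flink API and not the design from the FLIP.
CountingStore, LazyValue and process() are hypothetical names used only
for illustration, and the store is just an in-memory map that counts reads.

```java
import java.util.HashMap;
import java.util.Map;

public class LazyStateSketch {

    /** Hypothetical stand-in for a keyed state backend; counts how often state is read. */
    static final class CountingStore {
        private final Map<String, Long> data = new HashMap<>();
        int reads = 0;

        Long read(String key) {
            reads++;
            return data.get(key);
        }

        void write(String key, Long value) {
            data.put(key, value);
        }
    }

    /** Hypothetical lazy handle: nothing is read from the store until get() is called. */
    static final class LazyValue {
        private final CountingStore store;
        private final String key;

        LazyValue(CountingStore store, String key) {
            this.store = store;
            this.key = key;
        }

        Long get() {
            return store.read(key);
        }

        void set(Long value) {
            store.write(key, value);
        }
    }

    /** Only rows flagged as relevant touch state; other rows pass through without a read. */
    static long process(LazyValue counter, boolean relevant, long amount) {
        if (!relevant) {
            return amount; // state is never loaded on this path
        }
        Long current = counter.get();
        long updated = (current == null ? 0L : current) + amount;
        counter.set(updated);
        return updated;
    }

    public static void main(String[] args) {
        CountingStore store = new CountingStore();
        LazyValue counter = new LazyValue(store, "user-1");

        process(counter, false, 5L); // no state read
        process(counter, true, 7L);  // first state read
        process(counter, true, 3L);  // second state read

        System.out.println("state reads so far: " + store.reads);
    }
}
```

With an eagerly passed value, every call would pay for a state read up
front; with the handle, only the two relevant rows do.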
Regarding forward compatibility, all proposed items can be made
available in batch mode eventually for a unified experience. From my
point of view, these remaining adjustments should make PTFs fully
production ready; I don't expect any major additions in the mid-term.
Looking forward to your feedback.
Thanks,
Timo
[1] https://cwiki.apache.org/confluence/x/pQnPEQ
[2] https://cwiki.apache.org/confluence/x/qIo8G