Hi everyone,
if there are no objections, I would start a VOTE on Monday.
Thanks,
Timo
On 05.03.26 10:02, Gustavo de Morais wrote:
Hi Timo,
Thank you for proposing these improvements. All address real pain points,
so +1. It's especially good to see BROADCAST_SEMANTIC_TABLE. This unlocks
a set of use cases involving small lookup tables that can be
considerably optimized. I'm also +1 on supporting ORDER BY instead of an
additional argument trait.
Thanks for continuing to push PTFs forward - they are becoming really
powerful.
Kind regards,
Gustavo
On Wed, 4 Mar 2026 at 16:40, Ryan van Huuksloot via dev <[email protected]> wrote:
That makes sense to me. First make it work; then, make it easy.
Otherwise the FLIP looks good to me. Some great improvements! Thanks for
putting this together.
Ryan van Huuksloot
Staff Engineer, Infrastructure | Streaming Platform
On Wed, Mar 4, 2026 at 9:22 AM Timo Walther <[email protected]> wrote:
Hi Ryan,
thanks for the great feedback. I agree that some parts might still be
too complex; usability is definitely a continuous effort. For now, the
main goal of PTFs has been to unblock people when something cannot be
expressed with SQL or would lead to very inefficient query plans. Also,
they target a developer persona: usually a platform team that
develops PTFs for SQL personas. In the mid-term, I hope that AI will
implement most of the PTFs, so exposing engine primitives / building
blocks for AI is crucial.
Maybe we can also offer a SimpleProcessFunction at some point, once we
know better why and how people use PTFs. Also having more built-in PTFs
that address the most frequent tasks can be very helpful.
Please continue sharing your experiences: What are frequent tasks? What
do users want to achieve with PTFs?
Cheers,
Timo
On 03.03.26 21:09, Ryan van Huuksloot via dev wrote:
Hi Timo,
Thanks for the FLIP.
Internally, we've started using PTFs and are still figuring out how to
best leverage them.
The improvements you proposed in your FLIP are great.
I wanted to mention the priority order for the 3 improvements you've
recommended. I would prioritize them in the order you stated, based on
our usage. So far I haven't had any broadcast requests but I'm sure
they're coming. The late arriving data will be very helpful.
My primary concern with PTFs and large state is generally the
complexity of the state decisions. Most of our SQL developers won't
understand when to use a "[Map][List][Value]View" with a PTF.
Specifically this area in the documentation:
https://nightlies.apache.org/flink/flink-docs-release-2.2/docs/dev/table/functions/ptfs/#large-state
You really need to understand Java concepts to grasp the intricacies of
your decisions when choosing a state mechanism. I wonder if we can
simplify this decision for engineers who may not be Flink and Java
experts. It may not be possible.
Ryan van Huuksloot
Staff Engineer, Infrastructure | Streaming Platform
On Tue, Mar 3, 2026 at 3:47 AM Timo Walther <[email protected]> wrote:
Hi everyone,
Just bumping this thread again and happy to gather any feedback you have.
Thanks,
Timo
On 16.02.26 09:35, Timo Walther wrote:
Hi everyone,
the ProcessTableFunction (PTF) feature has been well received by the
Flink community and its adoption is increasing. Since FLIP-440 [1]
introduced a lot of new API and new concepts, some design decisions
need small adjustments around late data handling and lazy state access.
Also, talking to community members at Current and Flink Forward
conferences has shown that broadcast state is crucial to bridge the
gap to DataStream API applications for broadcast joining and rule-based
logic.
I would like to propose "FLIP-565: Improve ProcessTableFunctions for
late data handling and state access" [2].
This FLIP proposes 3 important PTF improvements:
1) Don’t drop late data in ProcessFunction as data-loss is usually not
   intended; similar to DataStream API’s ProcessFunction
2) Introduce ValueView to enable a “supplier”-pattern for state access;
   similar to MapView and ListView
3) Introduce BROADCAST_SEMANTIC_TABLE as a new kind of argument to PTFs
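
To make the "supplier" pattern in item 2 a bit more concrete, here is a
minimal plain-Java sketch. It deliberately does not use the Flink API, and
it is not the ValueView interface from the FLIP. CountingBackend, eagerEval
and lazyEval are hypothetical names that only illustrate the idea: if state
is handed to the function as a supplier, the read happens only on the code
paths that actually need it.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

public class LazyStateSketch {

    /** Hypothetical stand-in for a state backend; counts how often state is read. */
    static final class CountingBackend {
        private final Map<String, Long> store = new HashMap<>();
        int reads = 0;

        Long read(String key) {
            reads++;
            return store.get(key);
        }

        void write(String key, Long value) {
            store.put(key, value);
        }
    }

    /** Eager style: the state value is loaded before the logic runs. */
    static long eagerEval(CountingBackend backend, String key, long amount, long threshold) {
        Long total = backend.read(key); // always pays for the read
        if (amount < threshold) {
            return -1L; // record ignored, so the read above was wasted
        }
        long updated = (total == null ? 0L : total) + amount;
        backend.write(key, updated);
        return updated;
    }

    /** Lazy style: state is wrapped in a supplier and read only when needed. */
    static long lazyEval(CountingBackend backend, String key, long amount, long threshold) {
        Supplier<Long> total = () -> backend.read(key); // nothing is read yet
        if (amount < threshold) {
            return -1L; // record ignored without touching state
        }
        Long current = total.get(); // the read happens here
        long updated = (current == null ? 0L : current) + amount;
        backend.write(key, updated);
        return updated;
    }

    public static void main(String[] args) {
        long[] amounts = {1L, 50L, 2L, 70L};
        CountingBackend eager = new CountingBackend();
        CountingBackend lazy = new CountingBackend();
        for (long amount : amounts) {
            eagerEval(eager, "user-1", amount, 10L);
            lazyEval(lazy, "user-1", amount, 10L);
        }
        System.out.println("eager reads: " + eager.reads);
        System.out.println("lazy reads: " + lazy.reads);
    }
}
```

In this toy run both variants end up with the same total, but the lazy
variant only reads state for the two records that pass the threshold.
Whether that matters in practice depends on how expensive a state read is
for the configured backend.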
Regarding forward compatibility, all proposed items can be made
available in batch mode eventually for a unified experience. From my
point of view, these remaining adjustments should make PTF fully
production ready; I don't expect any major additions in the mid-term.
Looking forward to your feedback.
Thanks,
Timo
[1] https://cwiki.apache.org/confluence/x/pQnPEQ
[2] https://cwiki.apache.org/confluence/x/qIo8G