Hi Timo,

Thank you for proposing these improvements. They all address real pain points, so +1.

It's especially good to see BROADCAST_SEMANTIC_TABLE. It unlocks a set of use cases involving small lookup tables that can be considerably optimized. I'm also +1 on supporting ORDER BY instead of an additional argument trait.
Thanks for continuing to push PTFs forward - they are becoming really powerful.

Kind regards,
Gustavo

On Wed, 4 Mar 2026 at 16:40, Ryan van Huuksloot via dev <[email protected]> wrote:

> That makes sense to me. First make it work; then, make it easy.
>
> Otherwise the FLIP looks good to me. Some great improvements! Thanks for
> putting this together.
>
> Ryan van Huuksloot
> Staff Engineer, Infrastructure | Streaming Platform
> [image: Shopify]
> <https://www.shopify.com/?utm_medium=salessignatures&utm_source=hs_email>
>
>
> On Wed, Mar 4, 2026 at 9:22 AM Timo Walther <[email protected]> wrote:
>
> > Hi Ryan,
> >
> > thanks for the great feedback. I agree that some parts might still be
> > too complex, usability is definitely a continuous effort. For now, the
> > main goal of PTFs was to unblock people when something cannot be
> > expressed with SQL or would lead to very inefficient query plans. Also
> > they rather target a developer persona. Usually, a platform team that
> > develops PTFs for SQL personas. In the mid-term, I hope that AI will
> > implement most of the PTFs. So exposing engine primitives / building
> > blocks for AI is crucial.
> >
> > Maybe we can also offer a SimpleProcessFunction at some point, once we
> > know better why and how people use PTFs. Also having more built-in PTFs
> > that address the most frequent tasks can be very helpful.
> >
> > Please continue sharing your experiences: What are frequent tasks? What
> > do users want to achieve with PTFs?
> >
> > Cheers,
> > Timo
> >
> > On 03.03.26 21:09, Ryan van Huuksloot via dev wrote:
> > > Hi Timo,
> > >
> > > Thanks for the FLIP.
> > >
> > > Internally, we've started using PTFs and are still figuring out how to
> > > best leverage them.
> > > The improvements you proposed in your FLIP are great.
> > > I wanted to mention the priority order for the 3 improvements you've
> > > recommended. I would prioritize them in the order you stated, based on
> > > our usage.
> > > So far I haven't had any broadcast requests but I'm sure they're
> > > coming. The late arriving data will be very helpful.
> > >
> > > My primary concern with PTFs and large state is generally the
> > > complexity of the state decisions. Most of our SQL developers won't
> > > understand when to use a "[Map][List][Value]View" with a PTF.
> > > Specifically this area in the documentation:
> > >
> > > https://nightlies.apache.org/flink/flink-docs-release-2.2/docs/dev/table/functions/ptfs/#large-state
> > > You really need to understand Java concepts to grasp the intricacies of
> > > your decisions when choosing a state mechanism. I wonder if we can
> > > simplify this decision for engineers who may not be Flink and Java
> > > experts. It may not be possible.
> > >
> > > Ryan van Huuksloot
> > > Staff Engineer, Infrastructure | Streaming Platform
> > >
> > > On Tue, Mar 3, 2026 at 3:47 AM Timo Walther <[email protected]> wrote:
> > >
> > >> Hi everyone,
> > >>
> > >> Just bumping this thread again and happy to gather any feedback you
> > >> have.
> > >>
> > >> Thanks,
> > >> Timo
> > >>
> > >> On 16.02.26 09:35, Timo Walther wrote:
> > >>> Hi everyone,
> > >>>
> > >>> the ProcessTableFunction (PTF) feature has been well received by the
> > >>> Flink community and its adoption is increasing. Since FLIP-440 [1]
> > >>> introduced a lot of new API and new concepts, some design decisions
> > >>> need smaller adjustments along late data handling and lazy state
> > >>> access.
> > >>>
> > >>> Also, talking to community members at Current and Flink Forward
> > >>> conferences has shown that broadcast state is crucial to bridge the
> > >>> gap to DataStream API applications for broadcast joining and
> > >>> rule-based logic.
> > >>>
> > >>> I would like to propose FLIP-565: "Improve ProcessTableFunctions for
> > >>> late data handling and state access" [2].
> > >>>
> > >>> This FLIP proposes 3 important PTF improvements:
> > >>>
> > >>> 1) Don’t drop late data in ProcessFunction as data-loss is usually
> > >>> not intended; similar to DataStream API’s ProcessFunction
> > >>>
> > >>> 2) Introduce ValueView to enable a “supplier”-pattern for state
> > >>> access; similar to MapView and ListView
> > >>>
> > >>> 3) Introduce BROADCAST_SEMANTIC_TABLE as a new kind of argument to
> > >>> PTFs
> > >>>
> > >>> Regarding forward compatibility, all proposed items can be made
> > >>> available in batch mode eventually for a unified experience. From my
> > >>> point of view, these remaining adjustments should make PTF fully
> > >>> production ready, I don't expect any major additions in the mid-term.
> > >>>
> > >>> Looking forward to your feedback.
> > >>>
> > >>> Thanks,
> > >>> Timo
> > >>>
> > >>> [1] https://cwiki.apache.org/confluence/x/pQnPEQ
> > >>> [2] https://cwiki.apache.org/confluence/x/qIo8G
