Hi, Xuyang. Thanks for your response. I just thought an another way to solve this question instead of introducing > a new configuration. When using legacy syntax like `GROUP BY TUMBLE(xxx), > f0`, the rewritten sql can be GROUP BY f0, window_start, > window_end(window_start and window_end is produced by WINDOW TVF). We can > use field index here and use a new Calc node to alias them to avoid field > name conflict in WindowAggregate node.
The solution is much better than before and I think it can solve the problem I mentioned before. What about using config named "table.optimizer.window-rewrite-enabled" +1 IIRC, currently compatibility across middle versions of SQL is not > guaranteed. Should we add constraints on this part? I think we shouldn't remove the operator if we can not give a solution to help users upgrade their jobs. But I think we can delay the discussion until we need to remove the operator. Best, Shengkai Xuyang <xyzhong...@163.com> 于2023年12月8日周五 18:07写道: > Hi, Martijn, thanks for your share. > > >On the topic of syntax for early/late fires, there is existing > >configuration for the legacy group windows: > > > >SET table.exec.emit.early-fire.enabled = true; > >SET table.exec.emit.early-fire.delay = 5s; > >SET table.exec.emit.late-fire.enabled = true; > >SET table.exec.emit.late-fire.delay = 0; > >SET table.exec.emit.allow-lateness = 5s; > > > >We should stick with the syntax for the TVFs, and not modify that. > > Agree with you. We should follow the syntax defined in Flip-145. As for how > to let these options to only take effect on a single window agg instead of > all window aggs, we need to think of another way. > > >On the topic of column naming, for other situations where a user wants > >to use a value that's already reserved, we require the user to include > >backticks to indicate that Flink should not use the reserved keyword > >implementation. Why isn't that sufficient in this case? I rather stay > >consistent with this behaviour instead of introducing new config > >options. > > > > I just thought an another way to solve this question instead of > introducing a new configuration. When using legacy syntax like `GROUP BY > TUMBLE(xxx), f0`, the rewritten sql can be GROUP BY f0, window_start, > window_end(window_start and window_end is produced by WINDOW TVF). We can > use field index here and use a new Calc node to alias them to avoid field > name conflict in WindowAggregate node. > What about this idea, cc @Shengkai Fang ? > > >I would propose to rename the FLIP to > >something like "Add Missing Table Valued Functions Features to Replace > >Legacy Group Window Aggregation". > > IMO, the original title "deprecate" maybe more clear. Because although the > doc has deprecated the legacy Group Window Aggregation syntax just like what > I said in this Flip, but the DEPRECATED tag in doc is attached when doing the > Flip-145. Let's review the content in Flip-145 again. The Flip-145 said "The > existing Grouped window functions, i.e. GROUP BY TUMBLE... are still > supported, but will be deprecated". So as a work to follow Flip-145, this > Flip officially deprecates the legacy window syntax at this time. WDTK? > > > -- > Best! > Xuyang > > > > At 2023-12-07 16:51:44, "Martijn Visser" <martijnvis...@apache.org> wrote: > >Hi Xuyang, > > > >Thanks a lot for starting this discussion. > > > >At first, I was a bit confused because the FLIP talks about > >deprecating the Legacy Group Window Aggregations, but they have > >already been marked as deprecated in the documentation [1]. > >My understanding was that the big challenge was that we don't yet > >support SESSION windows in the Window TVF, and that the other features > >you've mentioned in the discussion threads are additional > >capabilities. > > > >However, when reading up on the actual FLIP (your discussion email > >didn't include a link [2] to it) I now understand the situation. The > >appendix table is actually the most valuable for me, because it gives > >me the overview of the missing capabilities between TVF implementation > >and the Legacy Group Window Aggregations. > > > >On the topic of syntax for early/late fires, there is existing > >configuration for the legacy group windows: > > > >SET table.exec.emit.early-fire.enabled = true; > >SET table.exec.emit.early-fire.delay = 5s; > >SET table.exec.emit.late-fire.enabled = true; > >SET table.exec.emit.late-fire.delay = 0; > >SET table.exec.emit.allow-lateness = 5s; > > > >We should stick with the syntax for the TVFs, and not modify that. > > > >On the topic of column naming, for other situations where a user wants > >to use a value that's already reserved, we require the user to include > >backticks to indicate that Flink should not use the reserved keyword > >implementation. Why isn't that sufficient in this case? I rather stay > >consistent with this behaviour instead of introducing new config > >options. > > > >Overall, +1 on this topic. I would propose to rename the FLIP to > >something like "Add Missing Table Valued Functions Features to Replace > >Legacy Group Window Aggregation". > > > >Best regards, > > > >Martijn > > > >[1] > >https://nightlies.apache.org/flink/flink-docs-release-1.18/docs/dev/table/sql/queries/window-agg/#group-window-aggregation > >[2] > >https://cwiki.apache.org/confluence/display/FLINK/FLIP-392%3A+Deprecate+the+Legacy+Group+Window+Aggregation > > > > > >On Wed, Dec 6, 2023 at 9:32 AM Xuyang <xyzhong...@163.com> wrote: > >> > >> Hi, Shengkai. Thanks to share your thought. Let me answer related > >> questions。 > >> > >> > >> > Could you give an example about the pass-through column. A session > >> > window may contain multiple rows, which value is selected by the > >> > windowoperator? > >> > >> > >> The table function make the entire inpyt row available in the output. Take > >> an example in Flink, if the input row is <a,b,c> with schema <f0,f1,f2>, > >> the output of the window tvf can also access the <a,b,c> with schema > >> <f0,f1,f2>. The session window always output these multi rows with > >> attached window_start, > >> window_end and window_time column. > >> > >> > >> > What's the behavior if all the data in the window have been removed? > >> > >> > >> Align the behavior of existing group window agg, and do not output data > >> when the window is empty. > >> > >> > >> > Could you explain more details about the how window process CDC? For > >> > example, what's the behavior if the window only gets the DELETE data > >> > from the upstream Operator. > >> > >> > >> Also align the behavior of existing group window agg. However, there is a > >> bug in group window agg that currently group window agg has different > >> results when only > >> consuming -D records while using or not using minibatch. Refs more at > >> FLINK-33760. > >> > >> > >> > The subtitle is not correct here. > >> > >> > >> Updated the doc to fix it. > >> > >> > >> > It's better if we can introduce a syntax like the `emit` keyword to set > >> > the emit strategy for every window. > >> I agree with you. But I don't recommend using this syntax SESSION(data > >> [PARTITION BY (keycols, ...)], DESCRIPTOR(timecol), gap, emit='') because > >> it breaks the syntax in Flip-145. > >> I think using query hint is a better idea. Anyway, this work should belong > >> to another flip. > >> > >> > >> > I think more work should be mentioned in the FLIP. What's the behavior > >> > if the input schema contains a column named `window_start`? > >> > In the current design, `window_start` is a reserved keyword in the > >> > window TVF syntax, but it is legal in the legacy implementation. > >> > >> > >> This is a good question. Perhaps we can introduce a config to add a > >> specific config (named "table.window-additional-columns.prefix") to add a > >> prefix > >> of all window addition columns to solve this situation. For example, user > >> can set the conf to "$", and the additional column from window will become > >> "$window_start", "$window_end" and "$window_time". WDYT? > >> > >> > >> > In the FLIP, you mention the FLIP should introduce an option to fall > >> > back to the legacy behavior. Could you tell us what's the name of the > >> > option? > >> > BTW, I think we should unify the implementation when window TVF can do > >> > all the work that the legacy operator can do and there is no need to > >> > introduce an option to fallback. > >> > >> > >> What about using config named "table.optimizer.window-rewrite-enabled"? If > >> I agree with you that util all features are aligned and everything is ok > >> about window tvf, we should also remove this config about fallback. > >> But I think to be on the safe side, we can observe one or two versions of > >> this rewrite, and allows users to roll back when problems arise. > >> > >> > >> > If we remove the legacy window operator in the future, how users upgrade > >> > their jobs? Do you have any plan to support state migration from the > >> > legacy window to Windows TVF? > >> IIRC, currently compatibility across middle versions of SQL is not > >> guaranteed. Should we add constraints on this part? > >> > >> > >> Look for your feedback! > >> > >> > >> > >> > >> > >> > >> > >> -- > >> > >> Best! > >> Xuyang > >> > >> > >> > >> > >> > >> At 2023-12-05 12:06:27, "liu ron" <ron9....@gmail.com> wrote: > >> >Hi, xuyang > >> > > >> >Thanks for starting this FLIP discussion, currently there are two types of > >> >window aggregation in Flink SQL, namely legacy group window aggregation > >> >and > >> >window tvf aggregation, these two types of window aggregation are not > >> >fully > >> >aligned in behavior, which will bring a lot of confusion to the users, so > >> >there is a need to unify and align them. I think the final ideal state > >> >should be that there is only one window tvf aggregation, which supports > >> >Tumble, HOP, Cumulate and Session windows, and supports consuming CDC data > >> >streams. There is also support for configuring EARLY-FIRE and LATER-FIRE. > >> > > >> >This FLIP is a continuation of FLIP-145, and also supports legacy group > >> >window aggregation to flat-migrate to the new window tvf agregation, which > >> >is very useful, especially for the support of CDC streams, a pain point > >> >that users often feedback. Big +1 for this FLIP. > >> > > >> >Best, > >> >Ron > >> > > >> >Xuyang <xyzhong...@163.com> 于2023年12月5日周二 11:11写道: > >> > > >> >> Hi, Feng and David. > >> >> > >> >> > >> >> Thank you very much to share your thoughts. > >> >> > >> >> > >> >> This flip does not include the official exposure of these experimental > >> >> conf to users. Thus there is not adetailed description of this part. > >> >> However, in view that some technical users may have added these > >> >> experimental conf in actual production jobs, the processing > >> >> of these conf while using window tvf syntax has been added to this flip. > >> >> > >> >> > >> >> Overall, the behavior of using these experimental parameters is no > >> >> different from before, and I think we should provide the compatibility > >> >> about using these experimental conf. > >> >> > >> >> > >> >> Look for your thoughs. > >> >> > >> >> > >> >> > >> >> > >> >> -- > >> >> > >> >> Best! > >> >> Xuyang > >> >> > >> >> > >> >> > >> >> > >> >> > >> >> At 2023-12-05 09:17:49, "David Anderson" <dander...@apache.org> wrote: > >> >> >The current situation (where we have both the legacy windows and the > >> >> >TVF-based windows) is confusing for users, and I'd like to see us move > >> >> >forward as rapidly as possible. > >> >> > > >> >> >Since the early fire, late fire, and allowed lateness features were > >> >> >never > >> >> >documented or exposed to users, I don't feel that we need to provide > >> >> >replacements for these internal, experimental features before > >> >> >officially > >> >> >deprecating the legacy group window aggregations, and I'd rather not > >> >> >wait. > >> >> > > >> >> >However, I'd be delighted to see a proposal for what that might look > >> >> >like. > >> >> > > >> >> >Best, > >> >> >David > >> >> > > >> >> >On Mon, Dec 4, 2023 at 12:45 PM Feng Jin <jinfeng1...@gmail.com> wrote: > >> >> > > >> >> >> Hi xuyang, > >> >> >> > >> >> >> Thank you for initiating this proposal. > >> >> >> > >> >> >> I'm glad to see that TVF's functionality can be fully supported. > >> >> >> > >> >> >> Regarding the early fire, late fire, and allow lateness features, how > >> >> will > >> >> >> they be provided to users? The documentation doesn't seem to provide > >> >> >> a > >> >> >> detailed description of this part. > >> >> >> > >> >> >> Since this FLIP will also involve a lot of feature development, I am > >> >> more > >> >> >> than willing to help, including development and code review. > >> >> >> > >> >> >> Best, > >> >> >> Feng > >> >> >> > >> >> >> On Tue, Nov 28, 2023 at 8:31 PM Xuyang <xyzhong...@163.com> wrote: > >> >> >> > >> >> >> > Hi all. > >> >> >> > I'd like to start a discussion of FLIP-392: Deprecate the Legacy > >> >> >> > Group > >> >> >> > Window Aggregation. > >> >> >> > > >> >> >> > > >> >> >> > Although the current Flink SQL Window Aggregation documentation[1] > >> >> >> > indicates that the legacy Group Window Aggregation > >> >> >> > syntax has been deprecated, the new Window TVF Aggregation syntax > >> >> >> > has > >> >> not > >> >> >> > fully covered all of the features of the legacy one. > >> >> >> > > >> >> >> > > >> >> >> > Compared to Group Window Aggergation, Window TVF Aggergation has > >> >> several > >> >> >> > advantages, such as two-stage optimization, > >> >> >> > support for standard GROUPING SET syntax, and so on. However, it > >> >> needs to > >> >> >> > supplement and enrich the following features. > >> >> >> > > >> >> >> > > >> >> >> > 1. Support for SESSION Window TVF Aggregation > >> >> >> > 2. Support for consuming CDC stream > >> >> >> > 3. Support for HOP window size with non-integer step length > >> >> >> > 4. Support for configurations such as early fire, late fire and > >> >> >> > allow > >> >> >> > lateness > >> >> >> > (which are internal experimental configurations in Group Window > >> >> >> > Aggregation and not public to users yet.) > >> >> >> > 5. Unification of the Window TVF Aggregation operator in runtime at > >> >> the > >> >> >> > implementation layer > >> >> >> > (In the long term, the cost to maintain the operators about Window > >> >> >> > TVF > >> >> >> > Aggregation and Group Window Aggregation is too expensive.) > >> >> >> > > >> >> >> > > >> >> >> > This flip aims to continue the unfinished work in FLIP-145[2], > >> >> >> > which > >> >> is > >> >> >> to > >> >> >> > fully enable the capabilities of Window TVF Aggregation > >> >> >> > and officially deprecate the legacy syntax Group Window > >> >> >> > Aggregation, > >> >> to > >> >> >> > prepare for the removal of the legacy one in Flink 2.0. > >> >> >> > > >> >> >> > > >> >> >> > I have already done some preliminary POC to validate the > >> >> >> > feasibility > >> >> of > >> >> >> > the related work in this flip as follows. > >> >> >> > 1. POC for SESSION Window TVF Aggregation [3] > >> >> >> > 2. POC for CUMULATE in Group Window Aggregation operator [4] > >> >> >> > 3. POC for consuming CDC stream in Window Aggregation operator [5] > >> >> >> > > >> >> >> > > >> >> >> > Looking forward to your feedback and thoughts! > >> >> >> > > >> >> >> > > >> >> >> > > >> >> >> > [1] > >> >> >> > > >> >> >> > >> >> https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/sql/queries/window-agg/ > >> >> >> > > >> >> >> > [2] > >> >> >> > > >> >> >> > >> >> https://cwiki.apache.org/confluence/display/FLINK/FLIP-145%3A+Support+SQL+windowing+table-valued+function#FLIP145:SupportSQLwindowingtablevaluedfunction-SessionWindows > >> >> >> > [3] https://github.com/xuyangzhong/flink/tree/FLINK-24024 > >> >> >> > [4] > >> >> >> > > >> >> >> > >> >> https://github.com/xuyangzhong/flink/tree/poc_legacy_group_window_agg_cumulate > >> >> >> > [5] > >> >> >> > > >> >> >> > >> >> https://github.com/xuyangzhong/flink/tree/poc_window_agg_consumes_cdc_stream > >> >> >> > > >> >> >> > > >> >> >> > > >> >> >> > -- > >> >> >> > > >> >> >> > Best! > >> >> >> > Xuyang > >> >> >> > >> >> > >