Hi, Xuyang. Thanks for your response.

I just thought an another way to solve this question instead of introducing
> a new configuration. When using legacy syntax like `GROUP BY TUMBLE(xxx),
> f0`, the rewritten sql can be GROUP BY f0, window_start,
> window_end(window_start and window_end is produced by WINDOW TVF). We can
> use field index here and use a new Calc node to alias them to avoid field
> name conflict in WindowAggregate node.


The solution is much better than before and I think it can solve the
problem I mentioned before.

What about using config named "table.optimizer.window-rewrite-enabled"


+1

IIRC, currently compatibility across middle versions of SQL is not
> guaranteed. Should we add constraints on this part?


I think we shouldn't remove the operator if we can not give a solution to
help users upgrade their jobs. But I think we can delay the discussion
until we need to remove the operator.

Best,
Shengkai

Xuyang <xyzhong...@163.com> 于2023年12月8日周五 18:07写道:

> Hi, Martijn, thanks for your share.
>
> >On the topic of syntax for early/late fires, there is existing
> >configuration for the legacy group windows:
> >
> >SET table.exec.emit.early-fire.enabled = true;
> >SET table.exec.emit.early-fire.delay = 5s;
> >SET table.exec.emit.late-fire.enabled = true;
> >SET table.exec.emit.late-fire.delay = 0;
> >SET table.exec.emit.allow-lateness = 5s;
> >
> >We should stick with the syntax for the TVFs, and not modify that.
>
> Agree with you. We should follow the syntax defined in Flip-145. As for how 
> to let these options to only take effect on a single window agg instead of 
> all window aggs, we need to think of another way.
>
> >On the topic of column naming, for other situations where a user wants
> >to use a value that's already reserved, we require the user to include
> >backticks to indicate that Flink should not use the reserved keyword
> >implementation. Why isn't that sufficient in this case? I rather stay
> >consistent with this behaviour instead of introducing new config
> >options.
> >
>
> I just thought an another way to solve this question instead of
> introducing a new configuration. When using legacy syntax like `GROUP BY
> TUMBLE(xxx), f0`, the rewritten sql can be GROUP BY f0, window_start,
> window_end(window_start and window_end is produced by WINDOW TVF). We can
> use field index here and use a new Calc node to alias them to avoid field
> name conflict in WindowAggregate node.
> What about this idea, cc  @Shengkai Fang ?
>
> >I would propose to rename the FLIP to
> >something like "Add Missing Table Valued Functions Features to Replace
> >Legacy Group Window Aggregation".
>
> IMO, the original title "deprecate" maybe more clear. Because although the 
> doc has deprecated the legacy Group Window Aggregation syntax just like what 
> I said in this Flip, but the DEPRECATED tag in doc is attached when doing the 
> Flip-145. Let's review the content in Flip-145 again. The Flip-145 said "The 
> existing Grouped window functions, i.e. GROUP BY TUMBLE... are still 
> supported, but will be deprecated". So as a work to follow Flip-145, this 
> Flip officially deprecates the legacy window syntax at this time. WDTK?
>
>
> --
>     Best!
>     Xuyang
>
>
>
> At 2023-12-07 16:51:44, "Martijn Visser" <martijnvis...@apache.org> wrote:
> >Hi Xuyang,
> >
> >Thanks a lot for starting this discussion.
> >
> >At first, I was a bit confused because the FLIP talks about
> >deprecating the Legacy Group Window Aggregations, but they have
> >already been marked as deprecated in the documentation [1].
> >My understanding was that the big challenge was that we don't yet
> >support SESSION windows in the Window TVF, and that the other features
> >you've mentioned in the discussion threads are additional
> >capabilities.
> >
> >However, when reading up on the actual FLIP (your discussion email
> >didn't include a link [2] to it) I now understand the situation. The
> >appendix table is actually the most valuable for me, because it gives
> >me the overview of the missing capabilities between TVF implementation
> >and the Legacy Group Window Aggregations.
> >
> >On the topic of syntax for early/late fires, there is existing
> >configuration for the legacy group windows:
> >
> >SET table.exec.emit.early-fire.enabled = true;
> >SET table.exec.emit.early-fire.delay = 5s;
> >SET table.exec.emit.late-fire.enabled = true;
> >SET table.exec.emit.late-fire.delay = 0;
> >SET table.exec.emit.allow-lateness = 5s;
> >
> >We should stick with the syntax for the TVFs, and not modify that.
> >
> >On the topic of column naming, for other situations where a user wants
> >to use a value that's already reserved, we require the user to include
> >backticks to indicate that Flink should not use the reserved keyword
> >implementation. Why isn't that sufficient in this case? I rather stay
> >consistent with this behaviour instead of introducing new config
> >options.
> >
> >Overall, +1 on this topic. I would propose to rename the FLIP to
> >something like "Add Missing Table Valued Functions Features to Replace
> >Legacy Group Window Aggregation".
> >
> >Best regards,
> >
> >Martijn
> >
> >[1] 
> >https://nightlies.apache.org/flink/flink-docs-release-1.18/docs/dev/table/sql/queries/window-agg/#group-window-aggregation
> >[2] 
> >https://cwiki.apache.org/confluence/display/FLINK/FLIP-392%3A+Deprecate+the+Legacy+Group+Window+Aggregation
> >
> >
> >On Wed, Dec 6, 2023 at 9:32 AM Xuyang <xyzhong...@163.com> wrote:
> >>
> >> Hi, Shengkai. Thanks to share your thought. Let me answer related 
> >> questions。
> >>
> >>
> >> > Could you give an example about the pass-through column. A session 
> >> > window may contain multiple rows, which value is selected by the 
> >> > windowoperator?
> >>
> >>
> >> The table function make the entire inpyt row available in the output. Take 
> >> an example in Flink, if the input row is <a,b,c> with schema <f0,f1,f2>,
> >> the output of the window tvf can also access the <a,b,c> with schema 
> >> <f0,f1,f2>. The session window always output these multi rows with 
> >> attached window_start,
> >> window_end and window_time column.
> >>
> >>
> >> > What's the behavior if all the data in the window have been removed?
> >>
> >>
> >> Align the behavior of existing group window agg, and do not output data 
> >> when the window is empty.
> >>
> >>
> >> > Could you explain more details about the how window process CDC? For 
> >> > example, what's the behavior if the window only gets the DELETE data 
> >> > from the upstream Operator.
> >>
> >>
> >> Also align the behavior of existing group window agg. However, there is a 
> >> bug in group window agg that currently group window agg has different 
> >> results when only
> >> consuming -D records while using or not using minibatch. Refs more at 
> >> FLINK-33760.
> >>
> >>
> >> > The subtitle is not correct here.
> >>
> >>
> >> Updated the doc to fix it.
> >>
> >>
> >> > It's better if we can introduce a syntax like the `emit` keyword to set 
> >> > the emit strategy for every window.
> >> I agree with you. But I don't recommend using this syntax SESSION(data 
> >> [PARTITION BY (keycols, ...)], DESCRIPTOR(timecol), gap, emit='') because 
> >> it breaks the syntax in Flip-145.
> >> I think using query hint is a better idea. Anyway, this work should belong 
> >> to another flip.
> >>
> >>
> >> > I think more work should be mentioned in the FLIP. What's the behavior 
> >> > if the input schema contains a column named `window_start`?
> >> > In the current design, `window_start` is a reserved keyword in the 
> >> > window TVF syntax, but it is legal in the legacy implementation.
> >>
> >>
> >> This is a good question. Perhaps we can introduce a config to add a 
> >> specific config (named "table.window-additional-columns.prefix") to add a 
> >> prefix
> >> of all window addition columns to solve this situation. For example, user 
> >> can set the conf to "$", and the additional column from window will become
> >> "$window_start", "$window_end" and "$window_time". WDYT?
> >>
> >>
> >> > In the FLIP, you mention the FLIP should introduce an option to fall 
> >> > back to the legacy behavior. Could you tell us what's the name of the 
> >> > option?
> >> > BTW, I think we should unify the implementation when window TVF can do 
> >> > all the work that the legacy operator can do and there is no need to 
> >> > introduce an option to fallback.
> >>
> >>
> >> What about using config named "table.optimizer.window-rewrite-enabled"? If
> >> I agree with you that util all features are aligned and everything is ok 
> >> about window tvf, we should also remove this config about fallback.
> >> But I think to be on the safe side, we can observe one or two versions of 
> >> this rewrite, and allows users to roll back when problems arise.
> >>
> >>
> >> > If we remove the legacy window operator in the future, how users upgrade 
> >> > their jobs? Do you have any plan to support state migration from the 
> >> > legacy window to Windows TVF?
> >> IIRC, currently compatibility across middle versions of SQL is not 
> >> guaranteed. Should we add constraints on this part?
> >>
> >>
> >> Look for your feedback!
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >> --
> >>
> >>     Best!
> >>     Xuyang
> >>
> >>
> >>
> >>
> >>
> >> At 2023-12-05 12:06:27, "liu ron" <ron9....@gmail.com> wrote:
> >> >Hi, xuyang
> >> >
> >> >Thanks for starting this FLIP discussion, currently there are two types of
> >> >window aggregation in Flink SQL, namely legacy group window aggregation 
> >> >and
> >> >window tvf aggregation, these two types of window aggregation are not 
> >> >fully
> >> >aligned in behavior, which will bring a lot of confusion to the users, so
> >> >there is a need to unify and align them. I think the final ideal state
> >> >should be that there is only one window tvf aggregation, which supports
> >> >Tumble, HOP, Cumulate and Session windows, and supports consuming CDC data
> >> >streams. There is also support for configuring EARLY-FIRE and LATER-FIRE.
> >> >
> >> >This FLIP is a continuation of FLIP-145, and also supports legacy group
> >> >window aggregation to flat-migrate to the new window tvf agregation, which
> >> >is very useful, especially for the support of CDC streams, a pain point
> >> >that users often feedback. Big +1 for this FLIP.
> >> >
> >> >Best,
> >> >Ron
> >> >
> >> >Xuyang <xyzhong...@163.com> 于2023年12月5日周二 11:11写道:
> >> >
> >> >> Hi, Feng and David.
> >> >>
> >> >>
> >> >> Thank you very much to share your thoughts.
> >> >>
> >> >>
> >> >> This flip does not include the official exposure of these experimental
> >> >> conf to users. Thus there is not adetailed description of this part.
> >> >> However, in view that some technical users may have added these
> >> >> experimental conf in actual production jobs, the processing
> >> >> of these conf while using window tvf syntax has been added to this flip.
> >> >>
> >> >>
> >> >> Overall, the behavior of using these experimental parameters is no
> >> >> different from before, and I think we should provide the compatibility
> >> >> about using these experimental conf.
> >> >>
> >> >>
> >> >> Look for your thoughs.
> >> >>
> >> >>
> >> >>
> >> >>
> >> >> --
> >> >>
> >> >>     Best!
> >> >>     Xuyang
> >> >>
> >> >>
> >> >>
> >> >>
> >> >>
> >> >> At 2023-12-05 09:17:49, "David Anderson" <dander...@apache.org> wrote:
> >> >> >The current situation (where we have both the legacy windows and the
> >> >> >TVF-based windows) is confusing for users, and I'd like to see us move
> >> >> >forward as rapidly as possible.
> >> >> >
> >> >> >Since the early fire, late fire, and allowed lateness features were 
> >> >> >never
> >> >> >documented or exposed to users, I don't feel that we need to provide
> >> >> >replacements for these internal, experimental features before 
> >> >> >officially
> >> >> >deprecating the legacy group window aggregations, and I'd rather not 
> >> >> >wait.
> >> >> >
> >> >> >However, I'd be delighted to see a proposal for what that might look 
> >> >> >like.
> >> >> >
> >> >> >Best,
> >> >> >David
> >> >> >
> >> >> >On Mon, Dec 4, 2023 at 12:45 PM Feng Jin <jinfeng1...@gmail.com> wrote:
> >> >> >
> >> >> >> Hi xuyang,
> >> >> >>
> >> >> >> Thank you for initiating this proposal.
> >> >> >>
> >> >> >> I'm glad to see that TVF's functionality can be fully supported.
> >> >> >>
> >> >> >> Regarding the early fire, late fire, and allow lateness features, how
> >> >> will
> >> >> >> they be provided to users? The documentation doesn't seem to provide 
> >> >> >> a
> >> >> >> detailed description of this part.
> >> >> >>
> >> >> >> Since this FLIP will also involve a lot of feature development, I am
> >> >> more
> >> >> >> than willing to help, including development and code review.
> >> >> >>
> >> >> >> Best,
> >> >> >> Feng
> >> >> >>
> >> >> >> On Tue, Nov 28, 2023 at 8:31 PM Xuyang <xyzhong...@163.com> wrote:
> >> >> >>
> >> >> >> > Hi all.
> >> >> >> > I'd like to start a discussion of FLIP-392: Deprecate the Legacy 
> >> >> >> > Group
> >> >> >> > Window Aggregation.
> >> >> >> >
> >> >> >> >
> >> >> >> > Although the current Flink SQL Window Aggregation documentation[1]
> >> >> >> > indicates that the legacy Group Window Aggregation
> >> >> >> > syntax has been deprecated, the new Window TVF Aggregation syntax 
> >> >> >> > has
> >> >> not
> >> >> >> > fully covered all of the features of the legacy one.
> >> >> >> >
> >> >> >> >
> >> >> >> > Compared to Group Window Aggergation, Window TVF Aggergation has
> >> >> several
> >> >> >> > advantages, such as two-stage optimization,
> >> >> >> > support for standard GROUPING SET syntax, and so on. However, it
> >> >> needs to
> >> >> >> > supplement and enrich the following features.
> >> >> >> >
> >> >> >> >
> >> >> >> > 1. Support for SESSION Window TVF Aggregation
> >> >> >> > 2. Support for consuming CDC stream
> >> >> >> > 3. Support for HOP window size with non-integer step length
> >> >> >> > 4. Support for configurations such as early fire, late fire and 
> >> >> >> > allow
> >> >> >> > lateness
> >> >> >> > (which are internal experimental configurations in Group Window
> >> >> >> > Aggregation and not public to users yet.)
> >> >> >> > 5. Unification of the Window TVF Aggregation operator in runtime at
> >> >> the
> >> >> >> > implementation layer
> >> >> >> > (In the long term, the cost to maintain the operators about Window 
> >> >> >> > TVF
> >> >> >> > Aggregation and Group Window Aggregation is too expensive.)
> >> >> >> >
> >> >> >> >
> >> >> >> > This flip aims to continue the unfinished work in FLIP-145[2], 
> >> >> >> > which
> >> >> is
> >> >> >> to
> >> >> >> > fully enable the capabilities of Window TVF Aggregation
> >> >> >> >  and officially deprecate the legacy syntax Group Window 
> >> >> >> > Aggregation,
> >> >> to
> >> >> >> > prepare for the removal of the legacy one in Flink 2.0.
> >> >> >> >
> >> >> >> >
> >> >> >> > I have already done some preliminary POC to validate the 
> >> >> >> > feasibility
> >> >> of
> >> >> >> > the related work in this flip as follows.
> >> >> >> > 1. POC for SESSION Window TVF Aggregation [3]
> >> >> >> > 2. POC for CUMULATE in Group Window Aggregation operator [4]
> >> >> >> > 3. POC for consuming CDC stream in Window Aggregation operator [5]
> >> >> >> >
> >> >> >> >
> >> >> >> > Looking forward to your feedback and thoughts!
> >> >> >> >
> >> >> >> >
> >> >> >> >
> >> >> >> > [1]
> >> >> >> >
> >> >> >>
> >> >> https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/sql/queries/window-agg/
> >> >> >> >
> >> >> >> > [2]
> >> >> >> >
> >> >> >>
> >> >> https://cwiki.apache.org/confluence/display/FLINK/FLIP-145%3A+Support+SQL+windowing+table-valued+function#FLIP145:SupportSQLwindowingtablevaluedfunction-SessionWindows
> >> >> >> > [3] https://github.com/xuyangzhong/flink/tree/FLINK-24024
> >> >> >> > [4]
> >> >> >> >
> >> >> >>
> >> >> https://github.com/xuyangzhong/flink/tree/poc_legacy_group_window_agg_cumulate
> >> >> >> > [5]
> >> >> >> >
> >> >> >>
> >> >> https://github.com/xuyangzhong/flink/tree/poc_window_agg_consumes_cdc_stream
> >> >> >> >
> >> >> >> >
> >> >> >> >
> >> >> >> > --
> >> >> >> >
> >> >> >> >     Best!
> >> >> >> >     Xuyang
> >> >> >>
> >> >>
>
>

Reply via email to