Hi Jark,
2 & 3 sound good to me.
Regarding the time attribute,
I still have some questions. I understand that it's easy to support cascaded
window aggregates using the new TVFs.
However, there are some other places that need a time attribute:
- CEP
- interval join
- order by
- over window
If there is no time attribute column, how do we integrate these existing
features with the new TVFs?
E.g.
StreamA -> new window aggregate -> interval join -> Sink
                                 /
StreamB -------------------------
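
To make the example concrete, here is a rough sketch of such a pipeline in
SQL. The table schemas, the view name AggA, and the join bounds are
hypothetical; only the TUMBLE syntax follows the snippet from the FLIP
announcement. The second statement is where the question arises.

```sql
-- Hypothetical schemas (assumptions, not taken from the FLIP):
--   StreamA(id, price, ts)   -- ts is an event-time attribute
--   StreamB(id, ts)          -- ts is an event-time attribute

-- Step 1: new-style window aggregate on StreamA.
CREATE VIEW AggA AS
SELECT id, window_start, window_end, SUM(price) AS total
FROM TABLE(
    TUMBLE(TABLE StreamA, DESCRIPTOR(ts), INTERVAL '10' MINUTES))
GROUP BY id, window_start, window_end;

-- Step 2: interval join of the aggregate result with StreamB.
-- An interval join needs a time attribute on both inputs. b.ts is one,
-- but if window_end is only a plain TIMESTAMP column, a.window_end is not.
SELECT a.id, a.total, b.ts
FROM AggA a
JOIN StreamB b
  ON a.id = b.id
 AND b.ts BETWEEN a.window_end - INTERVAL '5' MINUTE AND a.window_end;
```
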
Jark Wu <[email protected]> wrote on Fri, Oct 9, 2020 at 11:51 PM:
> Hi Benchao,
>
> 1) time attribute
> Yes. We don't need time attribute auxiliary functions, because the new
> window operations are all based on the
> window_start and window_end columns instead of on time attributes, so
> we don't need to propagate time attributes.
> A cascaded window aggregate can be expressed by simply grouping by the
> window_start and window_end of the previous window result.
> I have added a cascaded window aggregate example in the Tumbling Window
> section in the FLIP.
> If you want to define a proctime window aggregate, the time column in the TVF
> should be a proctime attribute field (or the PROCTIME() function).
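>
> A rough sketch of that cascading, for illustration only: the supplier_id
> column and the view name are hypothetical, and the Tumbling Window section
> in the FLIP remains the authoritative example. The first level aggregates
> per supplier and window; the second level groups that result by the same
> window_start and window_end.
>
> ```sql
> -- First level: partial sums per (hypothetical) supplier_id and window.
> CREATE VIEW per_supplier AS
> SELECT supplier_id, window_start, window_end, SUM(price) AS partial_price
> FROM TABLE(
>     TUMBLE(TABLE Bid, DESCRIPTOR(bidtime), INTERVAL '10' MINUTES))
> GROUP BY supplier_id, window_start, window_end;
>
> -- Second level: cascade by grouping on the window columns of the
> -- previous result; no time attribute is referenced here.
> SELECT window_start, window_end, SUM(partial_price) AS total_price
> FROM per_supplier
> GROUP BY window_start, window_end;
> ```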
>
> 2) batch support
> Yes. The proposed syntax and API are unified for batch and streaming. Batch
> support is planned, but there may not be enough time to get it into 1.12.
>
> 3) support `grouping sets`
> This is not included in the FLIP, but I think it's great if we can support
> `grouping sets`.
> The existing window impl doesn't support this because we convert the
> LogicalAggregate into WindowAggregate at the beginning,
> so the expand grouping sets rule can't be applied in this situation.
> Fortunately, with the new window impl, the conversion to WindowAggregate
> will happen at the end, so I think the expand rule can be
> applied and will support this feature naturally.
> Therefore, IMO, we don't need to include this feature in this FLIP, which
> keeps the FLIP from becoming too large.
> This can be a follow-up issue (maybe just add tests and docs) after the
> FLIP.
>
> Best,
> Jark
>
>
> On Fri, 9 Oct 2020 at 19:09, 刘 芃成 <[email protected]> wrote:
>
> > Hi Benchao,
> > Welcome to the discussion. Yes, this new syntax can make SQL
> > clearer and simpler.
> > For your first question, the `window_start` and `window_end`
> > columns will be added automatically,
> > so we don't need to use auxiliary group functions to infer or
> > access the window properties.
> >
> > For `grouping sets` on TVFs, I think it would be interesting if we
> > could support it, as we already support `grouping sets`
> > on streaming aggregates in the blink planner. But I'm not sure if it
> > will be included in this FLIP.
> >
> > cc @Jark Wu
> >
> > Best,
> > Pengcheng
> >
> >
> > On 2020/10/9 at 5:25 PM, "Benchao Li" <[email protected]> wrote:
> >
> > Thanks Jark for bringing up this discussion. I like this FLIP very much.
> >
> > Especially the cumulate window: it's much like the current TUMBLE window +
> > Fast Emit (which is an undocumented experimental feature), but it's more
> > powerful.
> >
> > This will also make the SQL semantics more standard, especially for the
> > HOPPING window.
> >
> > Regarding the time attribute,
> > It seems that we don't need a specific function to infer the time attribute,
> > like `TUMBLE_ROWTIME` / `TUMBLE_PROCTIME`. Are the `window_start` and
> > `window_end` columns then time attribute columns automatically?
> > - If not, what will be the time attribute of the result relation of these
> >   TVFs, especially after the window aggregation?
> > - If yes, then how do we handle proctime?
> >
> > Regarding batch operators,
> > It's great to hear that we can reuse the batch operators in continuous
> > batch mode, as you mentioned in the FLIP.
> > The current window aggregate can also be used in batch mode with rowtime.
> > Do you plan to support these TVFs for batch mode in this FLIP? Since
> > Table/SQL is a unified API, it would be great if we could keep the features
> > complete in both streaming and batch mode.
> >
> > There is one more question; I don't know whether it should be considered
> > in this FLIP.
> > Does the new window support `grouping sets`? (It's not supported in the old
> > window impl.)
> >
> > Jark Wu <[email protected]> wrote on Fri, Oct 9, 2020 at 4:14 PM:
> >
> > > Hi all,
> > >
> > > I know we have a lot of discussion and development going on right now, but
> > > it would be great if we can get FLIP-145 into a votable state.
> > > If there are no objections, I would like to start voting in the next few
> > > days.
> > >
> > > Best,
> > > Jark
> > >
> > > On Thu, 1 Oct 2020 at 14:29, Jark Wu <[email protected]> wrote:
> > >
> > > > Hi everyone,
> > > >
> > > > I have added a section on Performance Optimization to describe how to
> > > > improve the performance in the short term and long term,
> > > > and to sketch the future performance potential under the new window API.
> > > > Introducing the window API is just the first step; we will
> > > > continuously improve the performance to make it powerful and useful.
> > > >
> > > > Best,
> > > > Jark
> > > >
> > > > On Thu, 1 Oct 2020 at 14:28, Jark Wu <[email protected]> wrote:
> > > >
> > > >> Hi Pengcheng,
> > > >>
> > > >> Yes, the window TVF is part of the FLIP. You are welcome to contribute
> > > >> and join the discussion.
> > > >> Regarding the SESSION window aggregation, users can use the existing
> > > >> grouped session window function.
> > > >>
> > > >> Best,
> > > >> Jark
> > > >>
> > > >> On Sun, 27 Sep 2020 at 21:24, liupengcheng <[email protected]>
> > > >> wrote:
> > > >>
> > > >>> Hi Jark,
> > > >>> Thanks for the reply. Yes, I think it's a good feature; it can
> > > >>> improve the NRT scenarios as you mentioned in the FLIP. I also think
> > > >>> it can improve streaming SQL greatly: it can support richer window
> > > >>> operations in Flink SQL and bring great convenience to users
> > > >>> (Flink currently supports only group windows).
> > > >>>
> > > >>> Regarding the SESSION window, I think it's especially useful for
> > > >>> user behavior analysis (e.g. counting user visits on a news website
> > > >>> or social platform), but I agree that we can keep it out of the FLIP
> > > >>> for now to catch up with 1.12.
> > > >>>
> > > >>> Recently, I've done some work on the stream planner with the
> > > >>> TVFs, and I'm willing to contribute to this part. Is it in the plan
> > > >>> of this FLIP?
> > > >>>
> > > >>> Best,
> > > >>> PengchengLiu
> > > >>>
> > > >>>
> > > >>> On 2020/9/26 at 11:09 PM, "Jark Wu" <[email protected]> wrote:
> > > >>>
> > > >>> Hi Pengcheng,
> > > >>>
> > > >>> It's great to see you also have the need for window join.
> > > >>> You are right, the windowing TVF is a powerful feature which can
> > > >>> support more operations in the future.
> > > >>> I think of it as the date-time "partition" selection in batch SQL
> > > >>> jobs. With this new syntax, I think it is possible
> > > >>> to migrate traditional batch SQL jobs to Flink SQL by changing a
> > > >>> few lines.
> > > >>>
> > > >>> Regarding the SESSION window, it is kept out of the FLIP on purpose,
> > > >>> because we want to keep the FLIP small to catch up with 1.12, and a
> > > >>> SESSION TVF is rarely useful (e.g. session window join?).
> > > >>>
> > > >>> Best,
> > > >>> Jark
> > > >>>
> > > >>> On Fri, 25 Sep 2020 at 22:59, liupengcheng <[email protected]>
> > > >>> wrote:
> > > >>>
> > > >>> > Hi Jark,
> > > >>> > I'm very interested in this feature, and I have also been working
> > > >>> > on this recently.
> > > >>> > I just had a glance at the FLIP. It's good, but I found that
> > > >>> > there is no plan to add SESSION windows.
> > > >>> > Also, I think there are more things we can do based on this new
> > > >>> > syntax. For example,
> > > >>> > - window sort support
> > > >>> > - window union/intersect/minus support
> > > >>> > - improved dimension table join
> > > >>> > We can have a deeper discussion on this new feature later.
> > > >>> > I've also recently opened a Jira issue that is related to this feature:
> > > >>> > https://issues.apache.org/jira/browse/FLINK-18830
> > > >>> >
> > > >>> > Best!
> > > >>> > PengchengLiu
> > > >>> >
> > > >>> > On 2020/9/25 at 10:30 PM, "Jark Wu" <[email protected]> wrote:
> > > >>> >
> > > >>> > Hi everyone,
> > > >>> >
> > > >>> > I want to start a FLIP about supporting windowing table-valued
> > > >>> > functions (TVF).
> > > >>> > The main purpose of this FLIP is to improve the near real-time (NRT)
> > > >>> > experience of Flink.
> > > >>> >
> > > >>> > FLIP-145:
> > > >>> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-145%3A+Support+SQL+windowing+table-valued+function
> > > >>> >
> > > >>> > We want to introduce the TUMBLE, HOP, and CUMULATE windowing TVFs;
> > > >>> > CUMULATE is a new kind of window.
> > > >>> > With the windowing TVFs, we can support richer operations on windows,
> > > >>> > including window join, window TopN, and so on.
> > > >>> > This makes things simple: we only need to assign windows at the
> > > >>> > beginning of the query, and then apply operations after that, as in
> > > >>> > traditional batch SQL.
> > > >>> > We hope it can help to reduce the learning curve of windows, improve
> > > >>> > NRT for Flink, and attract more batch users.
> > > >>> >
> > > >>> > A simple code snippet for a 10-minute tumbling window aggregate:
> > > >>> >
> > > >>> > SELECT window_start, window_end, SUM(price)
> > > >>> > FROM TABLE(
> > > >>> >     TUMBLE(TABLE Bid, DESCRIPTOR(bidtime), INTERVAL '10' MINUTES))
> > > >>> > GROUP BY window_start, window_end;
> > > >>> >
> > > >>> > I'm looking forward to your feedback.
> > > >>> >
> > > >>> > Best,
> > > >>> > Jark
> > > >>> >
> >
> > --
> >
> > Best,
> > Benchao Li
> >
>
--
Best,
Benchao Li