Hi Jark,

Thanks for your reply, this makes sense to me.
The scenario I used above is just a case to explain what I'm concerned
about, not necessarily a production use case. We can leave it to the
future to see whether other users have these use cases.

Then I have no other concerns, +1 to start the VOTE.

Jark Wu <imj...@gmail.com> wrote on Sat, Oct 10, 2020 at 1:44 PM:

> Hi Benchao,
>
> That's a good question.
>
> IMO, the new windowed operators and the current time operators are two
> different sets of functions, just like time operators and non-time
> operators are two different sets of functions.
> I think it's fine if we don't support integrating them, just like time
> operators can't be applied on non-windowed aggregate.
> If users want to use time operators in the whole pipeline, then they can
> use the grouped window aggregates instead of the window TVFs.
>
> The key idea of window TVF is that all the operators in the pipeline are
> based on the **windows**.
> In terms of syntax, if the key clause (e.g. group by, partitioned by, join
> on, order by) contains window_start and window_end,
> it can be translated into windowed operators.
> Thus, we will have windowed CEP, windowed sort, windowed over aggregate in
> the future to make it possible to build a windowed pipeline.
>
> But I think we can elaborate the integration more in the future if users
> need it. Actually, I don't fully understand the scenario of integrating
> window TVF and time operators at this point.
> For example, interval join an input stream and a window join result. I
> don't see why it can't be expressed by nested window join and why users
> have to use interval join here.
> Maybe we can wait for more inputs from users when the window TVF is
> released and we can elaborate it again.
>
> Best,
> Jark
>
> On Sat, 10 Oct 2020 at 12:01, 刘 芃成 <pengchengliucr...@gmail.com> wrote:
>
>> Hi, Benchao,
>> I think I got your point. Actually, in the current implementation for
>> group window aggregation, the value of time attributes (e.g.
>> TUMBLE_ROWTIME/TUMBLE_PROCTIME) is calculated as (window_end - 1), so I
>> think we can just use it directly if you need this. But I think this time
>> attribute is mainly intended for cascaded window operations.
>> Regarding the example you provided, I think the semantics of the SQL in
>> your example, which does an interval join (e.g. with TUMBLE_ROWTIME) after
>> window aggregation, are not clear in the current implementation, and I
>> think that's a strong reason why we need the new TVF syntax.
>> With the new syntax, users should understand which time column to
>> use and how to generate it when doing an interval join, etc.
>>
>> Best,
>> Pengcheng
>>
>> From: Benchao Li <libenc...@apache.org>
>> Date: Saturday, October 10, 2020, 11:02 AM
>> To: pengcheng Liu <pengchengliucr...@gmail.com>
>> Cc: dev <dev@flink.apache.org>
>> Subject: Re: [DISCUSS] FLIP-145: Support SQL windowing table-valued function
>>
>> Hi pengcheng,
>>
>> Thanks for your response.
>> I knew that the original time attribute column will be retained after the
>> TVF; what I'm questioning is how we get the time attribute column after
>> aggregation.
>> Your answer did not remove my doubts about this.
>>
>> It's ok if we do not plan to integrate the new TVF aggregate with the old
>> "time attribute scenarios" listed in my previous email in this FLIP.
>> However, it's good to elaborate more on this, and leave it to the future
>> plan.
>>
>> pengcheng Liu <pengchengliucr...@gmail.com> wrote on Sat, Oct 10, 2020 at 10:45 AM:
>> Hi, Benchao,
>> In TVFs, the time attributes are just passed through from parent rels,
>> and the TVFs just add two additional window attributes (i.e.
>> window_start & window_end). Also, I think the time column can be not
>> only a time attribute with type of `TimeIndicatorType` but also a
>> regular column with type of `Timestamp`.
>>
>> For cascaded window operations, we can use window_start/window_end of
>> the previous window result directly to indicate operating on the same
>> window, or use a new DESCRIPTOR column to assign new windows when the
>> time column changes (e.g. in some cases, the original timestamp is
>> inaccurate and needs some conversion before it can be used).
>>
>> You can check the definition or signature of these TVFs in the FLIP.
>> e.g.
>> SELECT * FROM TABLE(
>>   TUMBLE(TABLE Bid, DESCRIPTOR(bidtime), INTERVAL '10' MINUTES))
>> In the example, `bidtime` is the time attribute column, which is
>> the first operand of the DESCRIPTOR function.
>>
>> +1 start voting.
>>
>> Benchao Li <libenc...@apache.org> wrote on Sat, Oct 10, 2020 at 10:08 AM:
>> Hi Jark,
>>
>> 2 & 3 sound good to me.
>>
>> Regarding time attribute,
>> I still have some questions. I knew it's easy to support cascaded window
>> aggregate using the new TVFs.
>> However, there are some other places that need a time attribute:
>> - CEP
>> - interval join
>> - order by
>> - over window
>> If there is no time attribute column, how do we integrate these old
>> features with the new TVFs?
>> E.g.
>> StreamA -> new window aggregate -> interval join -> Sink
>>                                            /
>> StreamB -----------------------------------
>>
>>
>> Jark Wu <imj...@gmail.com> wrote on Fri, Oct 9, 2020 at 11:51 PM:
>> Hi Benchao,
>>
>> 1) time attribute
>> Yes. We don't need a time attribute auxiliary function, because the new
>> window operations are all based on the window_start and window_end
>> columns instead of on the time attributes. So we don't need to propagate
>> time attributes.
>> Cascaded window aggregate can be expressed by simply GROUP BY the
>> window_start and window_end of the previous window result.
>> I have added a cascaded window aggregate example in the Tumbling Window
>> section in the FLIP.
>> If you want to define a proctime window aggregate, the time column in the
>> TVF should be a proctime attribute field (or the PROCTIME() function).
>>
>> 2) batch support
>> Yes. The proposed syntax/API are unified for batch and streaming. Batch
>> support is in the plan, but we may not have enough time to catch up 1.12.
>>
>> 3) support `grouping sets`
>> This is not included in the FLIP, but I think it's great if we can support
>> `grouping sets`.
>> The existing window impl doesn't support this because we convert the
>> LogicalAggregate into WindowAggregate in the beginning, so the expand
>> grouping sets rule can't be applied in this situation.
>> Fortunately, with the new window impl, the conversion to WindowAggregate
>> will happen at the end, so I think the expand rule can be
>> applied and support this feature naturally.
>> Therefore, IMO, we don't need to include this feature in this FLIP, to
>> avoid the FLIP being too large.
>> This can be a follow-up issue (maybe just add tests and docs) after the
>> FLIP.
>>
>> Best,
>> Jark
>>
>>
>> On Fri, 9 Oct 2020 at 19:09, 刘 芃成 <pengchengliucr...@gmail.com> wrote:
>>
>> > Hi, Benchao,
>> > Welcome to join the discussion. Yes, this new syntax can make SQL
>> > clearer and simpler.
>> > For your first question, the `window_start` and `window_end`
>> > columns will be added automatically,
>> > so we don't need to use auxiliary group functions to infer or
>> > access the window properties.
>> >
>> > For the `grouping sets` on TVFs, I think it's interesting if we
>> > can support it, as we already supported `grouping sets`
>> > on streaming aggregates in blink planner. But I'm not sure if it
>> > will be included into this FLIP.
>> >
>> > cc @Jark Wu
>> >
>> > Best,
>> > Pengcheng
>> >
>> >
>> > On 2020/10/9 at 5:25 PM, "Benchao Li" <libenc...@apache.org> wrote:
>> >
>> > Thanks Jark for bringing this discussion, I like this FLIP very much.
>> >
>> > Especially the cumulate window, it's much like the current TUMBLE
>> > window + Fast Emit (which is an undocumented experimental feature),
>> > however, it's more powerful.
>> >
>> > And this will make the SQL semantics more standard, especially for
>> > the HOPPING window.
>> >
>> > Regarding time attribute,
>> > It seems that we don't need a specific function to infer the time
>> > attribute like `TUMBLE_ROWTIME` / `TUMBLE_PROCTIME`. Then are the
>> > `window_start` and `window_end` columns time attribute columns
>> > automatically?
>> > - If not, what will be the time attribute of the result relation of
>> >   these TVFs? Especially after the window aggregation.
>> > - If yes, then how do we handle proctime?
>> >
>> > Regarding batch operators,
>> > It's great to hear that we can reuse the batch operators in continuous
>> > batch mode as you mentioned in the FLIP.
>> > Current window aggregate could also be used in batch mode with
>> > rowtime. Do you plan to support these TVFs for batch mode in this
>> > FLIP? Since the Table/SQL is a unified API, it's great if we can keep
>> > the features complete both in streaming and batch mode.
>> >
>> > There is one more question, I don't know whether it should be
>> > considered in this FLIP.
>> > Does the new window support `grouping sets`? (It's not supported in
>> > the old window impl).
>> >
>> > Jark Wu <imj...@gmail.com> wrote on Fri, Oct 9, 2020 at 4:14 PM:
>> >
>> > > Hi all,
>> > >
>> > > I know we have a lot of discussion and development ongoing right
>> > > now but it would be great if we can get FLIP-145 into a votable
>> > > state.
>> > > If there are no objections, I would like to start voting in the
>> > > next days.
>> > >
>> > > Best,
>> > > Jark
>> > >
>> > > On Thu, 1 Oct 2020 at 14:29, Jark Wu <imj...@gmail.com> wrote:
>> > >
>> > > > Hi everyone,
>> > > >
>> > > > I have added a section for Performance Optimization to describe how to
>> > > > improve the performance in the short-term and long-term
>> > > > and sketch the future performance potential under the new window API.
>> > > > Introducing the window API is just the first step, we will
>> > > > continuously improve the performance to make it powerful and useful.
>> > > >
>> > > > Best,
>> > > > Jark
>> > > >
>> > > > On Thu, 1 Oct 2020 at 14:28, Jark Wu <imj...@gmail.com> wrote:
>> > > >
>> > > >> Hi Pengcheng,
>> > > >>
>> > > >> Yes, the window TVF is part of the FLIP. Welcome to contribute and join
>> > > >> the discussion.
>> > > >> Regarding the SESSION window aggregation, users can use the existing
>> > > >> grouped session window function.
>> > > >>
>> > > >> Best,
>> > > >> Jark
>> > > >>
>> > > >> On Sun, 27 Sep 2020 at 21:24, liupengcheng <pengchengliucr...@gmail.com>
>> > > >> wrote:
>> > > >>
>> > > >>> Hi Jark,
>> > > >>> Thanks for the reply. Yes, I think it's a good feature, it can
>> > > >>> improve the NRT scenarios as you mentioned in the FLIP. Also, I
>> > > >>> think it can improve the streaming SQL greatly: it can support
>> > > >>> richer window operations in Flink SQL and bring great convenience
>> > > >>> to users (only group windows are currently supported in Flink).
>> > > >>>
>> > > >>> Regarding the SESSION window, I think it's especially useful for
>> > > >>> user behavior analysis (e.g. counting user visits on a news
>> > > >>> website or social platform), but I agree that we can keep it
>> > > >>> out of the FLIP now to catch up 1.12.
>> > > >>>
>> > > >>> Recently, I've done some work on the stream planner with the
>> > > >>> TVFs, and I'm willing to contribute
>> > > >>> to this part. Is it in the plan of this FLIP?
>> > > >>>
>> > > >>> Best,
>> > > >>> PengchengLiu
>> > > >>>
>> > > >>>
>> > > >>> On 2020/9/26 at 11:09 PM, "Jark Wu" <imj...@gmail.com> wrote:
>> > > >>>
>> > > >>> Hi pengcheng,
>> > > >>>
>> > > >>> It's great to see you also have the need for window join.
>> > > >>> You are right, the windowing TVF is a powerful feature which can
>> > > >>> support more operations in the future.
>> > > >>> I think of it as similar to the date/time "partition" selection in
>> > > >>> batch SQL jobs; with this new syntax, I think it is possible
>> > > >>> to migrate traditional batch SQL jobs to Flink SQL by changing a
>> > > >>> few lines.
>> > > >>>
>> > > >>> Regarding the SESSION window, it is kept out of the FLIP on
>> > > >>> purpose, because we want to keep the
>> > > >>> FLIP small to catch up 1.12 and SESSION TVF is rarely useful (e.g.
>> > > >>> session window join?).
>> > > >>>
>> > > >>> Best,
>> > > >>> Jark
>> > > >>>
>> > > >>> On Fri, 25 Sep 2020 at 22:59, liupengcheng <pengchengliucr...@gmail.com>
>> > > >>> wrote:
>> > > >>>
>> > > >>> > Hi, Jark,
>> > > >>> > I'm very interested in this feature, and I've also been
>> > > >>> > working on this recently.
>> > > >>> > I just had a glance at the FLIP, it's good, but I found that
>> > > >>> > there is no plan to add SESSION windows.
>> > > >>> > Also, I think there can be more things we can do based on this
>> > > >>> > new syntax.
>> > > >>> > For example,
>> > > >>> > - window sort support
>> > > >>> > - window union/intersect/minus support
>> > > >>> > - Improve dimension table join
>> > > >>> > We can have a deeper discussion on this new feature later.
>> > > >>> > I've also recently opened a Jira issue that is related to this
>> > > >>> > feature:
>> > > >>> > https://issues.apache.org/jira/browse/FLINK-18830
>> > > >>> >
>> > > >>> > Best!
>> > > >>> > PengchengLiu
>> > > >>> >
>> > > >>> > On 2020/9/25 at 10:30 PM, "Jark Wu" <imj...@gmail.com> wrote:
>> > > >>> >
>> > > >>> > Hi everyone,
>> > > >>> >
>> > > >>> > I want to start a FLIP about supporting windowing table-valued
>> > > >>> > functions (TVF).
>> > > >>> > The main purpose of this FLIP is to improve the near real-time
>> > > >>> > (NRT) experience of Flink.
>> > > >>> >
>> > > >>> > FLIP-145:
>> > > >>> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-145%3A+Support+SQL+windowing+table-valued+function
>> > > >>> >
>> > > >>> > We want to introduce TUMBLE, HOP, CUMULATE windowing TVFs, the
>> > > >>> > CUMULATE is a new kind of window.
>> > > >>> > With the windowing TVFs, we can support richer operations on
>> > > >>> > windows, including window join, window TopN and so on.
>> > > >>> > This makes things simple: we only need to assign windows at the
>> > > >>> > beginning of the query, and then apply operations after that like
>> > > >>> > traditional batch SQL.
>> > > >>> > We hope it can help to reduce the learning curve of windows,
>> > > >>> > improve NRT for Flink, and attract more batch users.
>> > > >>> >
>> > > >>> > A simple code snippet for 10 minutes tumbling window aggregate:
>> > > >>> >
>> > > >>> > SELECT window_start, window_end, SUM(price)
>> > > >>> > FROM TABLE(
>> > > >>> >   TUMBLE(TABLE Bid, DESCRIPTOR(bidtime), INTERVAL '10' MINUTES))
>> > > >>> > GROUP BY window_start, window_end;
>> > > >>> >
>> > > >>> > I'm looking forward to your feedback.
>> > > >>> >
>> > > >>> > Best,
>> > > >>> > Jark
>> > > >>> >
>> > > >>>
>> > >
>> >
>> > --
>> >
>> > Best,
>> > Benchao Li
>> >
>>
>> --
>>
>> Best,
>> Benchao Li
>>
>> --
>>
>> Best,
>> Benchao Li
>
--
Best,
Benchao Li
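
For reference, the "windowed pipeline" idea discussed in this thread (key clauses that contain window_start and window_end, and cascaded aggregates expressed by grouping on the previous result's window columns) might look roughly like the nested query below. This is only a sketch built on the thread's own TUMBLE example: the `bidder` column and the aliases `bidder_total` and `max_bidder_total` are hypothetical, and how the planner translates the outer aggregate is an assumption that the thread does not confirm.

```sql
-- Outer query: regroup on the same window columns produced by the inner
-- window aggregate, as described for cascaded window aggregates above.
SELECT window_start, window_end, MAX(bidder_total) AS max_bidder_total
FROM (
  -- Inner query: assign 10-minute tumbling windows once, then sum per bidder.
  -- `bidder` is a hypothetical column on Bid, used only for illustration.
  SELECT window_start, window_end, bidder, SUM(price) AS bidder_total
  FROM TABLE(
    TUMBLE(TABLE Bid, DESCRIPTOR(bidtime), INTERVAL '10' MINUTES))
  GROUP BY window_start, window_end, bidder
)
GROUP BY window_start, window_end;
```

Windows are assigned only once, at the TVF, and every later step keys on window_start and window_end rather than on a time attribute, which is the property Benchao's interval-join question is probing.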