Hi Danny,

Thanks for the hint about the named-parameter syntax. I have added examples with named params to the FLIP.
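
For readers of this thread, a named-parameter call could look roughly like the sketch below. The parameter names `DATA`, `TIMECOL` and `SIZE` are assumptions for illustration and are not taken from this thread, so please check the FLIP for the exact spelling. The `Bid` table and `bidtime` column are the ones used in the quoted examples further down.

```sql
-- Sketch only: parameter names DATA / TIMECOL / SIZE are assumed, see the FLIP.
-- Positional form quoted later in this thread:
--   TUMBLE(TABLE Bid, DESCRIPTOR(bidtime), INTERVAL '10' MINUTES)
SELECT window_start, window_end, SUM(price)
FROM TABLE(
  TUMBLE(
    DATA    => TABLE Bid,            -- input table
    TIMECOL => DESCRIPTOR(bidtime),  -- time column used to assign windows
    SIZE    => INTERVAL '10' MINUTES -- tumbling window size
  ))
GROUP BY window_start, window_end;
```

Named arguments mainly help with TVFs that take several interval arguments, where the positional order is easy to mix up.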

Best,
Jark

On Sat, 10 Oct 2020 at 15:03, Pengcheng Liu <pengchengliucr...@gmail.com> wrote:

> Hi, Jark,
>
> I have a different opinion here. I think it is a very common use case to
> combine window operators with streaming operators (even time operators).
> For example, for some tables users only care about data within a period,
> while for other tables they may want the whole historical data.
> The pipeline may look like this:
>
> window join -> dimension table join -> stream aggregate -> stream sort
>
> Just as you said, the key clause can be used to distinguish whether an
> operator should be translated to a window operator or a streaming operator.
>
> Also, as I've mentioned before:
> 1) For a time operator after a window aggregation, the auxiliary function
> that is used to access the time attribute column can actually be replaced
> with (window_end - 1). We only need to make the upstream result contain a
> time column whose range is within (window_start, window_end), so that the
> downstream time operators can work on it (driven by the original watermark
> from the source).
> 2) For a time operator after other window operators, the downstream time
> operators can access the time column directly from their input.
>
> One more thought: maybe the window TVFs can re-assign timestamps and
> watermarks, so that in cases where the watermark cannot be retrieved from
> the source directly (it may need some conversion), the watermark can still
> be assigned dynamically in SQL (using the time column as the watermark
> column) and thus make it work. I think this can save much of the time spent
> on revising the event time column in some cases (this is a real demand in
> our production environment).
>
> I strongly suggest that we support the combined usage of window operators
> and streaming operators. And I think we can achieve this with little work.
>
> Best,
> Pengcheng
>
>
> Jark Wu <imj...@gmail.com> wrote on Sat, 10 Oct 2020 at 1:45 PM:
>
>> Hi Benchao,
>>
>> That's a good question.
>>
>> IMO, the new windowed operators and the current time operators are two
>> different sets of functions, just like time operators and non-time
>> operators are two different sets of functions.
>> I think it's fine if we don't support integrating them, just like time
>> operators can't be applied on a non-windowed aggregate.
>> If users want to use time operators in the whole pipeline, they can
>> use the grouped window aggregates instead of the window TVFs.
>>
>> The key idea of the window TVF is that all the operators in the pipeline
>> are based on the **windows**.
>> In terms of syntax, if the key clause (e.g. group by, partition by, join
>> on, order by) contains window_start and window_end,
>> it can be translated into windowed operators.
>> Thus, we will have windowed CEP, windowed sort, windowed over aggregate in
>> the future to make it possible to build a windowed pipeline.
>>
>> But I think we can elaborate on the integration more in the future if users
>> need it. Actually, I don't fully understand the scenario of integrating
>> window TVF and time operators at this point.
>> For example, interval join an input stream and a window join result. I
>> don't see why it can't be expressed by a nested window join and why users
>> have to use an interval join here.
>> Maybe we can wait for more input from users once the window TVF is
>> released, and then elaborate on it again.
>>
>> Best,
>> Jark
>>
>> On Sat, 10 Oct 2020 at 12:01, 刘 芃成 <pengchengliucr...@gmail.com> wrote:
>>
>> > Hi, Benchao,
>> > I think I got your point. Actually, in the current implementation of
>> > group window aggregation, the value of the time attributes (e.g.
>> > TUMBLE_ROWTIME/TUMBLE_PROCTIME) is calculated as (window_end - 1), so I
>> > think we can just use it directly if you need this.
>> > But I think this time attribute is mainly intended for cascaded window
>> > operations.
>> > Regarding the example you provided, I think the semantics of the SQL in
>> > your example, which does an interval join (e.g. with TUMBLE_ROWTIME) after
>> > a window aggregation, is not clear in the current implementation, and I
>> > think that's a strong reason why we need the new TVF syntax.
>> > With the new syntax, users should understand which time column to
>> > use and how to generate it when doing an interval join and so on.
>> >
>> > Best,
>> > Pengcheng
>> >
>> > From: Benchao Li <libenc...@apache.org>
>> > Date: Saturday, 10 Oct 2020, 11:02 AM
>> > To: pengcheng Liu <pengchengliucr...@gmail.com>
>> > Cc: dev <dev@flink.apache.org>
>> > Subject: Re: [DISCUSS] FLIP-145: Support SQL windowing table-valued function
>> >
>> > Hi pengcheng,
>> >
>> > Thanks for your response.
>> > I knew that the original time attribute column will be retained after
>> > the TVF; what I'm questioning is how we get the time attribute column
>> > after aggregation.
>> > Your answer did not remove my doubts about this.
>> >
>> > It's OK if we do not plan to integrate the new TVF aggregate with the old
>> > "time attribute scenarios" listed in my previous email in this FLIP.
>> > However, it would be good to elaborate more on this and leave it to the
>> > future plan.
>> >
>> > pengcheng Liu <pengchengliucr...@gmail.com> wrote on Sat, 10 Oct 2020 at 10:45 AM:
>> > Hi, Benchao,
>> > In TVFs, the time attributes are just passed through from the parent
>> > rels, and the TVFs just add two additional window attributes (i.e.
>> > window_start & window_end). Also, I think the time column can be not
>> > only a time attribute with type `TimeIndicatorType` but also a regular
>> > column with type `Timestamp`.
>> >
>> > For cascaded window operations, we can use the window_start/window_end
>> > of the previous window result directly to indicate operating on the same
>> > window, or use a new DESCRIPTOR column to assign new windows in case the
>> > time column changes (e.g. in some cases the original timestamp is
>> > inaccurate and needs some conversion before it can be used).
>> >
>> > You can check the definition or signature of these TVFs in the FLIP.
>> > e.g.
>> > SELECT * FROM TABLE(
>> >   TUMBLE(TABLE Bid, DESCRIPTOR(bidtime), INTERVAL '10' MINUTES))
>> > In the example, `bidtime` is the time attribute column, which is
>> > the first operand of the DESCRIPTOR function.
>> >
>> > +1 to start voting.
>> >
>> > Benchao Li <libenc...@apache.org> wrote on Sat, 10 Oct 2020 at 10:08 AM:
>> > Hi Jark,
>> >
>> > 2 & 3 sound good to me.
>> >
>> > Regarding time attribute,
>> > I still have some questions. I knew it's easy to support cascaded window
>> > aggregates using the new TVFs.
>> > However, there are some other places that need a time attribute:
>> > - CEP
>> > - interval join
>> > - order by
>> > - over window
>> > If there is no time attribute column, how do we integrate these old
>> > features with the new TVFs?
>> > E.g.
>> > StreamA -> new window aggregate -> interval join -> Sink
>> >                                            /
>> > StreamB -----------------------------------
>> >
>> >
>> > Jark Wu <imj...@gmail.com> wrote on Fri, 9 Oct 2020 at 11:51 PM:
>> > Hi Benchao,
>> >
>> > 1) time attribute
>> > Yes. We don't need a time attribute auxiliary function, because the new
>> > window operations are all based on the window_start and window_end
>> > columns instead of on the time attributes. So we don't need to propagate
>> > time attributes.
>> > Cascaded window aggregates can be expressed by a simple GROUP BY on the
>> > window_start and window_end of the previous window result.
>> > I have added a cascaded window aggregate example to the Tumbling Window
>> > section of the FLIP.
>> > If you want to define a proctime window aggregate, the time column in the
>> > TVF should be a proctime attribute field (or the PROCTIME() function).
>> >
>> > 2) batch support
>> > Yes. The proposed syntax/API is unified for batch and streaming. Batch
>> > support is in the plan, but we may not have enough time to catch up with
>> > 1.12.
>> >
>> > 3) support `grouping sets`
>> > This is not included in the FLIP, but I think it would be great if we can
>> > support `grouping sets`.
>> > The existing window impl doesn't support this because we convert the
>> > LogicalAggregate into a WindowAggregate at the beginning, so
>> > the expand grouping sets rule can't be applied in this situation.
>> > Fortunately, with the new window impl, the conversion to WindowAggregate
>> > will happen at the end, so I think the expand rule can be
>> > applied and support this feature naturally.
>> > Therefore, IMO, we don't need to include this feature in this FLIP, to
>> > avoid the FLIP being too large.
>> > This can be a follow-up issue (maybe just adding tests and docs) after the
>> > FLIP.
>> >
>> > Best,
>> > Jark
>> >
>> >
>> > On Fri, 9 Oct 2020 at 19:09, 刘 芃成 <pengchengliucr...@gmail.com> wrote:
>> >
>> > > Hi, Benchao,
>> > > Welcome to the discussion. Yes, this new syntax can make SQL
>> > > clearer and simpler.
>> > > For your first question, the `window_start` and `window_end`
>> > > columns will be added automatically,
>> > > so we don't need to use auxiliary group functions to infer or
>> > > access the window properties.
>> > >
>> > > For `grouping sets` on TVFs, I think it would be interesting if we
>> > > can support it, as we already support `grouping sets`
>> > > on streaming aggregates in the blink planner. But I'm not sure if it
>> > > will be included in this FLIP.
>> > >
>> > > cc @Jark Wu
>> > >
>> > > Best,
>> > > Pengcheng
>> > >
>> > >
>> > > On 2020/10/9 at 5:25 PM, "Benchao Li" <libenc...@apache.org> wrote:
>> > >
>> > > Thanks Jark for bringing up this discussion, I like this FLIP very much.
>> > >
>> > > Especially the cumulate window: it's much like the current TUMBLE
>> > > window + Fast Emit (which is an undocumented experimental feature);
>> > > however, it's more powerful.
>> > >
>> > > And this will make the SQL semantics more standard, especially for the
>> > > HOPPING window.
>> > >
>> > > Regarding time attribute,
>> > > It seems that we don't need a specific function to infer the time
>> > > attribute, like `TUMBLE_ROWTIME` / `TUMBLE_PROCTIME`. Then are the
>> > > `window_start` and `window_end` columns time attribute columns
>> > > automatically?
>> > > - If not, what will be the time attribute of the result relation of
>> > > these TVFs? Especially after the window aggregation.
>> > > - If yes, then how do we handle proctime?
>> > >
>> > > Regarding batch operators,
>> > > It's great to hear that we can reuse the batch operators in continuous
>> > > batch mode, as you mentioned in the FLIP.
>> > > The current window aggregate can also be used in batch mode with
>> > > rowtime. Do you plan to support these TVFs for batch mode in this FLIP?
>> > > Since Table/SQL is a unified API, it would be great if we can
>> > > keep the features complete in both streaming and batch mode.
>> > >
>> > > There is one more question; I don't know whether it should be
>> > > considered in this FLIP.
>> > > Does the new window support `grouping sets`? (It's not supported in the
>> > > old window impl.)
>> > >
>> > > Jark Wu <imj...@gmail.com> wrote on Fri, 9 Oct 2020 at 4:14 PM:
>> > >
>> > > > Hi all,
>> > > >
>> > > > I know we have a lot of discussion and development going on right
>> > > > now, but it would be great if we can get FLIP-145 into a votable state.
>> > > > If there are no objections, I would like to start voting in the
>> > > > next few days.
>> > > >
>> > > > Best,
>> > > > Jark
>> > > >
>> > > > On Thu, 1 Oct 2020 at 14:29, Jark Wu <imj...@gmail.com> wrote:
>> > > >
>> > > > > Hi everyone,
>> > > > >
>> > > > > I have added a section on Performance Optimization to describe
>> > > > > how to improve the performance in the short term and long term,
>> > > > > and to sketch the future performance potential under the new
>> > > > > window API.
>> > > > > Introducing the window API is just the first step; we will
>> > > > > continuously improve the performance to make it powerful and
>> > > > > useful.
>> > > > >
>> > > > > Best,
>> > > > > Jark
>> > > > >
>> > > > > On Thu, 1 Oct 2020 at 14:28, Jark Wu <imj...@gmail.com> wrote:
>> > > > >
>> > > > >> Hi Pengcheng,
>> > > > >>
>> > > > >> Yes, the window TVF is part of the FLIP. You are welcome to
>> > > > >> contribute and join the discussion.
>> > > > >> Regarding the SESSION window aggregation, users can use the
>> > > > >> existing grouped session window function.
>> > > > >>
>> > > > >> Best,
>> > > > >> Jark
>> > > > >>
>> > > > >> On Sun, 27 Sep 2020 at 21:24, liupengcheng <pengchengliucr...@gmail.com>
>> > > > >> wrote:
>> > > > >>
>> > > > >>> Hi Jark,
>> > > > >>> Thanks for the reply. Yes, I think it's a good feature; it can
>> > > > >>> improve the NRT scenarios as you mentioned in the FLIP.
>> > > > >>> Also, I think it can improve streaming SQL greatly:
>> > > > >>> it can support richer window operations in Flink SQL and bring
>> > > > >>> great convenience to users
>> > > > >>> (we currently only support group windows in Flink).
>> > > > >>>
>> > > > >>> Regarding the SESSION window, I think it's especially useful for
>> > > > >>> user behavior analysis (e.g. counting user visits on a news
>> > > > >>> website or social platform), but I agree that we can keep it
>> > > > >>> out of the FLIP for now to catch up with 1.12.
>> > > > >>>
>> > > > >>> Recently, I've done some work on the stream planner with the
>> > > > >>> TVFs, and I'm willing to contribute to this part. Is it in the
>> > > > >>> plan of this FLIP?
>> > > > >>>
>> > > > >>> Best,
>> > > > >>> PengchengLiu
>> > > > >>>
>> > > > >>>
>> > > > >>> On 2020/9/26 at 11:09 PM, "Jark Wu" <imj...@gmail.com> wrote:
>> > > > >>>
>> > > > >>> Hi pengcheng,
>> > > > >>>
>> > > > >>> It's great to see that you also have a need for window join.
>> > > > >>> You are right, the windowing TVF is a powerful feature which can
>> > > > >>> support more operations in the future.
>> > > > >>> I think of it as the date-time "partition" selection in batch SQL
>> > > > >>> jobs; with this new syntax, I think it is possible
>> > > > >>> to migrate traditional batch SQL jobs to Flink SQL by changing a
>> > > > >>> few lines.
>> > > > >>>
>> > > > >>> Regarding the SESSION window, it is kept out of the FLIP on
>> > > > >>> purpose, because we want to keep the
>> > > > >>> FLIP small to catch up with 1.12, and a SESSION TVF is rarely
>> > > > >>> useful (e.g. session window join?).
>> > > > >>>
>> > > > >>> Best,
>> > > > >>> Jark
>> > > > >>>
>> > > > >>> On Fri, 25 Sep 2020 at 22:59, liupengcheng <pengchengliucr...@gmail.com>
>> > > > >>> wrote:
>> > > > >>>
>> > > > >>> > Hi, Jark,
>> > > > >>> > I'm very interested in this feature, and I have also been
>> > > > >>> > working on this recently.
>> > > > >>> > I just had a glance at the FLIP; it's good, but I found that
>> > > > >>> > there is no plan to add SESSION windows.
>> > > > >>> > Also, I think there are more things we can do based on this new
>> > > > >>> > syntax. For example,
>> > > > >>> > - window sort support
>> > > > >>> > - window union/intersect/minus support
>> > > > >>> > - Improve dimension table join
>> > > > >>> > We can have a deeper discussion on this new feature later.
>> > > > >>> > I've also recently opened a Jira issue that is related to this
>> > > > >>> > feature:
>> > > > >>> > https://issues.apache.org/jira/browse/FLINK-18830
>> > > > >>> >
>> > > > >>> > Best!
>> > > > >>> > PengchengLiu
>> > > > >>> >
>> > > > >>> > On 2020/9/25 at 10:30 PM, "Jark Wu" <imj...@gmail.com> wrote:
>> > > > >>> >
>> > > > >>> > Hi everyone,
>> > > > >>> >
>> > > > >>> > I want to start a FLIP about supporting windowing table-valued
>> > > > >>> > functions (TVF).
>> > > > >>> > The main purpose of this FLIP is to improve the near real-time
>> > > > >>> > (NRT) experience of Flink.
>> > > > >>> >
>> > > > >>> > FLIP-145:
>> > > > >>> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-145%3A+Support+SQL+windowing+table-valued+function
>> > > > >>> >
>> > > > >>> > We want to introduce TUMBLE, HOP, and CUMULATE windowing TVFs;
>> > > > >>> > CUMULATE is a new kind of window.
>> > > > >>> > With the windowing TVFs, we can support richer operations on
>> > > > >>> > windows, including window join, window TopN and so on.
>> > > > >>> > This makes things simple: we only need to assign windows at the
>> > > > >>> > beginning of the query, and then apply operations after that like
>> > > > >>> > traditional batch SQL.
>> > > > >>> > We hope it can help to reduce the learning curve of windows,
>> > > > >>> > improve NRT for Flink, and attract more batch users.
>> > > > >>> >
>> > > > >>> > A simple code snippet for a 10-minute tumbling window aggregate:
>> > > > >>> >
>> > > > >>> > SELECT window_start, window_end, SUM(price)
>> > > > >>> > FROM TABLE(
>> > > > >>> >   TUMBLE(TABLE Bid, DESCRIPTOR(bidtime), INTERVAL '10' MINUTES))
>> > > > >>> > GROUP BY window_start, window_end;
>> > > > >>> >
>> > > > >>> > I'm looking forward to your feedback.
>> > > > >>> >
>> > > > >>> > Best,
>> > > > >>> > Jark
>> > > > >>> >
>> > > > >>>
>> > >
>> > > --
>> > >
>> > > Best,
>> > > Benchao Li
>> > >
>> >
>> >
>> > --
>> >
>> > Best,
>> > Benchao Li
>> >
>> >
>> > --
>> >
>> > Best,
>> > Benchao Li
>> >