Hi Haohui,

our plan was in fact to piggy-back on Calcite and use the TUMBLE function [1] once it is available (CALCITE-1345 [2]). Unfortunately, this issue does not seem to be very active, so I don't know what the progress is.
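For reference, here is a minimal sketch of the window assignment that a TUMBLE-style group window implies, written in Python rather than SQL. It is not Calcite's or Flink's implementation: the name tumble_window and the alignment of windows to the Unix epoch are assumptions made for illustration. It shows that a function-style construct has no trouble with a window size that is not a whole time unit, such as the 1.5 hours used here.

```python
from datetime import datetime, timedelta, timezone

# Assumption: windows are aligned to the Unix epoch in UTC.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def tumble_window(ts, size):
    """Return the [start, end) tumbling window of length `size` containing `ts`."""
    index = (ts - EPOCH) // size  # number of complete windows before `ts`
    start = EPOCH + index * size
    return start, start + size


# Hypothetical record timestamp and a 1.5-hour window size.
start, end = tumble_window(
    datetime(2017, 1, 23, 22, 42, tzinfo=timezone.utc),
    timedelta(hours=1.5),
)
print(start.isoformat(), end.isoformat())
```

Every record maps to exactly one window, which is what distinguishes this from an OVER window, where each row gets its own frame.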
I suggest moving the discussion about group windows to a separate thread
and keeping this one focused on the organization of the SQL OVER windows.

Best, Fabian

[1] http://calcite.apache.org/docs/stream.html
[2] https://issues.apache.org/jira/browse/CALCITE-1345

2017-01-23 22:42 GMT+01:00 Haohui Mai <ricet...@gmail.com>:

> Hi Fabian,
>
> FLINK-4692 has added support for tumbling windows and we are excited to
> try it out and expose it as a SQL construct.
>
> Just curious -- what are your thoughts on the SQL syntax for tumbling
> windows?
>
> Implementation-wise, it might make sense to think of a tumbling window
> as a special case of the sliding window.
>
> The problem I see is that the OVER construct might be insufficient to
> support all the use cases of tumbling windows. For example, it fails to
> express tumbling windows that have fractional time units (as pointed out
> in http://calcite.apache.org/docs/stream.html).
>
> It looks to me that Calcite and Azure Stream Analytics have introduced a
> new construct (TUMBLE / TUMBLINGWINDOW) to address this issue.
>
> Do you think it is a good idea to follow the same conventions? Your
> ideas are appreciated.
>
> Regards,
> Haohui
>
>
> On Mon, Jan 23, 2017 at 1:02 PM Haohui Mai <ricet...@gmail.com> wrote:
>
> > +1
> >
> > We are also quite interested in these features and would love to
> > participate and contribute.
> >
> > ~Haohui
> >
> > On Mon, Jan 23, 2017 at 7:31 AM Fabian Hueske <fhue...@gmail.com> wrote:
> >
> >> Hi everybody,
> >>
> >> it seems that several contributors are currently working on new
> >> features for the streaming Table API / SQL around row windows (as
> >> defined in FLIP-11 [1]) and SQL OVER-style windows (FLINK-4678,
> >> FLINK-4679, FLINK-4680, FLINK-5584).
> >> Since these efforts overlap quite a bit, I spent some time thinking
> >> about how we can approach these features and how to avoid overlapping
> >> contributions.
> >>
> >> The challenge here is the following.
> >> Some of the Table API row windows as defined by FLIP-11 [1] are
> >> basically SQL OVER windows, while others cannot be easily expressed
> >> as such (TumbleRows for row-count intervals, SessionRows).
> >> However, since Calcite already supports SQL OVER windows, we can
> >> reuse the optimization logic for some of the Table API row windows.
> >> I also thought about the semantics of the TumbleRows and SessionRows
> >> windows as defined in FLIP-11 and came to the conclusion that these
> >> are not well defined in FLIP-11 and should rather be defined as
> >> SlideRows windows with a special PARTITION BY clause.
> >>
> >> I propose to approach SQL OVER windows and Table API row windows as
> >> follows:
> >>
> >> We start with three simple cases for SQL OVER windows (not Table API
> >> yet):
> >>
> >> * OVER RANGE for event time
> >> * OVER RANGE for processing time
> >> * OVER ROW for processing time
> >>
> >> All cases fulfill the following restrictions:
> >> - All aggregations in SELECT must refer to the same window.
> >> - PARTITION BY may not contain the rowtime attribute.
> >> - ORDER BY must be on the rowtime attribute (for event time) or on a
> >>   marker function that indicates processing time. Additional sort
> >>   attributes are not supported initially.
> >> - Only "BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW" and "BETWEEN x
> >>   PRECEDING AND CURRENT ROW" are supported.
> >>
> >> OVER ROW for event time cannot be easily supported. With event time,
> >> we may have late records which need to be injected into the order of
> >> records. When a record is injected into the order where a row-count
> >> window has already been computed, this and all following windows will
> >> change. We could either drop the record or send out many retraction
> >> records. I think it is best not to open this can of worms at this
> >> point.
> >>
> >> The rationale for all of the above restrictions is to have first
> >> versions of OVER windows soon.
> >> Once we have the above cases covered, we can extend and remove
> >> limitations as follows:
> >>
> >> - Table API SlideRows windows (with the same restrictions as above).
> >>   This will be mostly API work since the execution part has been
> >>   solved before.
> >> - Add support for FOLLOWING (except UNBOUNDED FOLLOWING).
> >> - Add support for different windows in SELECT. All windows must be
> >>   partitioned and ordered in the same way.
> >> - Add support for additional ORDER BY attributes (besides time).
> >>
> >> As I said before, TumbleRows and SessionRows windows as in FLIP-11
> >> are not well defined, IMO.
> >> They can be expressed as SlideRows windows with special partitioning
> >> (partitioning on fixed, non-overlapping time ranges for TumbleRows,
> >> and gap-separated, non-overlapping time ranges for SessionRows).
> >> I would not start to work on those yet.
> >>
> >> I would like to close all related JIRA issues (FLINK-4678,
> >> FLINK-4679, FLINK-4680, FLINK-5584) and restructure the development
> >> of these features as outlined above with corresponding JIRA issues.
> >>
> >> What do others think? (I cc'ed the contributors assigned to the above
> >> JIRA issues)
> >>
> >> Best, Fabian
> >>
> >> [1]
> >> https://cwiki.apache.org/confluence/display/FLINK/FLIP-11%3A+Table+API+Stream+Aggregations
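
To make the event-time OVER ROW problem from the quoted proposal concrete, here is a minimal Python sketch. It is not Flink code: the helper row_count_window_sums and the sample records are hypothetical. It recomputes a "2 PRECEDING AND CURRENT ROW" sum over records sorted by event time, first without and then with a late record, and lists the already-emitted results that would need a retraction.

```python
def row_count_window_sums(events, preceding):
    """For each (event_time, value) record, sum the values of the current row
    and up to `preceding` rows before it in event-time order."""
    ordered = sorted(events)  # order by event time
    results = []
    for i, (ts, _) in enumerate(ordered):
        frame = ordered[max(0, i - preceding): i + 1]
        results.append((ts, sum(value for _, value in frame)))
    return results


# Hypothetical (event_time, value) records that arrived on time.
on_time = [(1, 10), (2, 20), (4, 40), (5, 50)]
before = row_count_window_sums(on_time, preceding=2)

# A late record with event time 3 arrives after `before` was emitted.
after = row_count_window_sums(on_time + [(3, 30)], preceding=2)

# Results that were already emitted but are now wrong.
emitted = dict(before)
needs_retraction = [ts for ts, total in after
                    if ts in emitted and emitted[ts] != total]
print(before)
print(after)
print(needs_retraction)
```

With a bounded frame, up to `preceding` results after the insertion point change; with UNBOUNDED PRECEDING, every later result changes. A RANGE window over event time does not have this problem to the same degree, because its frame is defined by the time interval rather than by the number of rows.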