Hi,
We are also interested in streaming sql and very willing to participate and 
contribute.

We are now in progress and we will also contribute to calcite to push forward 
the window and stream-join support.



--------------
Sender: Fabian Hueske [mailto:fhue...@gmail.com] 
Send Time: 2017年1月24日 5:55
Receiver: dev@flink.apache.org
Theme: Re: [DISCUSS] Development of SQL OVER / Table API Row Windows for 
streaming tables

Hi Haohui,

our plan was in fact to piggy-back on Calcite and use the TUMBLE function [1] 
once is it is available (CALCITE-1345 [2]).
Unfortunately, this issue does not seem to be very active, so I don't know what 
the progress is.

I would suggest to move the discussion about group windows to a separate thread 
and keep this one focused on the organization of the SQL OVER windows.

Best,
Fabian

[1] http://calcite.apache.org/docs/stream.html)
[2] https://issues.apache.org/jira/browse/CALCITE-1345

2017-01-23 22:42 GMT+01:00 Haohui Mai <ricet...@gmail.com>:

> Hi Fabian,
>
> FLINK-4692 has added the support for tumbling window and we are 
> excited to try it out and expose it as a SQL construct.
>
> Just curious -- what's your thought on the SQL syntax on tumbling window?
>
> Implementation wise it might make sense to think tumbling window as a 
> special case of the sliding window.
>
> The problem I see is that the OVER construct might be insufficient to 
> support all the use cases of tumbling windows. For example, it fails 
> to express tumbling windows that have fractional time units (as 
> pointed out in http://calcite.apache.org/docs/stream.html).
>
> It looks to me that the Calcite / Azure Stream Analytics have 
> introduced a new construct (TUMBLE / TUMBLINGWINDOW) to address this issue.
>
> Do you think it is a good idea to follow the same conventions? Your 
> ideas are appreciated.
>
> Regards,
> Haohui
>
>
> On Mon, Jan 23, 2017 at 1:02 PM Haohui Mai <ricet...@gmail.com> wrote:
>
> > +1
> >
> > We are also quite interested in these features and would love to 
> > participate and contribute.
> >
> > ~Haohui
> >
> > On Mon, Jan 23, 2017 at 7:31 AM Fabian Hueske <fhue...@gmail.com> wrote:
> >
> >> Hi everybody,
> >>
> >> it seems that currently several contributors are working on new 
> >> features for the streaming Table API / SQL around row windows (as 
> >> defined in
> >> FLIP-11
> >> [1]) and SQL OVER-style window (FLINK-4678, FLINK-4679, FLINK-4680, 
> >> FLINK-5584).
> >> Since these efforts overlap quite a bit I spent some time thinking 
> >> about how we can approach these features and how to avoid 
> >> overlapping contributions.
> >>
> >> The challenge here is the following. Some of the Table API row 
> >> windows
> as
> >> defined by FLIP-11 [1] are basically SQL OVER windows while other 
> >> cannot be easily expressed as such (TumbleRows for row-count 
> >> intervals, SessionRows).
> >> However, since Calcite already supports SQL OVER windows, we can 
> >> reuse
> the
> >> optimization logic for some of the Table API row windows. I also 
> >> thought about the semantics of the TumbleRows and SessionRows 
> >> windows as defined in
> >> FLIP-11 and came to the conclusion that these are not well defined 
> >> in
> >> FLIP-11 and should rather be defined as SlideRows windows with a 
> >> special PARTITION BY clause.
> >>
> >> I propose to approach SQL OVER windows and Table API row windows as
> >> follows:
> >>
> >> We start with three simple cases for SQL OVER windows (not Table 
> >> API
> yet):
> >>
> >> * OVER RANGE for event time
> >> * OVER RANGE for processing time
> >> * OVER ROW for processing time
> >>
> >> All cases fulfill the following restrictions:
> >> - All aggregations in SELECT must refer to the same window.
> >> - PARTITION BY may not contain the rowtime attribute.
> >> - ORDER BY must be on rowtime attribute (for event time) or on a 
> >> marker function that indicates processing time. Additional sort 
> >> attributes are not supported initially.
> >> - only "BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW" and "BETWEEN x 
> >> PRECEDING AND CURRENT ROW" are supported.
> >>
> >> OVER ROW for event time cannot be easily supported. With event 
> >> time, we may have late records which need to be injected into the 
> >> order of records.
> >> When
> >> a record in injected in to the order where a row-count window has
> already
> >> been computed, this and all following windows will change. We could
> either
> >> drop the record or sent out many retraction records. I think it is 
> >> best
> to
> >> not open this can of worms at this point.
> >>
> >> The rational for all of the above restrictions is to have first 
> >> versions of OVER windows soon.
> >> Once we have the above cases covered we can extend and remove
> limitations
> >> as follows:
> >>
> >> - Table API SlideRow windows (with the same restrictions as above). 
> >> This will be mostly API work since the execution part has been solved 
> >> before.
> >> - Add support for FOLLOWING (except UNBOUNDED FOLLOWING)
> >> - Add support for different windows in SELECT. All windows must be 
> >> partitioned and ordered in the same way.
> >> - Add support for additional ORDER BY attributes (besides time).
> >>
> >> As I said before, TumbleRows and SessionRows windows as in FLIP-11 
> >> are
> not
> >> well defined, IMO.
> >> They can be expressed as SlideRows windows with special 
> >> partitioning (partitioning on fixed, non-overlapping time ranges 
> >> for TumbleRows, and gap-separated, non-overlapping time ranges for 
> >> SessionRows) I would not start to work on those yet.
> >>
> >> I would like to close all related JIRA issues (FLINK-4678, 
> >> FLINK-4679, FLINK-4680, FLINK-5584) and restructure the development 
> >> of these
> features
> >> as outlined above with corresponding JIRA issues.
> >>
> >> What do others think? (I cc'ed the contributors assigned to the 
> >> above
> JIRA
> >> issues)
> >>
> >> Best, Fabian
> >>
> >> [1]
> >>
> >> https://cwiki.apache.org/confluence/display/FLINK/FLIP-
> 11%3A+Table+API+Stream+Aggregations
> >>
> >
>

Reply via email to