Hi Fabian,

You are right, the main thing we need to change for removing STREAM keyword is 
the table registration. If you would like, I can do a prototype.

Hi Timo, 

I’m glad to contribute our work back to Flink. I will look into it and create 
JIRAs next days. 

- Jark Wu 

> 在 2016年8月24日,上午12:13,Fabian Hueske <fhue...@gmail.com> 写道:
> 
> Hi Jark,
> 
> We can think about removing the STREAM keyword or not. In principle,
> Calcite should allow the same windowing syntax on streaming and static
> tables (this is one of the main goals of Calcite). The Table API can also
> distinguish stream and batch without the STREAM keyword by looking at the
> ExecutionEnvironment.
> I think we would need to change the way that tables are registered in
> Calcite's catalog and also add more validation (check that time windows
> refer to a time column, etc).
> A prototype should help to see what the consequence of removing the STREAM
> keyword (which is actually, changing the table registration, the parser is
> the same) would be.
> 
> Regarding streaming aggregates without window definition: We can certainly
> implement this feature in the Table API. There are a few points that need
> to be considered like value expiration after a certain time of update
> inactivity (otherwise the state might grow infinitely). But these aspects
> should be rather easy to solve. I think for SQL, such running aggregates
> are a special case of the Sliding Windows as discussed in Calcite's
> StreamSQL document [1].
> 
> Thanks also for the document! I'll take that into account when sketching
> the FLIP for streaming aggregation support.
> 
> Cheers, Fabian
> 
> [1] http://calcite.apache.org/docs/stream.html#sliding-windows
> 
> 2016-08-23 13:09 GMT+02:00 Jark Wu <wuchong...@alibaba-inc.com>:
> 
>> Hi Fabian, Timo,
>> 
>> Sorry for the late response.
>> 
>> Regarding Calcite’s StreamSQL syntax, what I concern is only the STREAM
>> keyword and no agg-without-window. Which makes different syntax for
>> streaming and static tables. I don’t think Flink should have a custom SQL
>> syntax, but it’s better to have a consistent syntax for batch and
>> streaming. Regarding window syntax , I think it’s good and reasonable to
>> follow Calcite’s syntax. Actually, we implement Blink SQL Window following
>> Calcite’s syntax[1].
>> 
>> In addition, I describe the Blink SQL design including UDF, UDTF, UDAF,
>> Window in google doc[1]. Hope that can help for the upcoming Flink SQL
>> design.
>> 
>> +1 for creating FLIP
>> 
>> [1] https://docs.google.com/document/d/15iVc1781dxYWm3loVQlESYvMAxEzb
>> buVFPZWBYuY1Ek
>> 
>> 
>> - Jark Wu
>> 
>>> 在 2016年8月23日,下午3:47,Fabian Hueske <fhue...@gmail.com> 写道:
>>> 
>>> Hi,
>>> 
>>> I did a bit of prototyping yesterday to check to what extend Calcite
>>> supports window operations on streams if we would implement them for the
>>> Table API.
>>> For the Table API we do not go through Calcite's SQL parser and
>> validator,
>>> but generate the logical plan (tree of RelNodes) ourselves mostly using
>>> Calcite's Relbuilder.
>>> It turns out that Calcite does not restrict grouped aggregations on
>> streams
>>> at this abstraction level, i.e., it does not perform any checks.
>>> 
>>> I think it should be possible to implement windowed aggregates for the
>>> Table API. Once CALCITE-1345 [1] is implemented (and released), windowed
>>> aggregates are also supported by the SQL parser, validator, and
>> optimizer.
>>> In order to make them work with our implementation we would need to adapt
>>> our solution to it (only internally), but I am sure we could reuse a lot
>> of
>>> our initial implementation (Table API, validation, execution).
>>> 
>>> I drafted an API proposal a few months ago [2] and could convert this
>> into
>>> a FLIP to discuss the API and break it down into subtasks.
>>> 
>>> What do you think?
>>> 
>>> Cheers, Fabian
>>> 
>>> [1] https://issues.apache.org/jira/browse/CALCITE-1345
>>> [2]
>>> https://docs.google.com/document/d/19kSOAOINKCSWLBCKRq2WvNtmuaA9o
>> 3AyCh2ePqr3V5E
>>> 
>>> 2016-08-19 11:04 GMT+02:00 Fabian Hueske <fhue...@gmail.com>:
>>> 
>>>> Hi Jark,
>>>> 
>>>> thanks for starting this discussion. Actually, I think we are rather
>>>> "blocked" on the internal handling of streaming windows in Calcite than
>> the
>>>> SQL parser. IMO, it should be possible to exchange or modify the parser
>> if
>>>> we want that.
>>>> 
>>>> Regarding Calcite's StreamSQL syntax: Except for the STREAM keyword,
>>>> Calcite closely follows the SQL standard (e.g.,no special keywords like
>>>> WINDOW. Instead stream specific aspects like tumbling windows are done
>> as
>>>> functions such as TUMBLE [1]). One main motivation of the Calcite
>> community
>>>> is to have the same syntax for streaming and static tables. This
>> includes
>>>> support for tables which are static and streaming at the same time (the
>>>> example of [1] is a table about orders to which new order records are
>>>> added). When querying such a table, the STREAM keyword is required to
>>>> distinguish the cases of a batch query which returns a result set and a
>>>> standing query which returns a result stream. In the context of Flink we
>>>> can can do the distinction using the type of the TableEnvironment. So we
>>>> could use the batch parser, but would need to change a couple things
>>>> internally and add checks for proper grouping on the timestamp column
>> when
>>>> doing windows, etc. So far the discussion about the StreamSQL syntax
>> rather
>>>> focused on the question whether 1) StreamSQL should follow the SQL
>> standard
>>>> (as Calcite proposes) or 2) whether Flink should use a custom syntax
>> with
>>>> stream specific features. For instance a tumbling window is expressed in
>>>> the GROUP BY clause [1] when following standard SQL but it could be
>> defined
>>>> using a special WINDOW keyword in a custom StreamSQL dialect.
>>>> 
>>>> You are right that we have a dependency on Calcite. However, I think
>> this
>>>> dependency is rather in the internals than the parser, i.e., how does
>> the
>>>> validator/optimizer support and handle monotone / quasi-monotone
>> attributes
>>>> and windows. I am not sure how much is already supported but the Calcite
>>>> community is working on this [2]. I think we need these features in
>> Calcite
>>>> unless we want to completely remove our dependency on Calcite for
>>>> StreamSQL. I would not be in favor of removing Calcite at this point. We
>>>> put a lot of effort into refactoring the Table API internals. Instead we
>>>> should start to talk to the Calcite community and see how far they are,
>>>> what is missing, and how we can help.
>>>> 
>>>> I will start a discussion on the Calcite dev mailing list in the next
>> days
>>>> and ask about the status of StreamSQL.
>>>> 
>>>> Best,
>>>> Fabian
>>>> 
>>>> [1] http://calcite.apache.org/docs/stream.html#tumbling-
>> windows-improved
>>>> [2] https://issues.apache.org/jira/browse/CALCITE-1345
>>>> 
>> 
>> 

Reply via email to