Re: [DISCUSS] Some thoughts about unify Stream SQL and Batch SQL grammer

Timo Walther Tue, 23 Aug 2016 08:55:44 -0700

Hi Jark,

your design document looks very promising. It would be great if you could contribute parts of your implementation back, e.g. UDTFs, UDAFs, or even your CROSS APPLY operator. We don't need a FLIP for every little new feature, so feel free to create a Jira issue, discuss it a little bit, and open a PR. I'm also in favor of following Calcite's syntax, as it might be used in other frameworks as well. Users should not have to learn a new syntax when coming from Storm, Samza, etc.


Timo




On 23/08/16 at 13:09, Jark Wu wrote:

Hi Fabian, Timo,

Sorry for the late response.

Regarding Calcite’s StreamSQL syntax, my only concerns are the STREAM keyword 
and the lack of aggregation without a window, which lead to different syntax 
for streaming and static tables. I don’t think Flink should have a custom SQL 
syntax, but it would be better to have a consistent syntax for batch and 
streaming. Regarding the window syntax, I think it is good and reasonable to 
follow Calcite’s syntax. Actually, we implemented the Blink SQL window 
following Calcite’s syntax [1].

In addition, I have described the Blink SQL design, including UDFs, UDTFs, 
UDAFs, and windows, in a Google doc [1]. I hope it helps with the upcoming 
Flink SQL design.

+1 for creating FLIP

[1] 
https://docs.google.com/document/d/15iVc1781dxYWm3loVQlESYvMAxEzbbuVFPZWBYuY1Ek


- Jark Wu

On Aug 23, 2016, at 3:47 PM, Fabian Hueske <fhue...@gmail.com> wrote:

Hi,

I did a bit of prototyping yesterday to check to what extent Calcite
supports window operations on streams if we were to implement them for the
Table API.
For the Table API we do not go through Calcite's SQL parser and validator,
but generate the logical plan (a tree of RelNodes) ourselves, mostly using
Calcite's RelBuilder.
It turns out that Calcite does not restrict grouped aggregations on streams
at this abstraction level, i.e., it does not perform any checks.
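
Since nothing is checked at this level, any validation would have to happen on our side. As a minimal sketch of what such a check could look like (written in Python for brevity; `GroupKey`, `is_time_window`, and `validate_stream_aggregate` are hypothetical names and not part of Calcite or Flink):

```python
from dataclasses import dataclass
from typing import List


@dataclass
class GroupKey:
    """Hypothetical description of one grouping expression in a logical plan."""
    expr: str                     # e.g. "productId" or "TUMBLE(rowtime, 1 HOUR)"
    is_time_window: bool = False  # True if the key is a window over a time attribute


def validate_stream_aggregate(group_keys: List[GroupKey], is_stream: bool) -> None:
    """Reject a grouped aggregation on a stream that has no time-window key.

    This is one possible rule, assumed for illustration: batch plans are
    always accepted, stream plans need at least one window grouping key.
    """
    if is_stream and not any(key.is_time_window for key in group_keys):
        raise ValueError(
            "grouped aggregation on a stream requires a window on a time attribute"
        )
```

Whether aggregations without a window should be rejected at all is a design decision; the sketch only shows where such a rule could live.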

I think it should be possible to implement windowed aggregates for the
Table API. Once CALCITE-1345 [1] is implemented (and released), windowed
aggregates are also supported by the SQL parser, validator, and optimizer.
In order to make them work with our implementation we would need to adapt
our solution to it (only internally), but I am sure we could reuse a lot of
our initial implementation (Table API, validation, execution).

I drafted an API proposal a few months ago [2] and could convert this into
a FLIP to discuss the API and break it down into subtasks.

What do you think?

Cheers, Fabian

[1] https://issues.apache.org/jira/browse/CALCITE-1345
[2]
https://docs.google.com/document/d/19kSOAOINKCSWLBCKRq2WvNtmuaA9o3AyCh2ePqr3V5E

2016-08-19 11:04 GMT+02:00 Fabian Hueske <fhue...@gmail.com>:

Hi Jark,

thanks for starting this discussion. Actually, I think we are "blocked" by
the internal handling of streaming windows in Calcite rather than by the
SQL parser. IMO, it should be possible to exchange or modify the parser if
we want that.

Regarding Calcite's StreamSQL syntax: Except for the STREAM keyword,
Calcite closely follows the SQL standard (e.g., there are no special keywords
like WINDOW; instead, stream-specific aspects like tumbling windows are
expressed as functions such as TUMBLE [1]). One main motivation of the Calcite community
is to have the same syntax for streaming and static tables. This includes
support for tables which are static and streaming at the same time (the
example of [1] is a table about orders to which new order records are
added). When querying such a table, the STREAM keyword is required to
distinguish the cases of a batch query which returns a result set and a
standing query which returns a result stream. In the context of Flink we
can make the distinction using the type of the TableEnvironment. So we
could use the batch parser, but would need to change a couple of things
internally and add checks for proper grouping on the timestamp column when
doing windows, etc. So far the discussion about the StreamSQL syntax rather
focused on the question whether 1) StreamSQL should follow the SQL standard
(as Calcite proposes) or 2) whether Flink should use a custom syntax with
stream specific features. For instance a tumbling window is expressed in
the GROUP BY clause [1] when following standard SQL but it could be defined
using a special WINDOW keyword in a custom StreamSQL dialect.
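
To illustrate what grouping by a tumbling window means, here is a minimal sketch in Python. It emulates a query that groups by something like TUMBLE(rowtime, INTERVAL '1' HOUR) plus a second key. The helper names `tumble_start` and `count_per_window` and the (rowtime, product_id) row layout are assumptions made for this example; this is neither Calcite's nor Flink's implementation.

```python
from collections import defaultdict
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)


def tumble_start(ts: datetime, size: timedelta) -> datetime:
    """Return the start of the fixed-size, non-overlapping window containing ts."""
    # Integer-divide the offset from the epoch by the window size,
    # then scale back up to get the window boundary at or before ts.
    return EPOCH + ((ts - EPOCH) // size) * size


def count_per_window(rows, size: timedelta) -> dict:
    """Count rows per (window start, product_id).

    The window start acts as an ordinary grouping key, which is the idea
    behind expressing a tumbling window in the GROUP BY clause.
    """
    counts = defaultdict(int)
    for rowtime, product_id in rows:
        counts[(tumble_start(rowtime, size), product_id)] += 1
    return dict(counts)
```

With a custom dialect the same grouping would be declared through a dedicated WINDOW clause instead; the computation itself would not change.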

You are right that we have a dependency on Calcite. However, I think this
dependency is rather in the internals than the parser, i.e., how does the
validator/optimizer support and handle monotone / quasi-monotone attributes
and windows. I am not sure how much is already supported but the Calcite
community is working on this [2]. I think we need these features in Calcite
unless we want to completely remove our dependency on Calcite for
StreamSQL. I would not be in favor of removing Calcite at this point. We
put a lot of effort into refactoring the Table API internals. Instead we
should start to talk to the Calcite community and see how far they are,
what is missing, and how we can help.

I will start a discussion on the Calcite dev mailing list in the next few
days and ask about the status of StreamSQL.

Best,
Fabian

[1] http://calcite.apache.org/docs/stream.html#tumbling-windows-improved
[2] https://issues.apache.org/jira/browse/CALCITE-1345



--
Freundliche Grüße / Kind Regards

Timo Walther

Follow me: @twalthr
https://www.linkedin.com/in/twalthr
