Hi, 

Currently, Flink uses Calcite for SQL parsing, so we follow the StreamSQL 
grammar proposed by Calcite [1], which requires the `STREAM` keyword in SQL. 
For example, `SELECT * FROM Orders` is regular standard SQL and will be 
translated into a batch job. If you want to declare a streaming job, you have 
to add the `STREAM` keyword: `SELECT STREAM * FROM Orders`.

I'm wondering why we distinguish between StreamSQL and BatchSQL grammar at 
all. We already have separate high-level APIs for batch (DataSet) and 
streaming (DataStream), and we have a unified Table API for batch and stream 
(that's great!). Why do we have to separate them again in SQL?

I hope we can manipulate stream data like a table. Take `SELECT * FROM 
Orders`: if Orders is a table (or the query runs in a batch execution 
environment), it's a batch job; if Orders is a stream (or the query runs in a 
stream execution environment), it's a streaming job. The grammar of StreamSQL 
and BatchSQL would be exactly the same. That is what we did in Blink SQL.
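
To make the idea a bit more concrete, here is a minimal sketch of how this 
could look from the Table API side. The method names (registerDataStream, 
registerDataSet, sql, ...) and the table contents are only illustrative of 
the idea, not a concrete API proposal:

    // Rough sketch only: method names are illustrative, not an API proposal.
    import org.apache.flink.api.scala._
    import org.apache.flink.streaming.api.scala._
    import org.apache.flink.table.api.scala._

    case class Order(userId: String, product: String, amount: Int)

    // Streaming: Orders is backed by a DataStream, so the query below
    // runs as a streaming job.
    val senv  = StreamExecutionEnvironment.getExecutionEnvironment
    val stEnv = TableEnvironment.getTableEnvironment(senv)
    val orderStream = senv.fromElements(Order("bob", "beer", 3))
    stEnv.registerDataStream("Orders", orderStream)
    val streamResult = stEnv.sql("SELECT * FROM Orders")  // no STREAM keyword

    // Batch: Orders is backed by a DataSet, so the exact same SQL string
    // runs as a batch job.
    val benv  = ExecutionEnvironment.getExecutionEnvironment
    val btEnv = TableEnvironment.getTableEnvironment(benv)
    val orderSet = benv.fromElements(Order("bob", "beer", 3))
    btEnv.registerDataSet("Orders", orderSet)
    val batchResult = btEnv.sql("SELECT * FROM Orders")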

The benefits of unifying the grammar:

1. StreamSQL becomes easy to use for anyone who knows regular SQL, because 
there is no difference between StreamSQL and regular SQL.
2. We are not blocked by Calcite. Currently, Calcite's StreamSQL support is 
incomplete: it does not support stream-to-stream JOIN, window aggregates, 
aggregates without a window, etc. We might have to wait for Calcite to 
support them before we could start working. All of these are already 
expressible in regular SQL except windows, and we can implement windows via 
user-defined functions. So if we use regular SQL instead of StreamSQL, we can 
start working on it right now instead of waiting for Calcite (see the sketch 
after this list).
3. Blink SQL can be merged back into the community to accelerate the 
evolution of Flink SQL. Blink SQL has already done most of this work: we 
implemented UDF/UDTF/UDAF, aggregates with and without windows, 
stream-to-stream JOIN, and so on.
4. Windows can also work in batch jobs.
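
To illustrate point 2, these are the kinds of queries I mean, continuing the 
sketch above (stEnv is the stream table environment from that sketch; the 
Shipments table and its columns are made up for illustration). They are plain 
SQL with nothing stream-specific in the grammar:

    // Aggregate without a window: a continuously updated count per user on
    // a stream, or a one-shot count on a batch table.
    val countsPerUser = stEnv.sql(
      "SELECT userId, COUNT(*) AS cnt FROM Orders GROUP BY userId")

    // Stream-to-stream JOIN written as a regular join (Shipments would be
    // another registered stream; names are made up for illustration).
    val enriched = stEnv.sql(
      "SELECT o.userId, o.product, s.status " +
      "FROM Orders o JOIN Shipments s ON o.product = s.product")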

Just my thoughts :) 

What do you think about this?

[1] https://calcite.apache.org/docs/stream.html

- Jark Wu 
