Hi Jark,

sorry that I didn't write back earlier. I wanted to talk to Fabian first about this. In general, according to Calcite's plans, even SQL queries containing the "STREAM" keyword are regular standard SQL. In theory we could omit the "STREAM" keyword as long as it is guaranteed that the generated logical plans look the same. So I'm not against having the same grammar for both batch and streaming queries. However, I think we should contribute code to Calcite if the logical representation for the operators we need is not already there. We need to research how far along the Calcite development is. We can implement windows via user-defined functions, as is also done in the Calcite streaming design document.
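
For illustration, a windowed aggregation in the style of the Calcite streaming document could look roughly like the sketch below (the TUMBLE grouping function and the rowtime column on Orders are taken from that document and are not something our parser supports today):

  SELECT STREAM
    TUMBLE_END(rowtime, INTERVAL '1' HOUR) AS rowtime,
    productId,
    COUNT(*) AS c
  FROM Orders
  GROUP BY TUMBLE(rowtime, INTERVAL '1' HOUR), productId;

If the STREAM keyword were dropped, such a query would have to produce the same logical plan in batch and streaming mode, which is exactly the guarantee I mentioned above.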

It would be very interesting for the upcoming design phase if you could show us how you implemented your Blink SQL. For instance, how do you define windows there?

Regards,
Timo


On 18/08/16 at 16:34, Aljoscha Krettek wrote:
Hi,
I personally would like it a lot if the SQL queries for batch and stream programs looked the same. With the decision to move the Table API on top of Calcite and also use the Calcite SQL parser, Flink is somewhat tied to Calcite, so I don't know whether we can add our own window constructs and teach the parser to properly read them.

Maybe Fabian and Timo have more insights here since they worked on the move to Calcite.

Cheers,
Aljoscha

+Timo looping him in directly

On Tue, 16 Aug 2016 at 09:29 Jark Wu <wuchong...@alibaba-inc.com> wrote:

    Hi,

    Currently, Flink uses Calcite for SQL parsing. So we use the
    StreamSQL grammar proposed by Calcite [1], in which we have to use
    the `STREAM` keyword in SQL. For example, `SELECT *
    FROM Orders` is regular standard SQL and will be translated to a
    batch job. If you want to express a stream job, you have to add
    the `STREAM` keyword: `SELECT STREAM *
    FROM Orders`.

    I'm wondering why we distinguish between the StreamSQL and
    BatchSQL grammars. We already have separate high-level APIs for
    batch (DataSet) and stream (DataStream). And we have a unified
    Table API for batch and stream (that's great!). Why do we have to
    separate them again in SQL?

    I hope we can manipulate stream data like a table. Take `SELECT *
    FROM Orders`: if Orders is a table (or runs in a batch execution
    environment), then it's a batch job. If Orders is a stream (or
    runs in a stream execution environment), then it's a stream job.
    The grammar of StreamSQL and BatchSQL is exactly the same. And
    that is what we did in Blink SQL.

    The benefits if we unify the grammar:

    1. StreamSQL is easy to use for anyone who knows regular SQL.
    There is no difference between StreamSQL and regular SQL.
    2. Not blocked by Calcite. Currently, Calcite StreamSQL is not
    fully supported: no stream-to-stream JOIN, no window aggregates,
    no aggregates without a window, etc. We may need to wait for
    Calcite to support them before we can start working. But they are
    already supported by regular SQL, except for windows, which we
    can implement via a user-defined function (see the sketch after
    this list). So if we can use regular SQL instead of StreamSQL, we
    can start working on it right now and not wait for Calcite.
    3. Blink SQL can be merged back to the community to accelerate
    the evolution of Flink SQL. Blink SQL has already done most of
    this work. We implemented UDF/UDTF/UDAF, aggregates with/without
    windows, stream-to-stream JOIN, and so on.
    4. Windows can also work in batch jobs.
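
    To make this concrete, here is a rough sketch of such a unified
    query, assuming a TUMBLE window function as proposed in the
    Calcite streaming document [1], registered as a regular UDF, and
    a rowtime column on Orders (names are illustrative only):

        SELECT
          userId,
          TUMBLE_END(rowtime, INTERVAL '1' HOUR) AS windowEnd,
          COUNT(*) AS cnt
        FROM Orders
        GROUP BY userId, TUMBLE(rowtime, INTERVAL '1' HOUR)

    The very same statement would be translated to a batch job if
    Orders is a batch table, and to a stream job if Orders is a
    stream.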

    Just my thoughts :)

    What do you think about this ?

    [1] https://calcite.apache.org/docs/stream.html

    - Jark Wu



--
Freundliche Grüße / Kind Regards

Timo Walther

Follow me: @twalthr
https://www.linkedin.com/in/twalthr
