Hi Jark,

sorry that I didn't write back earlier. I wanted to talk to Fabian first about this. In general, according to Calcite's plans, even SQL queries containing the "STREAM" keyword are regular standard SQL. In theory we could omit the "STREAM" keyword as long as it is guaranteed that the generated logical plans look the same. So I'm not against having the same grammar for both batch and streaming queries. However, I think we should contribute code to Calcite if the logical representation for the operators we need is not already there. We need to research how far along the Calcite development is. We can implement windows via user-defined functions, as is also done in the Calcite streaming design document.
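
For illustration, a windowed aggregation in the style of the Calcite streaming document could look roughly like the sketch below (the TUMBLE grouping function and the rowtime column on Orders are taken from that document and are not something our parser supports today):

  SELECT STREAM
    TUMBLE_END(rowtime, INTERVAL '1' HOUR) AS rowtime,
    productId,
    COUNT(*) AS c
  FROM Orders
  GROUP BY TUMBLE(rowtime, INTERVAL '1' HOUR), productId;

If the STREAM keyword were dropped, such a query would have to produce the same logical plan in batch and streaming mode, which is exactly the guarantee I mentioned above.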

It would be very interesting for the upcoming design phase if you could show us how you implemented your Blink SQL. For instance, how do you define windows there?

Regards,
Timo


On 18/08/16 at 16:34, Aljoscha Krettek wrote:
Hi,
I personally would like it a lot if the SQL queries for batch and stream programs looked the same. With the decision to move the Table API on top of Calcite and also use the Calcite SQL parser, Flink is somewhat tied to Calcite, so I don't know whether we can add our own window constructs and teach the parser to properly read them.

Maybe Fabian and Timo have more insights here since they worked on the move to Calcite.

Cheers,
Aljoscha

+Timo looping him in directly

On Tue, 16 Aug 2016 at 09:29 Jark Wu <wuchong...@alibaba-inc.com> wrote:

    Hi,

    Currently, Flink uses Calcite for SQL parsing. So we use the
    StreamSQL grammar proposed by Calcite [1], in which we have to use
    the `STREAM` keyword in SQL. For example, `SELECT *
    FROM Orders` is regular standard SQL and will be translated to a
    batch job. If you want to express a stream job, you have to add
    the `STREAM` keyword: `SELECT STREAM *
    FROM Orders`.

    I'm wondering why we distinguish between the StreamSQL and
    BatchSQL grammars. We already have separate high-level APIs for
    batch (DataSet) and stream (DataStream). And we have a unified
    Table API for batch and stream (that's great!). Why do we have to
    separate them again in SQL?

    I hope we can manipulate stream data like a table. Take `SELECT *
    FROM Orders`: if Orders is a table (or runs in a batch execution
    environment), then it's a batch job. If Orders is a stream (or
    runs in a stream execution environment), then it's a stream job.
    The grammar of StreamSQL and BatchSQL is exactly the same. And
    that is what we did in Blink SQL.

    The benefits if we unify the grammar:

    1. StreamSQL is easy to use for anyone who knows regular SQL.
    There is no difference between StreamSQL and regular SQL.
    2. Not blocked by Calcite. Currently, Calcite StreamSQL is not
    fully supported: no stream-to-stream JOIN, no window aggregates,
    no aggregates without a window, etc. We may need to wait for
    Calcite to support them before we can start working. But they are
    already supported by regular SQL, except for windows, which we
    can implement via a user-defined function (see the sketch after
    this list). So if we can use regular SQL instead of StreamSQL, we
    can start working on it right now and not wait for Calcite.
    3. Blink SQL can be merged back to the community to accelerate
    the evolution of Flink SQL. Blink SQL has already done most of
    this work. We implemented UDF/UDTF/UDAF, aggregates with/without
    windows, stream-to-stream JOIN, and so on.
    4. Windows can also work in batch jobs.
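
    To make this concrete, here is a rough sketch of such a unified
    query, assuming a TUMBLE window function as proposed in the
    Calcite streaming document [1], registered as a regular UDF, and
    a rowtime column on Orders (names are illustrative only):

        SELECT
          userId,
          TUMBLE_END(rowtime, INTERVAL '1' HOUR) AS windowEnd,
          COUNT(*) AS cnt
        FROM Orders
        GROUP BY userId, TUMBLE(rowtime, INTERVAL '1' HOUR)

    The very same statement would be translated to a batch job if
    Orders is a batch table, and to a stream job if Orders is a
    stream.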

    Just my thoughts :)

    What do you think about this ?

    [1] https://calcite.apache.org/docs/stream.html

    - Jark Wu



--
Freundliche Grüße / Kind Regards

Timo Walther

Follow me: @twalthr
https://www.linkedin.com/in/twalthr
