Re: [DISCUSS] SQL Syntax for Table API StatementSet

Jark Wu Mon, 22 Jun 2020 08:58:04 -0700

+1 to "BEGIN STATEMENT SET; ... END;" syntax.

I also think sync/async execution is orthogonal to statement set syntax.
This problem still exists for individual statements.
We can discuss this in a separate thread.


Best,
Jark
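
For readers of the archive, here is a minimal sketch of how a text-based
client could group the proposed "BEGIN STATEMENT SET; ... END;" syntax.
It is not Flink's parser: the function name `group_statements` and the
list-based representation are hypothetical, and splitting on ';' is a
simplification that ignores semicolons inside string literals or comments.

```python
import re
from typing import List, Union

# Hypothetical representation: a plain string is a single statement,
# a list of strings is one statement set.
Unit = Union[str, List[str]]

BEGIN = re.compile(r"^BEGIN\s+STATEMENT\s+SET$", re.IGNORECASE)
END = re.compile(r"^END$", re.IGNORECASE)


def group_statements(script: str) -> List[Unit]:
    """Split a script on ';' and group statements between BEGIN/END markers."""
    units: List[Unit] = []
    current_set = None  # None means we are outside a statement set
    for raw in script.split(";"):
        stmt = " ".join(raw.split())  # normalize whitespace
        if not stmt:
            continue
        if BEGIN.match(stmt):
            if current_set is not None:
                raise ValueError("nested statement sets are not allowed")
            current_set = []
        elif END.match(stmt):
            if current_set is None:
                raise ValueError("END without BEGIN STATEMENT SET")
            units.append(current_set)
            current_set = None
        elif current_set is not None:
            # The thread describes statement sets as groups of INSERT INTO.
            if not stmt.upper().startswith("INSERT INTO"):
                raise ValueError("only INSERT INTO is allowed in a statement set")
            current_set.append(stmt)
        else:
            units.append(stmt)
    if current_set is not None:
        raise ValueError("statement set is missing END")
    return units
```

Each returned list would map to one jointly optimized job, while plain
strings remain individual statements whose sync/async handling is the
separate question mentioned above.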

On Mon, 22 Jun 2020 at 23:23, Fabian Hueske <fhue...@gmail.com> wrote:

> Thanks for the discussion Godfrey and Timo,
>
> I like the syntax proposed by Jark and Timo:
>
> BEGIN STATEMENT SET;
>    INSERT INTO ...;
>    INSERT INTO ...;
> END;
>
> (I didn't pay attention and didn't mean to propose START over BEGIN. I just
> wanted to make the point that the syntax should make it clear that a
> statement set is started).
>
> I think the important questions about streaming/batch queries and
> sync/async execution need to be discussed and solved.
> However, I think these points are orthogonal to the question about
> supporting statement sets.
> These issues exist today (without a SQL syntax for statement sets) and IMO
> such a syntax doesn't make the situation any worse or better (assuming that
> we agree on the limitation that all statements in a set are either
> streaming or batch queries).
> As I said before, from Flink's point of view a statement set can be
> replaced by a single INSERT INTO query (either streaming or batch,
> depending on the type of queries in the set).
>
> Best, Fabian
>
>
> On Mon, 22 Jun 2020 at 10:55, Timo Walther <twal...@apache.org> wrote:
>
> > Hi Godfrey,
> >
> > 1) Of course we should have unified behavior for API and SQL file.
> > However, this doesn't mean that `executeSql` needs to become blocking or
> > support multi-statements. In a programmatic API, async is more useful as
> > a user can control long running jobs (regardless of batch or streaming).
> > Sync behavior can be expressed on an async API (e.g.
> > TableResult.await()). If we support multi-statements in the API, it will
> > not be supported through `executeSql`; this part of the API was finalized
> > in the last release. We need to come up with a new API method.
> >
> > 3) I think forcing async execution also for multiline batch queries in
> > SQL can be future work. Either we enable those using a flag or special
> > syntax in a SQL file. Or do we want this flexibility already in the
> > first multi-statement support version?
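
Point 1 above (sync behavior expressed on top of an async API) can be
sketched in a few lines. This is an illustration only and does not use
Flink: `execute_async` and `execute_sync` are hypothetical names, and a
thread pool stands in for job submission.

```python
from concurrent.futures import Future, ThreadPoolExecutor
import time

_executor = ThreadPoolExecutor(max_workers=2)


def execute_async(statement: str) -> Future:
    """Hypothetical stand-in for an async submit call: returns immediately."""
    def run_job() -> str:
        time.sleep(0.05)  # pretend the job takes a while
        return f"finished: {statement}"
    return _executor.submit(run_job)


def execute_sync(statement: str) -> str:
    """Blocking behavior built on the async call by waiting for the result."""
    return execute_async(statement).result()
```

The caller of `execute_async` keeps a handle and can decide when (or
whether) to wait, which is the control over long-running jobs that the
message argues for; the blocking variant is just a wait on that handle.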
> >
> > Regards,
> > Timo
> >
> > On 17.06.20 15:27, godfrey he wrote:
> > > Hi Fabian, Jark, Timo
> > >
> > > Thanks for the suggestions.
> > >
> > > Regarding the SQL syntax, BEGIN is more popular than START. I'm fine with
> > > the syntax Timo suggested.
> > >
> > > Regarding whether this should be implemented in Flink's SQL core, I think
> > > there are three things to consider:
> > >
> > > First, do we need to unify the default behavior of the API and SQL files?
> > > The `TableEnvironment#executeSql` and `StatementSet#execute` methods are
> > > asynchronous for both batch and streaming, which means they just submit
> > > the job and then return a `TableResult`.
> > > For batch processing (e.g. Hive, traditional databases), however, the
> > > default behavior is sync mode.
> > > So this behavior differs from the APIs. I think it would be better to
> > > unify the default behavior.
> > >
> > > Second, how do we determine the execution behavior of each statement in
> > > a file that contains both batch SQL and streaming SQL? Currently, we have
> > > a flag that tells the planner whether the TableEnvironment is a batch env
> > > or a stream env, which determines the default behavior. We want to remove
> > > the flag and unify the TableEnvironment in the future. Then
> > > TableEnvironment can execute both batch SQL and streaming SQL. Timo and I
> > > had a discussion about this on Slack: for DML & DQL, if a statement has
> > > keywords like `EMIT STREAM`, it is streaming SQL and will be executed in
> > > async mode; otherwise it is batch SQL and will be executed in sync mode.
> > >
> > > Third, how do we flexibly support execution mode switching for batch SQL?
> > > For streaming SQL, all DMLs & DQLs should be in async mode because the job
> > > may never finish.
> > > For batch SQL, I think both modes are needed. I know some platforms
> > > execute batch SQL in async mode and then continuously monitor the job
> > > status. Do we need to introduce a `set execute-mode=xx` command or new SQL
> > > syntax like `START SYNC EXECUTION`?
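
The decision rule discussed in the three points above can be sketched as
follows. This is a hedged illustration, not Flink behavior: `EMIT STREAM`
is only the keyword floated in this thread, and `batch_override` is a
hypothetical stand-in for something like `set execute-mode=xx`. The
keyword check is a plain text match, so it would also match the words
inside a string literal.

```python
import re
from typing import Optional

# `EMIT STREAM` is the keyword floated in the thread, not existing syntax.
_EMIT_STREAM = re.compile(r"\bEMIT\s+STREAM\b", re.IGNORECASE)


def execution_mode(statement: str, batch_override: Optional[str] = None) -> str:
    """Return "async" or "sync" for a DML/DQL statement.

    batch_override is a hypothetical knob that applies to batch statements only.
    """
    if _EMIT_STREAM.search(statement):
        return "async"  # streaming jobs may never finish, so never block on them
    if batch_override is not None:
        if batch_override not in ("sync", "async"):
            raise ValueError(f"unknown mode: {batch_override}")
        return batch_override
    return "sync"  # default for batch, as in Hive or traditional databases
```

Streaming statements ignore the override on purpose, since blocking on a
job that may never finish would hang the client.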
> > >
> > > For sql-client or other projects, we can easily decide what behavior an
> > > app can support.
> > > Just as Jark said, many downstream projects have the same requirement for
> > > multiple statement support, but they may have different execution
> > > behaviors. It would be great if Flink could support flexible execution
> > > modes.
> > > Alternatively, Flink core just defines the syntax, provides the parser,
> > > and supports a default execution mode.
> > > The downstream projects can then use the APIs and parsed results to decide
> > > how to execute a SQL statement.
> > >
> > > Best,
> > > Godfrey
> > >
> > > On Wed, 17 Jun 2020 at 18:32, Timo Walther <twal...@apache.org> wrote:
> > >
> > >> Hi Fabian,
> > >>
> > >> thanks for the proposal. I agree that we should have consensus on the
> > >> SQL syntax as well and thus finalize the concepts introduced in FLIP-84.
> > >>
> > >> I would favor Jark's proposal. I would like to propose the following
> > >> syntax:
> > >>
> > >> BEGIN STATEMENT SET;
> > >>     INSERT INTO ...;
> > >>     INSERT INTO ...;
> > >> END;
> > >>
> > >> 1) BEGIN and END are commonly used for blocks in SQL.
> > >>
> > >> 2) We should not start mixing START/BEGIN for different kinds of blocks,
> > >> because that can also be confusing for users. There is no additional
> > >> helpful semantic in using START over BEGIN.
> > >>
> > >> 3) Instead, we should rather parameterize the block statement with
> > >> `STATEMENT SET` and keep the END of the block simple (also similar to
> > >> CASE ... WHEN ... END).
> > >>
> > >> 4) If we look at Jark's example in SQL Server, the BEGIN is also
> > >> parameterized by `BEGIN { TRAN | TRANSACTION }`.
> > >>
> > >> 5) Also, in Java, curly braces are used for classes, methods, and loops
> > >> alike, with different purposes parameterized by the preceding code.
> > >>
> > >> Regards,
> > >> Timo
> > >>
> > >>
> > >> On 17.06.20 11:36, Fabian Hueske wrote:
> > >>> Thanks for joining this discussion Jark!
> > >>>
> > >>> This feature is a bit different from BEGIN TRANSACTION / COMMIT and
> > >>> BEGIN / END.
> > >>>
> > >>> The only commonality is that all three group multiple statements.
> > >>> * BEGIN TRANSACTION / COMMIT creates a transactional context that
> > >>> guarantees atomicity, consistency, and isolation. Statements and queries
> > >>> are sequentially executed.
> > >>> * BEGIN / END defines a block of statements just like curly braces ({ and
> > >>> }) do in Java. The statements (which can also include variable
> > >>> definitions and printing) are sequentially executed.
> > >>> * A statement set defines a group of statements that are optimized
> > >>> together and jointly executed at the same time, i.e., there is no
> > >>> sequence or order.
> > >>>
> > >>> A statement set (consisting of multiple INSERT INTO statements) behaves
> > >>> just like a single INSERT INTO statement.
> > >>> Wherever an INSERT INTO statement can be executed, it should be
> > >>> possible to execute a statement set consisting of multiple INSERT INTO
> > >>> statements.
> > >>> That's also why I think that statement sets are orthogonal to
> > >>> multi-statement execution.
> > >>>
> > >>> As I said before, I'm happy to discuss syntax proposals for statement
> > >>> sets.
> > >>> However, I think a BEGIN / END syntax for statement sets would confuse
> > >>> users who know this syntax from MySQL, SQL Server, or another DBMS.
> > >>>
> > >>> Thanks,
> > >>> Fabian
> > >>>
> > >>>
> > >>> On Tue, 16 Jun 2020 at 05:07, Jark Wu <imj...@gmail.com> wrote:
> > >>>
> > >>>> Hi Fabian,
> > >>>>
> > >>>> Thanks for starting this discussion. I think this is a very important
> > >>>> syntax to support file mode and multi-statement for SQL Client.
> > >>>> I'm +1 to introduce a syntax to group SQL statements to execute
> > >>>> together.
> > >>>>
> > >>>> As a reference, traditional database systems also have similar syntax,
> > >>>> such as "START/BEGIN TRANSACTION ... COMMIT" to group statements as a
> > >>>> transaction [1],
> > >>>> and also "BEGIN ... END" [2] [3] to group a set of SQL statements that
> > >>>> execute together.
> > >>>>
> > >>>> Maybe we can also use "BEGIN ... END" syntax which is much simpler?
> > >>>>
> > >>>> Regarding where to implement it, I also prefer to have it in Flink SQL
> > >>>> core. Here are some reasons from my side:
> > >>>> 1) I think many downstream projects (e.g. Zeppelin) will have the same
> > >>>> requirement. It would be better to have it in core instead of having
> > >>>> users reinvent the wheel.
> > >>>> 2) Having it in SQL CLI means it is a standard syntax to support
> > >>>> statement sets in Flink. So I think it makes sense to have it in core
> > >>>> too; otherwise, it looks like a broken feature.
> > >>>>       In 1.10, CREATE VIEW is only supported in SQL CLI, not supported
> > >>>> in TableEnvironment, which confuses many users.
> > >>>> 3) Currently, we are moving statement parsing to use sql-parser
> > >>>> (FLINK-17728). Calcite has good support for parsing multi-statements.
> > >>>>       It will be tricky to parse multi-statements only in SQL Client.
> > >>>>
> > >>>> Best,
> > >>>> Jark
> > >>>>
> > >>>> [1]: https://docs.microsoft.com/en-us/sql/t-sql/language-elements/begin-transaction-transact-sql?view=sql-server-ver15
> > >>>> [2]: https://www.sqlservertutorial.net/sql-server-stored-procedures/sql-server-begin-end/
> > >>>> [3]: https://dev.mysql.com/doc/refman/8.0/en/begin-end.html
> > >>>>
> > >>>> On Mon, 15 Jun 2020 at 20:50, Fabian Hueske <fhue...@gmail.com> wrote:
> > >>>>
> > >>>>> Hi everyone,
> > >>>>>
> > >>>>> FLIP-84 [1] added the concept of a "statement set" to group multiple
> > >>>>> INSERT INTO statements (SQL or Table API) together. The statements in a
> > >>>>> statement set are jointly optimized and executed as a single Flink job.
> > >>>>>
> > >>>>> I would like to start a discussion about a SQL syntax to group multiple
> > >>>>> INSERT INTO statements in a statement set. The use case would be to
> > >>>>> expose the statement set feature to a solely text-based client for
> > >>>>> Flink SQL such as Flink's SQL CLI [2].
> > >>>>>
> > >>>>> During the discussion of FLIP-84, we briefly talked about such a
> > >>>>> syntax [3].
> > >>>>>
> > >>>>> START STATEMENT SET;
> > >>>>> INSERT INTO ... SELECT ...;
> > >>>>> INSERT INTO ... SELECT ...;
> > >>>>> ...
> > >>>>> END STATEMENT SET;
> > >>>>>
> > >>>>> We didn't follow up on this proposal, to keep the focus on the FLIP-84
> > >>>>> Table API changes and to not dive into a discussion about multiline SQL
> > >>>>> query support [4].
> > >>>>>
> > >>>>> While this feature is clearly based on multiple SQL queries, I think it
> > >>>>> is a bit different from what we usually understand as multiline SQL
> > >>>>> support.
> > >>>>> That's because a statement set ends up being a single Flink job. Hence,
> > >>>>> there is no need on the Flink side to coordinate the execution of
> > >>>>> multiple jobs (incl. the discussion about blocking or async execution
> > >>>>> of queries).
> > >>>>> Flink would treat the queries in a STATEMENT SET as a single query.
> > >>>>>
> > >>>>> I would like to start a discussion about supporting the [START|END]
> > >>>>> STATEMENT SET syntax (or a different syntax with equivalent semantics)
> > >>>>> in Flink.
> > >>>>> I don't have a strong preference whether this should be implemented in
> > >>>>> Flink's SQL core or be a purely client-side implementation in the CLI
> > >>>>> client. It would be good though to have parser support in Flink for
> > >>>>> this.
> > >>>>>
> > >>>>> What do others think?
> > >>>>>
> > >>>>> [1] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=134745878
> > >>>>> [2] https://ci.apache.org/projects/flink/flink-docs-stable/dev/table/sqlClient.html
> > >>>>> [3] https://docs.google.com/document/d/1ueLjQWRPdLTFB_TReAyhseAX-1N3j4WYWD0F02Uau0E/edit#heading=h.al86t1h4ecuv
> > >>>>> [4] https://lists.apache.org/thread.html/rf494e227c47010c91583f90eeaf807d3a4c3eb59d105349afd5fdc31%40%3Cdev.flink.apache.org%3E
