+1 to the "BEGIN STATEMENT SET; ... END;" syntax. I also think sync/async execution is orthogonal to the statement set syntax. The same problem still exists for individual statements, so we can discuss it in a separate thread.
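
To make the grouping semantics of the proposed syntax concrete, here is a minimal Python sketch of how a text client might split a script into units, collecting everything between `BEGIN STATEMENT SET;` and `END;` into one group. It is only an illustration under stated assumptions: `group_statements` is a hypothetical helper, not part of Flink, and the naive split on `;` ignores semicolons inside string literals or comments.

```python
from typing import List, Optional, Union

# A unit is either a single statement or a statement set (a list of INSERTs).
Unit = Union[str, List[str]]


def group_statements(script: str) -> List[Unit]:
    """Hypothetical helper: group statements between BEGIN STATEMENT SET and END.

    Assumption: statements are separated by ';' and contain no ';' themselves.
    """
    units: List[Unit] = []
    current_set: Optional[List[str]] = None  # None means "outside a statement set"
    for raw in script.split(";"):
        stmt = " ".join(raw.split())  # collapse whitespace and newlines
        if not stmt:
            continue
        upper = stmt.upper()
        if upper == "BEGIN STATEMENT SET":
            if current_set is not None:
                raise ValueError("nested statement sets are not allowed")
            current_set = []
        elif upper == "END":
            if current_set is None:
                raise ValueError("END without BEGIN STATEMENT SET")
            units.append(current_set)
            current_set = None
        elif current_set is not None:
            # Inside a set only INSERT INTO statements are accepted.
            if not upper.startswith("INSERT INTO"):
                raise ValueError("only INSERT INTO is allowed in a statement set")
            current_set.append(stmt)
        else:
            units.append(stmt)
    if current_set is not None:
        raise ValueError("unterminated statement set")
    return units


script = """
CREATE TABLE t (a INT);
BEGIN STATEMENT SET;
INSERT INTO s1 SELECT a FROM t;
INSERT INTO s2 SELECT a FROM t;
END;
"""
units = group_statements(script)
```

Each list in `units` would be handed over as one job, while plain strings are executed one by one; whether each unit runs sync or async is a separate decision, which is the orthogonality argued above.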
Best,
Jark

On Mon, 22 Jun 2020 at 23:23, Fabian Hueske <fhue...@gmail.com> wrote:

> Thanks for the discussion Godfrey and Timo,
>
> I like the syntax proposed by Jark and Timo:
>
> BEGIN STATEMENT SET;
> INSERT INTO ...;
> INSERT INTO ...;
> END;
>
> (I didn't pay attention and didn't mean to propose START over BEGIN. I just
> wanted to make the point that the syntax should make it clear that a
> statement set is started).
>
> I think the important questions about streaming/batch queries and
> sync/async execution need to be discussed and solved.
> However, I think these points are orthogonal to the question about
> supporting statement sets.
> These issues exist today (without a SQL syntax for statement sets) and IMO
> such a syntax doesn't make the situation any worse or better (assuming that
> we agree on the limitation that all statements in a set are either
> streaming or batch queries).
> As I said before, from Flink's point of view a statement set can be
> replaced by a single INSERT INTO query (either streaming or batch,
> depending on the type of queries in the set).
>
> Best, Fabian
>
>
> On Mon, 22 Jun 2020 at 10:55, Timo Walther <twal...@apache.org> wrote:
>
> > Hi Godfrey,
> >
> > 1) Of course we should have unified behavior for API and SQL file.
> > However, this doesn't mean that `executeSql` needs to become blocking or
> > support multi-statements. In a programmatic API, async is more useful as
> > a user can control long running jobs (regardless of batch or streaming).
> > Sync behavior can be expressed on an async API (e.g.
> > TableResult.await()). If we support multi-statements in the API, it will
> > not be supported through `executeSql`; this part of the API has been
> > finalized in the last release. We need to come up with a new API method.
> >
> > 3) I think forcing async execution also for multiline batch queries in
> > SQL can be future work. Either we enable those using a flag or special
> > syntax in a SQL file.
> > Or do we want this flexibility already in the
> > first multi-statement support version?
> >
> > Regards,
> > Timo
> >
> > On 17.06.20 15:27, godfrey he wrote:
> > > Hi Fabian, Jark, Timo
> > >
> > > Thanks for the suggestions.
> > >
> > > Regarding the SQL syntax, BEGIN is more popular than START. I'm fine with
> > > the syntax Timo suggested.
> > >
> > > Regarding whether this should be implemented in Flink's SQL core, I think
> > > there are three things to consider:
> > >
> > > First, do we need to unify the default behavior of the API and SQL files?
> > > The execution of the `TableEnvironment#executeSql` method and the
> > > `StatementSet#execute` method is asynchronous
> > > for both batch and streaming, which means these methods just submit the job
> > > and then return a `TableResult`.
> > > For batch processing elsewhere (e.g. Hive, traditional databases), the default
> > > behavior is sync mode.
> > > So this behavior is different from the APIs. I think it would be better to
> > > unify the default behavior.
> > >
> > > Second, how do we determine the execution behavior of each statement in a
> > > file which contains both
> > > batch SQL and streaming SQL? Currently, we have a flag to tell the planner
> > > whether the TableEnvironment is a
> > > batch env or a stream env, which determines the default behavior. We want
> > > to remove
> > > the flag and unify the TableEnvironment in the future. Then
> > > TableEnvironment can execute both
> > > batch SQL and streaming SQL. Timo and I had a discussion about this on
> > > Slack: for DML & DQL,
> > > if a statement has keywords like `EMIT STREAM`, it's streaming SQL and will
> > > be executed in async mode;
> > > otherwise it's batch SQL and will be executed in sync mode.
> > >
> > > Third, how do we flexibly support execution mode switching for batch SQL?
> > > For streaming SQL, all DMLs & DQLs should be in async mode because the job
> > > may never finish.
> > > For batch SQL, however, I think both modes are needed. I know some platforms
> > > execute batch SQL
> > > in async mode, and then continuously monitor the job status. Do we need
> > > to introduce a `set execute-mode=xx` command
> > > or new SQL syntax like `START SYNC EXECUTION`?
> > >
> > > For sql-client or other projects, we can easily decide what behavior an app
> > > can support.
> > > Just as Jark said, many downstream projects have the same requirement for
> > > multiple statement support,
> > > but they may have different execution behaviors. It would be great if Flink could
> > > support flexible execution modes.
> > > Or Flink core just defines the syntax, provides a parser and supports a
> > > default execution mode.
> > > The downstream projects can use the APIs and parsed results to decide how
> > > to execute a SQL statement.
> > >
> > > Best,
> > > Godfrey
> > >
> > > On Wed, 17 Jun 2020 at 18:32, Timo Walther <twal...@apache.org> wrote:
> > >
> > >> Hi Fabian,
> > >>
> > >> thanks for the proposal. I agree that we should have consensus on the
> > >> SQL syntax as well and thus finalize the concepts introduced in FLIP-84.
> > >>
> > >> I would favor Jark's proposal. I would like to propose the following
> > >> syntax:
> > >>
> > >> BEGIN STATEMENT SET;
> > >> INSERT INTO ...;
> > >> INSERT INTO ...;
> > >> END;
> > >>
> > >> 1) BEGIN and END are commonly used for blocks in SQL.
> > >>
> > >> 2) We should not start mixing START/BEGIN for different kinds of blocks,
> > >> because that can also be confusing for users. There is no additional
> > >> helpful semantic in using START over BEGIN.
> > >>
> > >> 3) Instead, we should rather parameterize the block statement with
> > >> `STATEMENT SET` and keep the END of the block simple (also similar to
> > >> CASE ... WHEN ... END).
> > >>
> > >> 4) If we look at Jark's example in SQL Server, the BEGIN is also
> > >> parameterized by `BEGIN { TRAN | TRANSACTION }`.
> > >>
> > >> 5) Also in Java, curly braces are used for classes, methods, and
> > >> loops, for different purposes parameterized by the preceding code.
> > >>
> > >> Regards,
> > >> Timo
> > >>
> > >>
> > >> On 17.06.20 11:36, Fabian Hueske wrote:
> > >>> Thanks for joining this discussion Jark!
> > >>>
> > >>> This feature is a bit different from BEGIN TRANSACTION / COMMIT and
> > >>> BEGIN / END.
> > >>>
> > >>> The only commonality is that all three group multiple statements.
> > >>> * BEGIN TRANSACTION / COMMIT creates a transactional context that
> > >>> guarantees atomicity, consistency, and isolation. Statements and queries
> > >>> are sequentially executed.
> > >>> * BEGIN / END defines a block of statements just like curly braces ({ and
> > >>> }) do in Java. The statements (which can also include variable definitions
> > >>> and printing) are sequentially executed.
> > >>> * A statement set defines a group of statements that are optimized together
> > >>> and jointly executed at the same time, i.e., there is no sequence or order.
> > >>>
> > >>> A statement set (consisting of multiple INSERT INTO statements) behaves
> > >>> just like a single INSERT INTO statement.
> > >>> Wherever an INSERT INTO statement can be executed, it should be
> > >>> possible to execute a statement set consisting of multiple INSERT INTO
> > >>> statements.
> > >>> That's also why I think that statement sets are orthogonal to
> > >>> multi-statement execution.
> > >>>
> > >>> As I said before, I'm happy to discuss syntax proposals for statement sets.
> > >>> However, I think a BEGIN / END syntax for statement sets would confuse
> > >>> users who know this syntax from MySQL, SQL Server, or another DBMS.
> > >>>
> > >>> Thanks,
> > >>> Fabian
> > >>>
> > >>>
> > >>> On Tue, 16 Jun 2020 at 05:07, Jark Wu <imj...@gmail.com> wrote:
> > >>>
> > >>>> Hi Fabian,
> > >>>>
> > >>>> Thanks for starting this discussion.
> > >>>> I think this is a very important
> > >>>> syntax to support file mode and multi-statement for SQL Client.
> > >>>> I'm +1 to introduce a syntax to group SQL statements to execute together.
> > >>>>
> > >>>> As a reference, traditional database systems also have similar syntax, such
> > >>>> as "START/BEGIN TRANSACTION ... COMMIT" to group statements as a
> > >>>> transaction [1],
> > >>>> and also "BEGIN ... END" [2] [3] to group a set of SQL statements that
> > >>>> execute together.
> > >>>>
> > >>>> Maybe we can also use "BEGIN ... END" syntax which is much simpler?
> > >>>>
> > >>>> Regarding where to implement, I also prefer to have it in Flink SQL core.
> > >>>> Here are some reasons from my side:
> > >>>> 1) I think many downstream projects (e.g. Zeppelin) will have the same
> > >>>> requirement. It would be better to have it in core instead of having
> > >>>> users reinvent the wheel.
> > >>>> 2) Having it in SQL CLI means it is a standard syntax to support statement
> > >>>> sets in Flink. So I think it makes sense to have it in core too; otherwise,
> > >>>> it looks like a broken feature.
> > >>>> In 1.10, CREATE VIEW is only supported in SQL CLI, not supported in
> > >>>> TableEnvironment, which confuses many users.
> > >>>> 3) Currently, we are moving statement parsing to use sql-parser
> > >>>> (FLINK-17728). Calcite has good support for parsing multi-statements.
> > >>>> It will be tricky to parse multi-statements only in SQL Client.
> > >>>>
> > >>>> Best,
> > >>>> Jark
> > >>>>
> > >>>> [1]: https://docs.microsoft.com/en-us/sql/t-sql/language-elements/begin-transaction-transact-sql?view=sql-server-ver15
> > >>>> [2]: https://www.sqlservertutorial.net/sql-server-stored-procedures/sql-server-begin-end/
> > >>>> [3]: https://dev.mysql.com/doc/refman/8.0/en/begin-end.html
> > >>>>
> > >>>> On Mon, 15 Jun 2020 at 20:50, Fabian Hueske <fhue...@gmail.com> wrote:
> > >>>>
> > >>>>> Hi everyone,
> > >>>>>
> > >>>>> FLIP-84 [1] added the concept of a "statement set" to group multiple INSERT
> > >>>>> INTO statements (SQL or Table API) together. The statements in a statement
> > >>>>> set are jointly optimized and executed as a single Flink job.
> > >>>>>
> > >>>>> I would like to start a discussion about a SQL syntax to group multiple
> > >>>>> INSERT INTO statements in a statement set. The use case would be to expose
> > >>>>> the statement set feature to a solely text based client for Flink SQL such
> > >>>>> as Flink's SQL CLI [2].
> > >>>>>
> > >>>>> During the discussion of FLIP-84, we had briefly talked about such a syntax
> > >>>>> [3].
> > >>>>>
> > >>>>> START STATEMENT SET;
> > >>>>> INSERT INTO ... SELECT ...;
> > >>>>> INSERT INTO ... SELECT ...;
> > >>>>> ...
> > >>>>> END STATEMENT SET;
> > >>>>>
> > >>>>> We didn't follow up on this proposal, to keep the focus on the FLIP-84
> > >>>>> Table API changes and to not dive into a discussion about multiline SQL
> > >>>>> query support [4].
> > >>>>>
> > >>>>> While this feature is clearly based on multiple SQL queries, I think it is
> > >>>>> a bit different from what we usually understand as multiline SQL support.
> > >>>>> That's because a statement set ends up being a single Flink job.
> > >>>>> Hence,
> > >>>>> there is no need on the Flink side to coordinate the execution of multiple
> > >>>>> jobs (incl. the discussion about blocking or async execution of queries).
> > >>>>> Flink would treat the queries in a STATEMENT SET as a single query.
> > >>>>>
> > >>>>> I would like to start a discussion about supporting the [START|END]
> > >>>>> STATEMENT SET syntax (or a different syntax with equivalent semantics) in
> > >>>>> Flink.
> > >>>>> I don't have a strong preference whether this should be implemented in
> > >>>>> Flink's SQL core or be a purely client side implementation in the CLI
> > >>>>> client. It would be good though to have parser support in Flink for this.
> > >>>>>
> > >>>>> What do others think?
> > >>>>>
> > >>>>> [1] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=134745878
> > >>>>> [2] https://ci.apache.org/projects/flink/flink-docs-stable/dev/table/sqlClient.html
> > >>>>> [3] https://docs.google.com/document/d/1ueLjQWRPdLTFB_TReAyhseAX-1N3j4WYWD0F02Uau0E/edit#heading=h.al86t1h4ecuv
> > >>>>> [4] https://lists.apache.org/thread.html/rf494e227c47010c91583f90eeaf807d3a4c3eb59d105349afd5fdc31%40%3Cdev.flink.apache.org%3E
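
On the point made in the thread that sync behavior can be expressed on top of an async API (the thread mentions `TableResult.await()` as an example), the following Python sketch shows that layering in isolation. It does not use Flink: `submit_job` and `execute_sync` are hypothetical names, and the sleeping worker merely stands in for a batch job.

```python
import time
from concurrent.futures import Future, ThreadPoolExecutor

# Shared worker pool that stands in for a cluster running submitted jobs.
_executor = ThreadPoolExecutor(max_workers=2)


def submit_job(statement: str) -> Future:
    """Hypothetical async submit: returns a handle without waiting for the job."""

    def run() -> str:
        time.sleep(0.05)  # placeholder for the actual batch work
        return f"finished: {statement}"

    return _executor.submit(run)


def execute_sync(statement: str) -> str:
    """Sync execution built on the async call by blocking on the handle."""
    return submit_job(statement).result()


# Async style: the caller keeps the handle and decides when (or whether) to wait.
handle = submit_job("INSERT INTO sink SELECT * FROM src")

# Sync style: the call returns only after the job has completed.
result = execute_sync("INSERT INTO sink SELECT * FROM src")
```

The reverse direction is harder: a blocking-only call gives the caller no handle for monitoring or cancelling a long-running job, which is one reason an async default with an optional wait is attractive for a programmatic API.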