Hi Timo, I will respond some of the questions:
1) SQL client specific options Whether it starts with "table" or "sql-client" depends on where the configuration takes effect. If it is a table configuration, we should make clear what's the behavior when users change the configuration in the lifecycle of TableEnvironment. I agree with Shengkai `sql-client.planner` and `sql-client.execution.mode` are something special that can't be changed after TableEnvironment has been initialized. You can see `StreamExecutionEnvironment` provides `configure()` method to override configuration after StreamExecutionEnvironment has been initialized. Therefore, I think it would be better to still use `sql-client.planner` and `sql-client.execution.mode`. 2) Execution file >From my point of view, there is a big difference between `sql-client.job.detach` and `TableEnvironment.executeMultiSql()` that `sql-client.job.detach` will affect every single DML statement in the terminal, not only the statements in SQL files. I think the single DML statement in the interactive terminal is something like tEnv#executeSql() instead of tEnv#executeMultiSql. So I don't like the "multi" and "sql" keyword in `table.multi-sql-async`. I just find that runtime provides a configuration called "execution.attached" [1] which is false by default which specifies if the pipeline is submitted in attached or detached mode. It provides exactly the same functionality of `sql-client.job.detach`. What do you think about using this option? If we also want to support this config in TableEnvironment, I think it should also affect the DML execution of `tEnv#executeSql()`, not only DMLs in `tEnv#executeMultiSql()`. Therefore, the behavior may look like this: val tableResult = tEnv.executeSql("INSERT INTO ...") ==> async by default tableResult.await() ==> manually block until finish tEnv.getConfig().getConfiguration().setString("execution.attached", "true") val tableResult2 = tEnv.executeSql("INSERT INTO ...") ==> sync, don't need to wait on the TableResult tEnv.executeMultiSql( """ CREATE TABLE .... ==> always sync INSERT INTO ... => sync, because we set configuration above SET execution.attached = false; INSERT INTO ... => async """) On the other hand, I think `sql-client.job.detach` and `TableEnvironment.executeMultiSql()` should be two separate topics, as Shengkai mentioned above, SQL CLI only depends on `TableEnvironment#executeSql()` to support multi-line statements. I'm fine with making `executeMultiSql()` clear but don't want it to block this FLIP, maybe we can discuss this in another thread. Best, Jark [1]: https://ci.apache.org/projects/flink/flink-docs-master/deployment/config.html#execution-attached On Wed, 3 Feb 2021 at 15:33, Shengkai Fang <fskm...@gmail.com> wrote: > Hi, Timo. > Thanks for your detailed feedback. I have some thoughts about your > feedback. > > *Regarding #1*: I think the main problem is whether the table environment > has the ability to update itself. Let's take a simple program as an > example. > > > ``` > TableEnvironment tEnv = TableEnvironment.create(...); > > tEnv.getConfig.getConfiguration.setString("table.planner", "old"); > > > tEnv.executeSql("..."); > > ``` > > If we regard this option as a table option, users don't have to create > another table environment manually. In that case, tEnv needs to check > whether the current mode and planner are the same as before when executeSql > or explainSql. I don't think it's easy work for the table environment, > especially if users have a StreamExecutionEnvironment but set old planner > and batch mode. But when we make this option as a sql client option, users > only use the SET command to change the setting. We can rebuild a new table > environment when set successes. > > > *Regarding #2*: I think we need to discuss the implementation before > continuing this topic. In the sql client, we will maintain two parsers. The > first parser(client parser) will only match the sql client commands. If the > client parser can't parse the statement, we will leverage the power of the > table environment to execute. According to our blueprint, > TableEnvironment#executeSql is enough for the sql client. Therefore, > TableEnvironment#executeMultiSql is out-of-scope for this FLIP. > > But if we need to introduce the `TableEnvironment.executeMultiSql` in the > future, I think it's OK to use the option `table.multi-sql-async` rather > than option `sql-client.job.detach`. But we think the name is not suitable > because the name is confusing for others. When setting the option false, we > just mean it will block the execution of the INSERT INTO statement, not DDL > or others(other sql statements are always executed synchronously). So how > about `table.job.async`? It only works for the sql-client and the > executeMultiSql. If we set this value false, the table environment will > return the result until the job finishes. > > > *Regarding #3, #4*: I still think we should use DELETE JAR and LIST JAR > because HIVE also uses these commands to add the jar into the classpath or > delete the jar. If we use such commands, it can reduce our work for hive > compatibility. > > For SHOW JAR, I think the main concern is the jars are not maintained by > the Catalog. If we really needs to keep consistent with SQL grammar, maybe > we should use > > `ADD JAR` -> `CREATE JAR`, > `DELETE JAR` -> `DROP JAR`, > `LIST JAR` -> `SHOW JAR`. > > *Regarding #5*: I agree with you that we'd better keep consistent. > > *Regarding #6*: Yes. Most of the commands should belong to the table > environment. In the Summary section, I use the <NOTE> tag to identify which > commands should belong to the sql client and which commands should belong > to the table environment. I also add a new section about implementation > details in the FLIP. > > Best, > Shengkai > > Timo Walther <twal...@apache.org> 于2021年2月2日周二 下午6:43写道: > > > Thanks for this great proposal Shengkai. This will give the SQL Client a > > very good update and make it production ready. > > > > Here is some feedback from my side: > > > > 1) SQL client specific options > > > > I don't think that `sql-client.planner` and `sql-client.execution.mode` > > are SQL Client specific. Similar to `StreamExecutionEnvironment` and > > `ExecutionConfig#configure` that have been added recently, we should > > offer a possibility for TableEnvironment. How about we offer > > `TableEnvironment.create(ReadableConfig)` and add a `table.planner` and > > `table.execution-mode` to > > `org.apache.flink.table.api.config.TableConfigOptions`? > > > > 2) Execution file > > > > Did you have a look at the Appendix of FLIP-84 [1] including the mailing > > list thread at that time? Could you further elaborate how the > > multi-statement execution should work for a unified batch/streaming > > story? According to our past discussions, each line in an execution file > > should be executed blocking which means a streaming query needs a > > statement set to execute multiple INSERT INTO statement, correct? We > > should also offer this functionality in > > `TableEnvironment.executeMultiSql()`. Whether `sql-client.job.detach` is > > SQL Client specific needs to be determined, it could also be a general > > `table.multi-sql-async` option? > > > > 3) DELETE JAR > > > > Shouldn't the opposite of "ADD" be "REMOVE"? "DELETE" sounds like one is > > actively deleting the JAR in the corresponding path. > > > > 4) LIST JAR > > > > This should be `SHOW JARS` according to other SQL commands such as `SHOW > > CATALOGS`, `SHOW TABLES`, etc. [2]. > > > > 5) EXPLAIN [ExplainDetail[, ExplainDetail]*] > > > > We should keep the details in sync with > > `org.apache.flink.table.api.ExplainDetail` and avoid confusion about > > differently named ExplainDetails. I would vote for `ESTIMATED_COST` > > instead of `COST`. I'm sure the original author had a reason why to call > > it that way. > > > > 6) Implementation details > > > > It would be nice to understand how we plan to implement the given > > features. Most of the commands and config options should go into > > TableEnvironment and SqlParser directly, correct? This way users have a > > unified way of using Flink SQL. TableEnvironment would provide a similar > > user experience in notebooks or interactive programs than the SQL Client. > > > > [1] > > > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=134745878 > > [2] > > > > > https://ci.apache.org/projects/flink/flink-docs-master/dev/table/sql/show.html > > > > Regards, > > Timo > > > > > > On 02.02.21 10:13, Shengkai Fang wrote: > > > Sorry for the typo. I mean `RESET` is much better rather than `UNSET`. > > > > > > Shengkai Fang <fskm...@gmail.com> 于2021年2月2日周二 下午4:44写道: > > > > > >> Hi, Jingsong. > > >> > > >> Thanks for your reply. I think `UNSET` is much better. > > >> > > >> 1. We don't need to introduce another command `UNSET`. `RESET` is > > >> supported in the current sql client now. Our proposal just extends its > > >> grammar and allow users to reset the specified keys. > > >> 2. Hive beeline also uses `RESET` to set the key to the default > > value[1]. > > >> I think it is more friendly for batch users. > > >> > > >> Best, > > >> Shengkai > > >> > > >> [1] > > https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Clients > > >> > > >> Jingsong Li <jingsongl...@gmail.com> 于2021年2月2日周二 下午1:56写道: > > >> > > >>> Thanks for the proposal, yes, sql-client is too outdated. +1 for > > >>> improving it. > > >>> > > >>> About "SET" and "RESET", Why not be "SET" and "UNSET"? > > >>> > > >>> Best, > > >>> Jingsong > > >>> > > >>> On Mon, Feb 1, 2021 at 2:46 PM Rui Li <lirui.fu...@gmail.com> wrote: > > >>> > > >>>> Thanks Shengkai for the update! The proposed changes look good to > me. > > >>>> > > >>>> On Fri, Jan 29, 2021 at 8:26 PM Shengkai Fang <fskm...@gmail.com> > > wrote: > > >>>> > > >>>>> Hi, Rui. > > >>>>> You are right. I have already modified the FLIP. > > >>>>> > > >>>>> The main changes: > > >>>>> > > >>>>> # -f parameter has no restriction about the statement type. > > >>>>> Sometimes, users use the pipe to redirect the result of queries to > > >>>> debug > > >>>>> when submitting job by -f parameter. It's much convenient comparing > > to > > >>>>> writing INSERT INTO statements. > > >>>>> > > >>>>> # Add a new sql client option `sql-client.job.detach` . > > >>>>> Users prefer to execute jobs one by one in the batch mode. Users > can > > >>>> set > > >>>>> this option false and the client will process the next job until > the > > >>>>> current job finishes. The default value of this option is false, > > which > > >>>>> means the client will execute the next job when the current job is > > >>>>> submitted. > > >>>>> > > >>>>> Best, > > >>>>> Shengkai > > >>>>> > > >>>>> > > >>>>> > > >>>>> Rui Li <lirui.fu...@gmail.com> 于2021年1月29日周五 下午4:52写道: > > >>>>> > > >>>>>> Hi Shengkai, > > >>>>>> > > >>>>>> Regarding #2, maybe the -f options in flink and hive have > different > > >>>>>> implications, and we should clarify the behavior. For example, if > > the > > >>>>>> client just submits the job and exits, what happens if the file > > >>>> contains > > >>>>>> two INSERT statements? I don't think we should treat them as a > > >>>> statement > > >>>>>> set, because users should explicitly write BEGIN STATEMENT SET in > > that > > >>>>>> case. And the client shouldn't asynchronously submit the two jobs, > > >>>> because > > >>>>>> the 2nd may depend on the 1st, right? > > >>>>>> > > >>>>>> On Fri, Jan 29, 2021 at 4:30 PM Shengkai Fang <fskm...@gmail.com> > > >>>> wrote: > > >>>>>> > > >>>>>>> Hi Rui, > > >>>>>>> Thanks for your feedback. I agree with your suggestions. > > >>>>>>> > > >>>>>>> For the suggestion 1: Yes. we are plan to strengthen the set > > >>>> command. In > > >>>>>>> the implementation, it will just put the key-value into the > > >>>>>>> `Configuration`, which will be used to generate the table config. > > If > > >>>> hive > > >>>>>>> supports to read the setting from the table config, users are > able > > >>>> to set > > >>>>>>> the hive-related settings. > > >>>>>>> > > >>>>>>> For the suggestion 2: The -f parameter will submit the job and > > exit. > > >>>> If > > >>>>>>> the queries never end, users have to cancel the job by > themselves, > > >>>> which is > > >>>>>>> not reliable(people may forget their jobs). In most case, queries > > >>>> are used > > >>>>>>> to analyze the data. Users should use queries in the interactive > > >>>> mode. > > >>>>>>> > > >>>>>>> Best, > > >>>>>>> Shengkai > > >>>>>>> > > >>>>>>> Rui Li <lirui.fu...@gmail.com> 于2021年1月29日周五 下午3:18写道: > > >>>>>>> > > >>>>>>>> Thanks Shengkai for bringing up this discussion. I think it > > covers a > > >>>>>>>> lot of useful features which will dramatically improve the > > >>>> usability of our > > >>>>>>>> SQL Client. I have two questions regarding the FLIP. > > >>>>>>>> > > >>>>>>>> 1. Do you think we can let users set arbitrary configurations > via > > >>>> the > > >>>>>>>> SET command? A connector may have its own configurations and we > > >>>> don't have > > >>>>>>>> a way to dynamically change such configurations in SQL Client. > For > > >>>> example, > > >>>>>>>> users may want to be able to change hive conf when using hive > > >>>> connector [1]. > > >>>>>>>> 2. Any reason why we have to forbid queries in SQL files > specified > > >>>> with > > >>>>>>>> the -f option? Hive supports a similar -f option but allows > > queries > > >>>> in the > > >>>>>>>> file. And a common use case is to run some query and redirect > the > > >>>> results > > >>>>>>>> to a file. So I think maybe flink users would like to do the > same, > > >>>>>>>> especially in batch scenarios. > > >>>>>>>> > > >>>>>>>> [1] https://issues.apache.org/jira/browse/FLINK-20590 > > >>>>>>>> > > >>>>>>>> On Fri, Jan 29, 2021 at 10:46 AM Sebastian Liu < > > >>>> liuyang0...@gmail.com> > > >>>>>>>> wrote: > > >>>>>>>> > > >>>>>>>>> Hi Shengkai, > > >>>>>>>>> > > >>>>>>>>> Glad to see this improvement. And I have some additional > > >>>> suggestions: > > >>>>>>>>> > > >>>>>>>>> #1. Unify the TableEnvironment in ExecutionContext to > > >>>>>>>>> StreamTableEnvironment for both streaming and batch sql. > > >>>>>>>>> #2. Improve the way of results retrieval: sql client collect > the > > >>>>>>>>> results > > >>>>>>>>> locally all at once using accumulators at present, > > >>>>>>>>> which may have memory issues in JM or Local for the big > > query > > >>>>>>>>> result. > > >>>>>>>>> Accumulator is only suitable for testing purpose. > > >>>>>>>>> We may change to use SelectTableSink, which is based > > >>>>>>>>> on CollectSinkOperatorCoordinator. > > >>>>>>>>> #3. Do we need to consider Flink SQL gateway which is in > FLIP-91. > > >>>> Seems > > >>>>>>>>> that this FLIP has not moved forward for a long time. > > >>>>>>>>> Provide a long running service out of the box to > > facilitate > > >>>> the > > >>>>>>>>> sql > > >>>>>>>>> submission is necessary. > > >>>>>>>>> > > >>>>>>>>> What do you think of these? > > >>>>>>>>> > > >>>>>>>>> [1] > > >>>>>>>>> > > >>>>>>>>> > > >>>> > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-91%3A+Support+SQL+Client+Gateway > > >>>>>>>>> > > >>>>>>>>> > > >>>>>>>>> Shengkai Fang <fskm...@gmail.com> 于2021年1月28日周四 下午8:54写道: > > >>>>>>>>> > > >>>>>>>>>> Hi devs, > > >>>>>>>>>> > > >>>>>>>>>> Jark and I want to start a discussion about FLIP-163:SQL > Client > > >>>>>>>>>> Improvements. > > >>>>>>>>>> > > >>>>>>>>>> Many users have complained about the problems of the sql > client. > > >>>> For > > >>>>>>>>>> example, users can not register the table proposed by FLIP-95. > > >>>>>>>>>> > > >>>>>>>>>> The main changes in this FLIP: > > >>>>>>>>>> > > >>>>>>>>>> - use -i parameter to specify the sql file to initialize the > > >>>> table > > >>>>>>>>>> environment and deprecated YAML file; > > >>>>>>>>>> - add -f to submit sql file and deprecated '-u' parameter; > > >>>>>>>>>> - add more interactive commands, e.g ADD JAR; > > >>>>>>>>>> - support statement set syntax; > > >>>>>>>>>> > > >>>>>>>>>> > > >>>>>>>>>> For more detailed changes, please refer to FLIP-163[1]. > > >>>>>>>>>> > > >>>>>>>>>> Look forward to your feedback. > > >>>>>>>>>> > > >>>>>>>>>> > > >>>>>>>>>> Best, > > >>>>>>>>>> Shengkai > > >>>>>>>>>> > > >>>>>>>>>> [1] > > >>>>>>>>>> > > >>>>>>>>>> > > >>>>>>>>> > > >>>> > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-163%3A+SQL+Client+Improvements > > >>>>>>>>>> > > >>>>>>>>> > > >>>>>>>>> > > >>>>>>>>> -- > > >>>>>>>>> > > >>>>>>>>> *With kind regards > > >>>>>>>>> ------------------------------------------------------------ > > >>>>>>>>> Sebastian Liu 刘洋 > > >>>>>>>>> Institute of Computing Technology, Chinese Academy of Science > > >>>>>>>>> Mobile\WeChat: +86—15201613655 > > >>>>>>>>> E-mail: liuyang0...@gmail.com <liuyang0...@gmail.com> > > >>>>>>>>> QQ: 3239559* > > >>>>>>>>> > > >>>>>>>> > > >>>>>>>> > > >>>>>>>> -- > > >>>>>>>> Best regards! > > >>>>>>>> Rui Li > > >>>>>>>> > > >>>>>>> > > >>>>>> > > >>>>>> -- > > >>>>>> Best regards! > > >>>>>> Rui Li > > >>>>>> > > >>>>> > > >>>> > > >>>> -- > > >>>> Best regards! > > >>>> Rui Li > > >>>> > > >>> > > >>> > > >>> -- > > >>> Best, Jingsong Lee > > >>> > > >> > > > > > > > >