Re: [DISCUSS]FLIP-163: SQL Client Improvements

Jark Wu Tue, 02 Feb 2021 23:58:25 -0800

Hi Timo,

I will respond some of the questions:


1) SQL client specific options

Whether it starts with "table" or "sql-client" depends on where the
configuration takes effect.
If it is a table configuration, we should make clear what's the behavior
when users change
the configuration in the lifecycle of TableEnvironment.

I agree with Shengkai `sql-client.planner` and `sql-client.execution.mode`
are something special
that can't be changed after TableEnvironment has been initialized. You can
see
`StreamExecutionEnvironment` provides `configure()`  method to override
configuration after
StreamExecutionEnvironment has been initialized.

Therefore, I think it would be better to still use  `sql-client.planner`
and `sql-client.execution.mode`.

2) Execution file

>From my point of view, there is a big difference between
`sql-client.job.detach` and
`TableEnvironment.executeMultiSql()` that `sql-client.job.detach` will
affect every single DML statement
in the terminal, not only the statements in SQL files. I think the single
DML statement in the interactive
terminal is something like tEnv#executeSql() instead of
tEnv#executeMultiSql.
So I don't like the "multi" and "sql" keyword in `table.multi-sql-async`.
I just find that runtime provides a configuration called
"execution.attached" [1] which is false by default
which specifies if the pipeline is submitted in attached or detached mode.
It provides exactly the same
functionality of `sql-client.job.detach`. What do you think about using
this option?

If we also want to support this config in TableEnvironment, I think it
should also affect the DML execution
 of `tEnv#executeSql()`, not only DMLs in `tEnv#executeMultiSql()`.
Therefore, the behavior may look like this:

val tableResult = tEnv.executeSql("INSERT INTO ...")  ==> async by default
tableResult.await()   ==> manually block until finish
tEnv.getConfig().getConfiguration().setString("execution.attached", "true")
val tableResult2 = tEnv.executeSql("INSERT INTO ...")  ==> sync, don't need
to wait on the TableResult
tEnv.executeMultiSql(
"""
CREATE TABLE ....  ==> always sync
INSERT INTO ...  => sync, because we set configuration above
SET execution.attached = false;
INSERT INTO ...  => async
""")

On the other hand, I think `sql-client.job.detach`
and `TableEnvironment.executeMultiSql()` should be two separate topics,
as Shengkai mentioned above, SQL CLI only depends on
`TableEnvironment#executeSql()` to support multi-line statements.
I'm fine with making `executeMultiSql()` clear but don't want it to block
this FLIP, maybe we can discuss this in another thread.


Best,
Jark

[1]:
https://ci.apache.org/projects/flink/flink-docs-master/deployment/config.html#execution-attached

On Wed, 3 Feb 2021 at 15:33, Shengkai Fang <fskm...@gmail.com> wrote:

> Hi, Timo.
> Thanks for your detailed feedback. I have some thoughts about your
> feedback.
>
> *Regarding #1*: I think the main problem is whether the table environment
> has the ability to update itself. Let's take a simple program as an
> example.
>
>
> ```
> TableEnvironment tEnv = TableEnvironment.create(...);
>
> tEnv.getConfig.getConfiguration.setString("table.planner", "old");
>
>
> tEnv.executeSql("...");
>
> ```
>
> If we regard this option as a table option, users don't have to create
> another table environment manually. In that case, tEnv needs to check
> whether the current mode and planner are the same as before when executeSql
> or explainSql. I don't think it's easy work for the table environment,
> especially if users have a StreamExecutionEnvironment but set old planner
> and batch mode. But when we make this option as a sql client option, users
> only use the SET command to change the setting. We can rebuild a new table
> environment when set successes.
>
>
> *Regarding #2*: I think we need to discuss the implementation before
> continuing this topic. In the sql client, we will maintain two parsers. The
> first parser(client parser) will only match the sql client commands. If the
> client parser can't parse the statement, we will leverage the power of the
> table environment to execute. According to our blueprint,
> TableEnvironment#executeSql is enough for the sql client. Therefore,
> TableEnvironment#executeMultiSql is out-of-scope for this FLIP.
>
> But if we need to introduce the `TableEnvironment.executeMultiSql` in the
> future, I think it's OK to use the option `table.multi-sql-async` rather
> than option `sql-client.job.detach`. But we think the name is not suitable
> because the name is confusing for others. When setting the option false, we
> just mean it will block the execution of the INSERT INTO statement, not DDL
> or others(other sql statements are always executed synchronously). So how
> about `table.job.async`? It only works for the sql-client and the
> executeMultiSql. If we set this value false, the table environment will
> return the result until the job finishes.
>
>
> *Regarding #3, #4*: I still think we should use DELETE JAR and LIST JAR
> because HIVE also uses these commands to add the jar into the classpath or
> delete the jar. If we use  such commands, it can reduce our work for hive
> compatibility.
>
> For SHOW JAR, I think the main concern is the jars are not maintained by
> the Catalog. If we really needs to keep consistent with SQL grammar, maybe
> we should use
>
> `ADD JAR` -> `CREATE JAR`,
> `DELETE JAR` -> `DROP JAR`,
> `LIST JAR` -> `SHOW JAR`.
>
> *Regarding #5*: I agree with you that we'd better keep consistent.
>
> *Regarding #6*: Yes. Most of the commands should belong to the table
> environment. In the Summary section, I use the <NOTE> tag to identify which
> commands should belong to the sql client and which commands should belong
> to the table environment. I also add a new section about implementation
> details in the FLIP.
>
> Best,
> Shengkai
>
> Timo Walther <twal...@apache.org> 于2021年2月2日周二 下午6:43写道：
>
> > Thanks for this great proposal Shengkai. This will give the SQL Client a
> > very good update and make it production ready.
> >
> > Here is some feedback from my side:
> >
> > 1) SQL client specific options
> >
> > I don't think that `sql-client.planner` and `sql-client.execution.mode`
> > are SQL Client specific. Similar to `StreamExecutionEnvironment` and
> > `ExecutionConfig#configure` that have been added recently, we should
> > offer a possibility for TableEnvironment. How about we offer
> > `TableEnvironment.create(ReadableConfig)` and add a `table.planner` and
> > `table.execution-mode` to
> > `org.apache.flink.table.api.config.TableConfigOptions`?
> >
> > 2) Execution file
> >
> > Did you have a look at the Appendix of FLIP-84 [1] including the mailing
> > list thread at that time? Could you further elaborate how the
> > multi-statement execution should work for a unified batch/streaming
> > story? According to our past discussions, each line in an execution file
> > should be executed blocking which means a streaming query needs a
> > statement set to execute multiple INSERT INTO statement, correct? We
> > should also offer this functionality in
> > `TableEnvironment.executeMultiSql()`. Whether `sql-client.job.detach` is
> > SQL Client specific needs to be determined, it could also be a general
> > `table.multi-sql-async` option?
> >
> > 3) DELETE JAR
> >
> > Shouldn't the opposite of "ADD" be "REMOVE"? "DELETE" sounds like one is
> > actively deleting the JAR in the corresponding path.
> >
> > 4) LIST JAR
> >
> > This should be `SHOW JARS` according to other SQL commands such as `SHOW
> > CATALOGS`, `SHOW TABLES`, etc. [2].
> >
> > 5) EXPLAIN [ExplainDetail[, ExplainDetail]*]
> >
> > We should keep the details in sync with
> > `org.apache.flink.table.api.ExplainDetail` and avoid confusion about
> > differently named ExplainDetails. I would vote for `ESTIMATED_COST`
> > instead of `COST`. I'm sure the original author had a reason why to call
> > it that way.
> >
> > 6) Implementation details
> >
> > It would be nice to understand how we plan to implement the given
> > features. Most of the commands and config options should go into
> > TableEnvironment and SqlParser directly, correct? This way users have a
> > unified way of using Flink SQL. TableEnvironment would provide a similar
> > user experience in notebooks or interactive programs than the SQL Client.
> >
> > [1]
> >
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=134745878
> > [2]
> >
> >
> https://ci.apache.org/projects/flink/flink-docs-master/dev/table/sql/show.html
> >
> > Regards,
> > Timo
> >
> >
> > On 02.02.21 10:13, Shengkai Fang wrote:
> > > Sorry for the typo. I mean `RESET` is much better rather than `UNSET`.
> > >
> > > Shengkai Fang <fskm...@gmail.com> 于2021年2月2日周二 下午4:44写道：
> > >
> > >> Hi, Jingsong.
> > >>
> > >> Thanks for your reply. I think `UNSET` is much better.
> > >>
> > >> 1. We don't need to introduce another command `UNSET`. `RESET` is
> > >> supported in the current sql client now. Our proposal just extends its
> > >> grammar and allow users to reset the specified keys.
> > >> 2. Hive beeline also uses `RESET` to set the key to the default
> > value[1].
> > >> I think it is more friendly for batch users.
> > >>
> > >> Best,
> > >> Shengkai
> > >>
> > >> [1]
> > https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Clients
> > >>
> > >> Jingsong Li <jingsongl...@gmail.com> 于2021年2月2日周二 下午1:56写道：
> > >>
> > >>> Thanks for the proposal, yes, sql-client is too outdated. +1 for
> > >>> improving it.
> > >>>
> > >>> About "SET"  and "RESET", Why not be "SET" and "UNSET"?
> > >>>
> > >>> Best,
> > >>> Jingsong
> > >>>
> > >>> On Mon, Feb 1, 2021 at 2:46 PM Rui Li <lirui.fu...@gmail.com> wrote:
> > >>>
> > >>>> Thanks Shengkai for the update! The proposed changes look good to
> me.
> > >>>>
> > >>>> On Fri, Jan 29, 2021 at 8:26 PM Shengkai Fang <fskm...@gmail.com>
> > wrote:
> > >>>>
> > >>>>> Hi, Rui.
> > >>>>> You are right. I have already modified the FLIP.
> > >>>>>
> > >>>>> The main changes:
> > >>>>>
> > >>>>> # -f parameter has no restriction about the statement type.
> > >>>>> Sometimes, users use the pipe to redirect the result of queries to
> > >>>> debug
> > >>>>> when submitting job by -f parameter. It's much convenient comparing
> > to
> > >>>>> writing INSERT INTO statements.
> > >>>>>
> > >>>>> # Add a new sql client option `sql-client.job.detach` .
> > >>>>> Users prefer to execute jobs one by one in the batch mode. Users
> can
> > >>>> set
> > >>>>> this option false and the client will process the next job until
> the
> > >>>>> current job finishes. The default value of this option is false,
> > which
> > >>>>> means the client will execute the next job when the current job is
> > >>>>> submitted.
> > >>>>>
> > >>>>> Best,
> > >>>>> Shengkai
> > >>>>>
> > >>>>>
> > >>>>>
> > >>>>> Rui Li <lirui.fu...@gmail.com> 于2021年1月29日周五 下午4:52写道：
> > >>>>>
> > >>>>>> Hi Shengkai,
> > >>>>>>
> > >>>>>> Regarding #2, maybe the -f options in flink and hive have
> different
> > >>>>>> implications, and we should clarify the behavior. For example, if
> > the
> > >>>>>> client just submits the job and exits, what happens if the file
> > >>>> contains
> > >>>>>> two INSERT statements? I don't think we should treat them as a
> > >>>> statement
> > >>>>>> set, because users should explicitly write BEGIN STATEMENT SET in
> > that
> > >>>>>> case. And the client shouldn't asynchronously submit the two jobs,
> > >>>> because
> > >>>>>> the 2nd may depend on the 1st, right?
> > >>>>>>
> > >>>>>> On Fri, Jan 29, 2021 at 4:30 PM Shengkai Fang <fskm...@gmail.com>
> > >>>> wrote:
> > >>>>>>
> > >>>>>>> Hi Rui,
> > >>>>>>> Thanks for your feedback. I agree with your suggestions.
> > >>>>>>>
> > >>>>>>> For the suggestion 1: Yes. we are plan to strengthen the set
> > >>>> command. In
> > >>>>>>> the implementation, it will just put the key-value into the
> > >>>>>>> `Configuration`, which will be used to generate the table config.
> > If
> > >>>> hive
> > >>>>>>> supports to read the setting from the table config, users are
> able
> > >>>> to set
> > >>>>>>> the hive-related settings.
> > >>>>>>>
> > >>>>>>> For the suggestion 2: The -f parameter will submit the job and
> > exit.
> > >>>> If
> > >>>>>>> the queries never end, users have to cancel the job by
> themselves,
> > >>>> which is
> > >>>>>>> not reliable(people may forget their jobs). In most case, queries
> > >>>> are used
> > >>>>>>> to analyze the data. Users should use queries in the interactive
> > >>>> mode.
> > >>>>>>>
> > >>>>>>> Best,
> > >>>>>>> Shengkai
> > >>>>>>>
> > >>>>>>> Rui Li <lirui.fu...@gmail.com> 于2021年1月29日周五 下午3:18写道：
> > >>>>>>>
> > >>>>>>>> Thanks Shengkai for bringing up this discussion. I think it
> > covers a
> > >>>>>>>> lot of useful features which will dramatically improve the
> > >>>> usability of our
> > >>>>>>>> SQL Client. I have two questions regarding the FLIP.
> > >>>>>>>>
> > >>>>>>>> 1. Do you think we can let users set arbitrary configurations
> via
> > >>>> the
> > >>>>>>>> SET command? A connector may have its own configurations and we
> > >>>> don't have
> > >>>>>>>> a way to dynamically change such configurations in SQL Client.
> For
> > >>>> example,
> > >>>>>>>> users may want to be able to change hive conf when using hive
> > >>>> connector [1].
> > >>>>>>>> 2. Any reason why we have to forbid queries in SQL files
> specified
> > >>>> with
> > >>>>>>>> the -f option? Hive supports a similar -f option but allows
> > queries
> > >>>> in the
> > >>>>>>>> file. And a common use case is to run some query and redirect
> the
> > >>>> results
> > >>>>>>>> to a file. So I think maybe flink users would like to do the
> same,
> > >>>>>>>> especially in batch scenarios.
> > >>>>>>>>
> > >>>>>>>> [1] https://issues.apache.org/jira/browse/FLINK-20590
> > >>>>>>>>
> > >>>>>>>> On Fri, Jan 29, 2021 at 10:46 AM Sebastian Liu <
> > >>>> liuyang0...@gmail.com>
> > >>>>>>>> wrote:
> > >>>>>>>>
> > >>>>>>>>> Hi Shengkai,
> > >>>>>>>>>
> > >>>>>>>>> Glad to see this improvement. And I have some additional
> > >>>> suggestions:
> > >>>>>>>>>
> > >>>>>>>>> #1. Unify the TableEnvironment in ExecutionContext to
> > >>>>>>>>> StreamTableEnvironment for both streaming and batch sql.
> > >>>>>>>>> #2. Improve the way of results retrieval: sql client collect
> the
> > >>>>>>>>> results
> > >>>>>>>>> locally all at once using accumulators at present,
> > >>>>>>>>>        which may have memory issues in JM or Local for the big
> > query
> > >>>>>>>>> result.
> > >>>>>>>>> Accumulator is only suitable for testing purpose.
> > >>>>>>>>>        We may change to use SelectTableSink, which is based
> > >>>>>>>>> on CollectSinkOperatorCoordinator.
> > >>>>>>>>> #3. Do we need to consider Flink SQL gateway which is in
> FLIP-91.
> > >>>> Seems
> > >>>>>>>>> that this FLIP has not moved forward for a long time.
> > >>>>>>>>>        Provide a long running service out of the box to
> > facilitate
> > >>>> the
> > >>>>>>>>> sql
> > >>>>>>>>> submission is necessary.
> > >>>>>>>>>
> > >>>>>>>>> What do you think of these?
> > >>>>>>>>>
> > >>>>>>>>> [1]
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-91%3A+Support+SQL+Client+Gateway
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> Shengkai Fang <fskm...@gmail.com> 于2021年1月28日周四 下午8:54写道：
> > >>>>>>>>>
> > >>>>>>>>>> Hi devs,
> > >>>>>>>>>>
> > >>>>>>>>>> Jark and I want to start a discussion about FLIP-163:SQL
> Client
> > >>>>>>>>>> Improvements.
> > >>>>>>>>>>
> > >>>>>>>>>> Many users have complained about the problems of the sql
> client.
> > >>>> For
> > >>>>>>>>>> example, users can not register the table proposed by FLIP-95.
> > >>>>>>>>>>
> > >>>>>>>>>> The main changes in this FLIP:
> > >>>>>>>>>>
> > >>>>>>>>>> - use -i parameter to specify the sql file to initialize the
> > >>>> table
> > >>>>>>>>>> environment and deprecated YAML file;
> > >>>>>>>>>> - add -f to submit sql file and deprecated '-u' parameter;
> > >>>>>>>>>> - add more interactive commands, e.g ADD JAR;
> > >>>>>>>>>> - support statement set syntax;
> > >>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>>> For more detailed changes, please refer to FLIP-163[1].
> > >>>>>>>>>>
> > >>>>>>>>>> Look forward to your feedback.
> > >>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>>> Best,
> > >>>>>>>>>> Shengkai
> > >>>>>>>>>>
> > >>>>>>>>>> [1]
> > >>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>>
> > >>>>
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-163%3A+SQL+Client+Improvements
> > >>>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> --
> > >>>>>>>>>
> > >>>>>>>>> *With kind regards
> > >>>>>>>>> ------------------------------------------------------------
> > >>>>>>>>> Sebastian Liu 刘洋
> > >>>>>>>>> Institute of Computing Technology, Chinese Academy of Science
> > >>>>>>>>> Mobile\WeChat: +86—15201613655
> > >>>>>>>>> E-mail: liuyang0...@gmail.com <liuyang0...@gmail.com>
> > >>>>>>>>> QQ: 3239559*
> > >>>>>>>>>
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>> --
> > >>>>>>>> Best regards!
> > >>>>>>>> Rui Li
> > >>>>>>>>
> > >>>>>>>
> > >>>>>>
> > >>>>>> --
> > >>>>>> Best regards!
> > >>>>>> Rui Li
> > >>>>>>
> > >>>>>
> > >>>>
> > >>>> --
> > >>>> Best regards!
> > >>>> Rui Li
> > >>>>
> > >>>
> > >>>
> > >>> --
> > >>> Best, Jingsong Lee
> > >>>
> > >>
> > >
> >
> >
>

Re: [DISCUSS]FLIP-163: SQL Client Improvements

Reply via email to