Thanks for this great proposal Shengkai. This will give the SQL Client a
very good update and make it production ready.
Here is some feedback from my side:
1) SQL client specific options
I don't think that `sql-client.planner` and `sql-client.execution.mode`
are SQL Client specific. Similar to `StreamExecutionEnvironment` and
`ExecutionConfig#configure` that have been added recently, we should
offer a possibility for TableEnvironment. How about we offer
`TableEnvironment.create(ReadableConfig)` and add a `table.planner` and
`table.execution-mode` to
`org.apache.flink.table.api.config.TableConfigOptions`?
2) Execution file
Did you have a look at the Appendix of FLIP-84 [1] including the mailing
list thread at that time? Could you further elaborate how the
multi-statement execution should work for a unified batch/streaming
story? According to our past discussions, each line in an execution file
should be executed blocking which means a streaming query needs a
statement set to execute multiple INSERT INTO statement, correct? We
should also offer this functionality in
`TableEnvironment.executeMultiSql()`. Whether `sql-client.job.detach` is
SQL Client specific needs to be determined, it could also be a general
`table.multi-sql-async` option?
3) DELETE JAR
Shouldn't the opposite of "ADD" be "REMOVE"? "DELETE" sounds like one is
actively deleting the JAR in the corresponding path.
4) LIST JAR
This should be `SHOW JARS` according to other SQL commands such as `SHOW
CATALOGS`, `SHOW TABLES`, etc. [2].
5) EXPLAIN [ExplainDetail[, ExplainDetail]*]
We should keep the details in sync with
`org.apache.flink.table.api.ExplainDetail` and avoid confusion about
differently named ExplainDetails. I would vote for `ESTIMATED_COST`
instead of `COST`. I'm sure the original author had a reason why to call
it that way.
6) Implementation details
It would be nice to understand how we plan to implement the given
features. Most of the commands and config options should go into
TableEnvironment and SqlParser directly, correct? This way users have a
unified way of using Flink SQL. TableEnvironment would provide a similar
user experience in notebooks or interactive programs than the SQL Client.
[1]
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=134745878
[2]
https://ci.apache.org/projects/flink/flink-docs-master/dev/table/sql/show.html
Regards,
Timo
On 02.02.21 10:13, Shengkai Fang wrote:
Sorry for the typo. I mean `RESET` is much better rather than `UNSET`.
Shengkai Fang <fskm...@gmail.com> 于2021年2月2日周二 下午4:44写道:
Hi, Jingsong.
Thanks for your reply. I think `UNSET` is much better.
1. We don't need to introduce another command `UNSET`. `RESET` is
supported in the current sql client now. Our proposal just extends its
grammar and allow users to reset the specified keys.
2. Hive beeline also uses `RESET` to set the key to the default value[1].
I think it is more friendly for batch users.
Best,
Shengkai
[1] https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Clients
Jingsong Li <jingsongl...@gmail.com> 于2021年2月2日周二 下午1:56写道:
Thanks for the proposal, yes, sql-client is too outdated. +1 for
improving it.
About "SET" and "RESET", Why not be "SET" and "UNSET"?
Best,
Jingsong
On Mon, Feb 1, 2021 at 2:46 PM Rui Li <lirui.fu...@gmail.com> wrote:
Thanks Shengkai for the update! The proposed changes look good to me.
On Fri, Jan 29, 2021 at 8:26 PM Shengkai Fang <fskm...@gmail.com> wrote:
Hi, Rui.
You are right. I have already modified the FLIP.
The main changes:
# -f parameter has no restriction about the statement type.
Sometimes, users use the pipe to redirect the result of queries to
debug
when submitting job by -f parameter. It's much convenient comparing to
writing INSERT INTO statements.
# Add a new sql client option `sql-client.job.detach` .
Users prefer to execute jobs one by one in the batch mode. Users can
set
this option false and the client will process the next job until the
current job finishes. The default value of this option is false, which
means the client will execute the next job when the current job is
submitted.
Best,
Shengkai
Rui Li <lirui.fu...@gmail.com> 于2021年1月29日周五 下午4:52写道:
Hi Shengkai,
Regarding #2, maybe the -f options in flink and hive have different
implications, and we should clarify the behavior. For example, if the
client just submits the job and exits, what happens if the file
contains
two INSERT statements? I don't think we should treat them as a
statement
set, because users should explicitly write BEGIN STATEMENT SET in that
case. And the client shouldn't asynchronously submit the two jobs,
because
the 2nd may depend on the 1st, right?
On Fri, Jan 29, 2021 at 4:30 PM Shengkai Fang <fskm...@gmail.com>
wrote:
Hi Rui,
Thanks for your feedback. I agree with your suggestions.
For the suggestion 1: Yes. we are plan to strengthen the set
command. In
the implementation, it will just put the key-value into the
`Configuration`, which will be used to generate the table config. If
hive
supports to read the setting from the table config, users are able
to set
the hive-related settings.
For the suggestion 2: The -f parameter will submit the job and exit.
If
the queries never end, users have to cancel the job by themselves,
which is
not reliable(people may forget their jobs). In most case, queries
are used
to analyze the data. Users should use queries in the interactive
mode.
Best,
Shengkai
Rui Li <lirui.fu...@gmail.com> 于2021年1月29日周五 下午3:18写道:
Thanks Shengkai for bringing up this discussion. I think it covers a
lot of useful features which will dramatically improve the
usability of our
SQL Client. I have two questions regarding the FLIP.
1. Do you think we can let users set arbitrary configurations via
the
SET command? A connector may have its own configurations and we
don't have
a way to dynamically change such configurations in SQL Client. For
example,
users may want to be able to change hive conf when using hive
connector [1].
2. Any reason why we have to forbid queries in SQL files specified
with
the -f option? Hive supports a similar -f option but allows queries
in the
file. And a common use case is to run some query and redirect the
results
to a file. So I think maybe flink users would like to do the same,
especially in batch scenarios.
[1] https://issues.apache.org/jira/browse/FLINK-20590
On Fri, Jan 29, 2021 at 10:46 AM Sebastian Liu <
liuyang0...@gmail.com>
wrote:
Hi Shengkai,
Glad to see this improvement. And I have some additional
suggestions:
#1. Unify the TableEnvironment in ExecutionContext to
StreamTableEnvironment for both streaming and batch sql.
#2. Improve the way of results retrieval: sql client collect the
results
locally all at once using accumulators at present,
which may have memory issues in JM or Local for the big query
result.
Accumulator is only suitable for testing purpose.
We may change to use SelectTableSink, which is based
on CollectSinkOperatorCoordinator.
#3. Do we need to consider Flink SQL gateway which is in FLIP-91.
Seems
that this FLIP has not moved forward for a long time.
Provide a long running service out of the box to facilitate
the
sql
submission is necessary.
What do you think of these?
[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-91%3A+Support+SQL+Client+Gateway
Shengkai Fang <fskm...@gmail.com> 于2021年1月28日周四 下午8:54写道:
Hi devs,
Jark and I want to start a discussion about FLIP-163:SQL Client
Improvements.
Many users have complained about the problems of the sql client.
For
example, users can not register the table proposed by FLIP-95.
The main changes in this FLIP:
- use -i parameter to specify the sql file to initialize the
table
environment and deprecated YAML file;
- add -f to submit sql file and deprecated '-u' parameter;
- add more interactive commands, e.g ADD JAR;
- support statement set syntax;
For more detailed changes, please refer to FLIP-163[1].
Look forward to your feedback.
Best,
Shengkai
[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-163%3A+SQL+Client+Improvements
--
*With kind regards
------------------------------------------------------------
Sebastian Liu 刘洋
Institute of Computing Technology, Chinese Academy of Science
Mobile\WeChat: +86—15201613655
E-mail: liuyang0...@gmail.com <liuyang0...@gmail.com>
QQ: 3239559*
--
Best regards!
Rui Li
--
Best regards!
Rui Li
--
Best regards!
Rui Li
--
Best, Jingsong Lee