Hi, regarding the (un-)quoted question, compatibility is of course an important argument, but in terms of consistency I'd find it a bit surprising that WITH handles it differently than SET, and I wonder if that could cause friction for developers when writing their SQL.
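
To make the asymmetry concrete (`tEnv` is the usual TableEnvironment; the connector and its options are only an example):

// option keys in a WITH clause are quoted string literals today
tEnv.executeSql(
    "CREATE TABLE t (id INT) WITH ('connector' = 'datagen', 'rows-per-second' = '1')");

// whereas the proposed client command keeps the key unquoted:
//   SET table.exec.mini-batch.enabled = true;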

Regards
Ingo

On Thu, Feb 4, 2021 at 9:38 AM Jark Wu <imj...@gmail.com> wrote:
> Hi all,
>
> Regarding "One Parser", I think it's not possible for now because the Calcite parser
> can't parse special characters (e.g. "-") unless they are quoted as string literals.
> That's why the WITH option keys are string literals, not identifiers.
>
> SET table.exec.mini-batch.enabled = true and ADD JAR /local/my-home/test.jar
> have the same problem. That's why we propose two parsers: one splits lines into
> multiple statements and matches special commands through regexes, which is
> light-weight, and delegates the other statements to the other parser, which is the
> Calcite parser.
>
> Note: we should stick to the unquoted SET table.exec.mini-batch.enabled = true
> syntax, both for backward compatibility and ease of use, and all the other systems
> don't have quotes on the key.
>
> Regarding "table.planner" vs "sql-client.planner":
> if we want to use "table.planner", I think we should explain clearly in the
> documentation what its scope is. Otherwise, there will be users complaining why the
> planner doesn't change when setting the configuration on TableEnv. It would be
> better to throw an exception to indicate to users that it's not allowed to change
> the planner after the TableEnv has been initialized. However, it seems not easy to
> implement.
>
> Best,
> Jark
>
> On Thu, 4 Feb 2021 at 15:49, godfrey he <godfre...@gmail.com> wrote:
>
> > Hi everyone,
> >
> > Regarding "table.planner" and "table.execution-mode":
> > If we define that those two options are just used to initialize the
> > TableEnvironment, +1 for introducing table options instead of sql-client options.
> >
> > Regarding "the sql client, we will maintain two parsers", I want to give more input:
> > We want to introduce a sql-gateway into the Flink project (see FLIP-24 & FLIP-91
> > for more info [1] [2]). In the "gateway" mode, the CLI client and the gateway
> > service will communicate through a REST API. The "ADD JAR /local/path/jar"
> > statement will be executed on the CLI client machine. So when we submit a sql file
> > which contains multiple statements, the CLI client needs to pick out the "ADD JAR"
> > lines, and the statements also need to be submitted or executed one by one to make
> > sure the result is correct. The sql file may look like:
> >
> > SET xxx=yyy;
> > create table my_table ...;
> > create table my_sink ...;
> > ADD JAR /local/path/jar1;
> > create function my_udf as com....MyUdf;
> > insert into my_sink select ..., my_udf(xx) from ...;
> > REMOVE JAR /local/path/jar1;
> > drop function my_udf;
> > ADD JAR /local/path/jar2;
> > create function my_udf as com....MyUdf2;
> > insert into my_sink select ..., my_udf(xx) from ...;
> >
> > The lines need to be split into multiple statements first in the CLI client.
> > There are two approaches:
> >
> > 1. The CLI client depends on the sql-parser: the sql-parser splits the lines and
> > tells which lines are "ADD JAR".
> > pro: there is only one parser.
> > cons: It's a little heavy that the CLI client depends on the sql-parser, because
> > the CLI client is just a simple tool which receives the user commands and displays
> > the result. The non-"ADD JAR" commands will be parsed twice.
> >
> > 2. The CLI client splits the lines into multiple statements and finds the ADD JAR
> > commands through regex matching (a rough sketch follows below).
> > pro: The CLI client is very light-weight.
> > cons: there are two parsers.
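> >
> > To make the regex idea a bit more concrete, a purely illustrative sketch using
> > java.util.regex (the pattern and the helpers splitStatements /
> > addJarToClientClassloader / executor are placeholders, not a concrete design):
> >
> > // inside the CLI client
> > Pattern addJar = Pattern.compile("^ADD\\s+JAR\\s+(\\S+)$", Pattern.CASE_INSENSITIVE);
> >
> > for (String statement : splitStatements(scriptContent)) {   // naive split on ';'
> >     Matcher matcher = addJar.matcher(statement.trim());
> >     if (matcher.matches()) {
> >         addJarToClientClassloader(matcher.group(1));  // handled locally by the client
> >     } else {
> >         executor.executeStatement(statement);         // delegated to the Calcite-based parser
> >     }
> > }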
> >
> > (personally, I prefer the second option)
> >
> > Regarding "SHOW or LIST JARS", I think we can support them both.
> > For the default dialect we support SHOW JARS, but if we switch to the hive
> > dialect, LIST JARS is also supported.
> >
> > [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-24+-+SQL+Client
> > [2] https://cwiki.apache.org/confluence/display/FLINK/FLIP-91%3A+Support+SQL+Client+Gateway
> >
> > Best,
> > Godfrey
> >
> > Rui Li <lirui.fu...@gmail.com> 于2021年2月4日周四 上午10:40写道:
> >
> > > Hi guys,
> > >
> > > Regarding #3 and #4, I agree SHOW JARS is more consistent with the other
> > > commands than LIST JARS. I don't have a strong opinion about REMOVE vs DELETE
> > > though.
> > >
> > > While flink doesn't need to follow hive syntax, as far as I know, most users who
> > > are requesting these features were previously hive users. So I wonder whether we
> > > can support both LIST/SHOW JARS and REMOVE/DELETE JARS as synonyms? It's just
> > > like lots of systems accept both EXIT and QUIT as the command to terminate the
> > > program. So if that's not hard to achieve, and will make users happier, I don't
> > > see a reason why we must choose one over the other.
> > >
> > > On Wed, Feb 3, 2021 at 10:33 PM Timo Walther <twal...@apache.org> wrote:
> > >
> > > > Hi everyone,
> > > >
> > > > some feedback regarding the open questions. Maybe we can discuss the
> > > > `TableEnvironment.executeMultiSql` story offline to determine how we proceed
> > > > with this in the near future.
> > > >
> > > > 1) "whether the table environment has the ability to update itself"
> > > >
> > > > Maybe there was some misunderstanding. I don't think that we should support
> > > > `tEnv.getConfig.getConfiguration.setString("table.planner", "old")`. Instead
> > > > I'm proposing to support `TableEnvironment.create(Configuration)` where
> > > > planner and execution mode are read immediately and subsequent changes to
> > > > these options will have no effect. We are doing it similarly in
> > > > `new StreamExecutionEnvironment(Configuration)`. These two ConfigOptions must
> > > > not be SQL Client specific but can be part of the core table code base. Many
> > > > users would like to get a 100% preconfigured environment from just a
> > > > Configuration. And this is not possible right now. We can solve both use
> > > > cases in one change.
> > > >
> > > > 2) "the sql client, we will maintain two parsers"
> > > >
> > > > I remember we had some discussion about this and decided that we would like
> > > > to maintain only one parser. In the end it is "One Flink SQL" where commands
> > > > influence each other also with respect to keywords. It should be fine to
> > > > include the SQL Client commands in the Flink parser. Of course the table
> > > > environment would not be able to handle the `Operation` instance that would
> > > > be the result, but we can introduce hooks to handle those `Operation`s. Or we
> > > > introduce parser extensions.
> > > >
> > > > Can we skip `table.job.async` in the first version? We should further discuss
> > > > whether we introduce a special SQL clause for wrapping async behavior or
> > > > whether we use a config option. Especially for streaming queries we need to
> > > > be careful and should force users to either "one INSERT INTO" or "one
> > > > STATEMENT SET".
> > > >
> > > > 3) 4) "HIVE also uses these commands"
> > > >
> > > > In general, Hive is not a good reference.
> > > > Aligning the commands more with the remaining commands should be our goal. We
> > > > just had a MODULE discussion where we selected SHOW instead of LIST. But it is
> > > > true that JARs are not part of the catalog, which is why I would not use
> > > > CREATE/DROP. ADD/REMOVE are commonly siblings in the English language. Take a
> > > > look at the Java collection API as another example.
> > > >
> > > > 6) "Most of the commands should belong to the table environment"
> > > >
> > > > Thanks for updating the FLIP, this makes things easier to understand. It is
> > > > good to see that most commands will be available in TableEnvironment. However,
> > > > I would also support SET and RESET for consistency. Again, from an
> > > > architectural point of view, if we would allow some kind of `Operation` hook
> > > > in the table environment, we could check for SQL Client specific options and
> > > > forward to the regular `TableConfig.getConfiguration` otherwise. What do you
> > > > think?
> > > >
> > > > Regards,
> > > > Timo
> > > >
> > > > On 03.02.21 08:58, Jark Wu wrote:
> > > > > Hi Timo,
> > > > >
> > > > > I will respond to some of the questions:
> > > > >
> > > > > 1) SQL client specific options
> > > > >
> > > > > Whether it starts with "table" or "sql-client" depends on where the
> > > > > configuration takes effect. If it is a table configuration, we should make
> > > > > clear what the behavior is when users change the configuration during the
> > > > > lifecycle of the TableEnvironment.
> > > > >
> > > > > I agree with Shengkai that `sql-client.planner` and
> > > > > `sql-client.execution.mode` are something special that can't be changed
> > > > > after the TableEnvironment has been initialized. You can see that
> > > > > `StreamExecutionEnvironment` provides a `configure()` method to override
> > > > > the configuration after the StreamExecutionEnvironment has been initialized.
> > > > >
> > > > > Therefore, I think it would be better to still use `sql-client.planner`
> > > > > and `sql-client.execution.mode`.
> > > > >
> > > > > 2) Execution file
> > > > >
> > > > > From my point of view, there is a big difference between
> > > > > `sql-client.job.detach` and `TableEnvironment.executeMultiSql()`:
> > > > > `sql-client.job.detach` will affect every single DML statement in the
> > > > > terminal, not only the statements in SQL files. I think the single DML
> > > > > statement in the interactive terminal is something like tEnv#executeSql()
> > > > > instead of tEnv#executeMultiSql. So I don't like the "multi" and "sql"
> > > > > keywords in `table.multi-sql-async`.
> > > > > I just found that the runtime provides a configuration called
> > > > > "execution.attached" [1], false by default, which specifies whether the
> > > > > pipeline is submitted in attached or detached mode. It provides exactly the
> > > > > same functionality as `sql-client.job.detach`. What do you think about
> > > > > using this option?
> > > > >
> > > > > If we also want to support this config in TableEnvironment, I think it
> > > > > should also affect the DML execution of `tEnv#executeSql()`, not only DMLs
> > > > > in `tEnv#executeMultiSql()`.
> > > > > Therefore, the behavior may look like this:
> > > > >
> > > > > val tableResult = tEnv.executeSql("INSERT INTO ...")   ==> async by default
> > > > > tableResult.await()                                    ==> manually block until finished
> > > > > tEnv.getConfig().getConfiguration().setString("execution.attached", "true")
> > > > > val tableResult2 = tEnv.executeSql("INSERT INTO ...")  ==> sync, no need to wait on the TableResult
> > > > > tEnv.executeMultiSql(
> > > > >   """
> > > > >   CREATE TABLE ....          ==> always sync
> > > > >   INSERT INTO ...            ==> sync, because we set the configuration above
> > > > >   SET execution.attached = false;
> > > > >   INSERT INTO ...            ==> async
> > > > >   """)
> > > > >
> > > > > On the other hand, I think `sql-client.job.detach` and
> > > > > `TableEnvironment.executeMultiSql()` should be two separate topics. As
> > > > > Shengkai mentioned above, the SQL CLI only depends on
> > > > > `TableEnvironment#executeSql()` to support multi-line statements. I'm fine
> > > > > with making `executeMultiSql()` clearer, but I don't want it to block this
> > > > > FLIP; maybe we can discuss this in another thread.
> > > > >
> > > > > Best,
> > > > > Jark
> > > > >
> > > > > [1]:
> > > > > https://ci.apache.org/projects/flink/flink-docs-master/deployment/config.html#execution-attached
> > > > >
> > > > > On Wed, 3 Feb 2021 at 15:33, Shengkai Fang <fskm...@gmail.com> wrote:
> > > > >
> > > > >> Hi, Timo.
> > > > >> Thanks for your detailed feedback. I have some thoughts about it.
> > > > >>
> > > > >> *Regarding #1*: I think the main problem is whether the table environment
> > > > >> has the ability to update itself. Let's take a simple program as an example:
> > > > >>
> > > > >> ```
> > > > >> TableEnvironment tEnv = TableEnvironment.create(...);
> > > > >>
> > > > >> tEnv.getConfig.getConfiguration.setString("table.planner", "old");
> > > > >>
> > > > >> tEnv.executeSql("...");
> > > > >> ```
> > > > >>
> > > > >> If we regard this option as a table option, users don't have to create
> > > > >> another table environment manually. In that case, tEnv needs to check
> > > > >> whether the current mode and planner are the same as before whenever
> > > > >> executeSql or explainSql is called. I don't think that's easy work for the
> > > > >> table environment, especially if users have a StreamExecutionEnvironment
> > > > >> but set the old planner and batch mode. But when we make this option a sql
> > > > >> client option, users only use the SET command to change the setting, and we
> > > > >> can rebuild a new table environment when the SET succeeds.
> > > > >>
> > > > >> *Regarding #2*: I think we need to discuss the implementation before
> > > > >> continuing this topic. In the sql client, we will maintain two parsers. The
> > > > >> first parser (client parser) will only match the sql client commands. If
> > > > >> the client parser can't parse the statement, we will leverage the power of
> > > > >> the table environment to execute it. According to our blueprint,
> > > > >> TableEnvironment#executeSql is enough for the sql client. Therefore,
> > > > >> TableEnvironment#executeMultiSql is out of scope for this FLIP.
> > > > >>
> > > > >> But if we need to introduce `TableEnvironment.executeMultiSql` in the
> > > > >> future, I think it's OK to use the option `table.multi-sql-async` rather
> > > > >> than the option `sql-client.job.detach`. But we think that name is not
> > > > >> suitable because it is confusing for others: when setting the option to
> > > > >> false, we just mean it will block the execution of INSERT INTO statements,
> > > > >> not DDL or others (other sql statements are always executed synchronously).
> > > > >> So how about `table.job.async`? It only works for the sql-client and
> > > > >> executeMultiSql. If we set this value to false, the table environment will
> > > > >> not return the result until the job finishes.
> > > > >>
> > > > >> *Regarding #3, #4*: I still think we should use DELETE JAR and LIST JAR,
> > > > >> because HIVE also uses these commands to add a jar into the classpath or
> > > > >> delete a jar. If we use such commands, it can reduce our work for hive
> > > > >> compatibility.
> > > > >>
> > > > >> For SHOW JAR, I think the main concern is that the jars are not maintained
> > > > >> by the Catalog. If we really need to keep consistent with the SQL grammar,
> > > > >> maybe we should use
> > > > >>
> > > > >> `ADD JAR`    -> `CREATE JAR`,
> > > > >> `DELETE JAR` -> `DROP JAR`,
> > > > >> `LIST JAR`   -> `SHOW JAR`.
> > > > >>
> > > > >> *Regarding #5*: I agree with you that we'd better keep it consistent.
> > > > >>
> > > > >> *Regarding #6*: Yes. Most of the commands should belong to the table
> > > > >> environment. In the Summary section, I use the <NOTE> tag to identify which
> > > > >> commands should belong to the sql client and which commands should belong
> > > > >> to the table environment. I also added a new section about implementation
> > > > >> details to the FLIP.
> > > > >>
> > > > >> Best,
> > > > >> Shengkai
> > > > >>
> > > > >> Timo Walther <twal...@apache.org> 于2021年2月2日周二 下午6:43写道:
> > > > >>
> > > > >>> Thanks for this great proposal Shengkai. This will give the SQL Client a
> > > > >>> very good update and make it production ready.
> > > > >>>
> > > > >>> Here is some feedback from my side:
> > > > >>>
> > > > >>> 1) SQL client specific options
> > > > >>>
> > > > >>> I don't think that `sql-client.planner` and `sql-client.execution.mode`
> > > > >>> are SQL Client specific. Similar to `StreamExecutionEnvironment` and
> > > > >>> `ExecutionConfig#configure`, which have been added recently, we should
> > > > >>> offer such a possibility for TableEnvironment. How about we offer
> > > > >>> `TableEnvironment.create(ReadableConfig)` and add a `table.planner` and
> > > > >>> `table.execution-mode` to
> > > > >>> `org.apache.flink.table.api.config.TableConfigOptions`?
> > > > >>>
> > > > >>> 2) Execution file
> > > > >>>
> > > > >>> Did you have a look at the Appendix of FLIP-84 [1], including the mailing
> > > > >>> list thread at that time? Could you further elaborate how the
> > > > >>> multi-statement execution should work for a unified batch/streaming
> > > > >>> story? According to our past discussions, each line in an execution file
> > > > >>> should be executed blocking, which means a streaming query needs a
> > > > >>> statement set to execute multiple INSERT INTO statements, correct?
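> > > > >>>
> > > > >>> Just as a reminder, this is roughly what a statement set already looks
> > > > >>> like in the Table API today (`tEnv` and the source/sink names are only
> > > > >>> placeholders):
> > > > >>>
> > > > >>> StatementSet statementSet = tEnv.createStatementSet();
> > > > >>> statementSet.addInsertSql("INSERT INTO sink_a SELECT * FROM source_t");
> > > > >>> statementSet.addInsertSql("INSERT INTO sink_b SELECT * FROM source_t");
> > > > >>> // both INSERT INTO statements are bundled into a single job;
> > > > >>> // await() blocks until that job finishes
> > > > >>> statementSet.execute().await();
> > > > >>>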
> > > > >>> We should also offer this functionality in
> > > > >>> `TableEnvironment.executeMultiSql()`. Whether `sql-client.job.detach` is
> > > > >>> SQL Client specific needs to be determined; it could also be a general
> > > > >>> `table.multi-sql-async` option?
> > > > >>>
> > > > >>> 3) DELETE JAR
> > > > >>>
> > > > >>> Shouldn't the opposite of "ADD" be "REMOVE"? "DELETE" sounds like one is
> > > > >>> actively deleting the JAR in the corresponding path.
> > > > >>>
> > > > >>> 4) LIST JAR
> > > > >>>
> > > > >>> This should be `SHOW JARS` according to other SQL commands such as
> > > > >>> `SHOW CATALOGS`, `SHOW TABLES`, etc. [2].
> > > > >>>
> > > > >>> 5) EXPLAIN [ExplainDetail[, ExplainDetail]*]
> > > > >>>
> > > > >>> We should keep the details in sync with
> > > > >>> `org.apache.flink.table.api.ExplainDetail` and avoid confusion about
> > > > >>> differently named ExplainDetails. I would vote for `ESTIMATED_COST`
> > > > >>> instead of `COST`. I'm sure the original author had a reason to call it
> > > > >>> that way.
> > > > >>>
> > > > >>> 6) Implementation details
> > > > >>>
> > > > >>> It would be nice to understand how we plan to implement the given
> > > > >>> features. Most of the commands and config options should go into
> > > > >>> TableEnvironment and SqlParser directly, correct? This way users have a
> > > > >>> unified way of using Flink SQL. TableEnvironment would provide a similar
> > > > >>> user experience in notebooks or interactive programs as the SQL Client.
> > > > >>>
> > > > >>> [1]
> > > > >>> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=134745878
> > > > >>> [2]
> > > > >>> https://ci.apache.org/projects/flink/flink-docs-master/dev/table/sql/show.html
> > > > >>>
> > > > >>> Regards,
> > > > >>> Timo
> > > > >>>
> > > > >>> On 02.02.21 10:13, Shengkai Fang wrote:
> > > > >>>> Sorry for the typo. I mean `RESET` is much better rather than `UNSET`.
> > > > >>>>
> > > > >>>> Shengkai Fang <fskm...@gmail.com> 于2021年2月2日周二 下午4:44写道:
> > > > >>>>
> > > > >>>>> Hi, Jingsong.
> > > > >>>>>
> > > > >>>>> Thanks for your reply. I think `UNSET` is much better.
> > > > >>>>>
> > > > >>>>> 1. We don't need to introduce another command `UNSET`. `RESET` is
> > > > >>>>> already supported in the current sql client. Our proposal just extends
> > > > >>>>> its grammar and allows users to reset specific keys.
> > > > >>>>> 2. Hive beeline also uses `RESET` to set a key back to its default
> > > > >>>>> value [1]. I think it is more friendly for batch users.
> > > > >>>>>
> > > > >>>>> Best,
> > > > >>>>> Shengkai
> > > > >>>>>
> > > > >>>>> [1]
> > > > >>>>> https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Clients
> > > > >>>>>
> > > > >>>>> Jingsong Li <jingsongl...@gmail.com> 于2021年2月2日周二 下午1:56写道:
> > > > >>>>>
> > > > >>>>>> Thanks for the proposal; yes, the sql-client is too outdated. +1 for
> > > > >>>>>> improving it.
> > > > >>>>>>
> > > > >>>>>> About "SET" and "RESET", why not "SET" and "UNSET"?
> > > > >>>>>>
> > > > >>>>>> Best,
> > > > >>>>>> Jingsong
> > > > >>>>>>
> > > > >>>>>> On Mon, Feb 1, 2021 at 2:46 PM Rui Li <lirui.fu...@gmail.com> wrote:
> > > > >>>>>>
> > > > >>>>>>> Thanks Shengkai for the update! The proposed changes look good to me.
> > > > >>>>>>>
> > > > >>>>>>> On Fri, Jan 29, 2021 at 8:26 PM Shengkai Fang <fskm...@gmail.com> wrote:
> > > > >>>>>>>
> > > > >>>>>>>> Hi, Rui.
> > > > >>>>>>>> You are right. I have already modified the FLIP.
> > > > >>>>>>>>
> > > > >>>>>>>> The main changes:
> > > > >>>>>>>>
> > > > >>>>>>>> # The -f parameter has no restriction on the statement type.
> > > > >>>>>>>> Sometimes, users use a pipe to redirect the result of queries for
> > > > >>>>>>>> debugging when submitting a job with the -f parameter. It's much more
> > > > >>>>>>>> convenient compared to writing INSERT INTO statements.
> > > > >>>>>>>>
> > > > >>>>>>>> # Add a new sql client option `sql-client.job.detach`.
> > > > >>>>>>>> Users prefer to execute jobs one by one in batch mode. Users can set
> > > > >>>>>>>> this option to false and the client will not process the next job
> > > > >>>>>>>> until the current job finishes. By default, the client will execute
> > > > >>>>>>>> the next job as soon as the current job is submitted.
> > > > >>>>>>>>
> > > > >>>>>>>> Best,
> > > > >>>>>>>> Shengkai
> > > > >>>>>>>>
> > > > >>>>>>>> Rui Li <lirui.fu...@gmail.com> 于2021年1月29日周五 下午4:52写道:
> > > > >>>>>>>>
> > > > >>>>>>>>> Hi Shengkai,
> > > > >>>>>>>>>
> > > > >>>>>>>>> Regarding #2, maybe the -f options in flink and hive have different
> > > > >>>>>>>>> implications, and we should clarify the behavior. For example, if
> > > > >>>>>>>>> the client just submits the job and exits, what happens if the file
> > > > >>>>>>>>> contains two INSERT statements? I don't think we should treat them
> > > > >>>>>>>>> as a statement set, because users should explicitly write
> > > > >>>>>>>>> BEGIN STATEMENT SET in that case. And the client shouldn't
> > > > >>>>>>>>> asynchronously submit the two jobs, because the 2nd may depend on
> > > > >>>>>>>>> the 1st, right?
> > > > >>>>>>>>>
> > > > >>>>>>>>> On Fri, Jan 29, 2021 at 4:30 PM Shengkai Fang <fskm...@gmail.com> wrote:
> > > > >>>>>>>>>
> > > > >>>>>>>>>> Hi Rui,
> > > > >>>>>>>>>> Thanks for your feedback. I agree with your suggestions.
> > > > >>>>>>>>>>
> > > > >>>>>>>>>> For suggestion 1: Yes, we plan to strengthen the SET command. In
> > > > >>>>>>>>>> the implementation, it will just put the key-value pair into the
> > > > >>>>>>>>>> `Configuration`, which will be used to generate the table config.
> > > > >>>>>>>>>> If hive supports reading the settings from the table config, users
> > > > >>>>>>>>>> are able to set the hive-related settings.
> > > > >>>>>>>>>>
> > > > >>>>>>>>>> For suggestion 2: The -f parameter will submit the job and exit.
> > > > >>>>>>>>>> If the queries never end, users have to cancel the jobs by
> > > > >>>>>>>>>> themselves, which is not reliable (people may forget their jobs).
> > > > >>>>>>>>>> In most cases, queries are used to analyze the data, so users
> > > > >>>>>>>>>> should use queries in the interactive mode.
> > > > >>>>>>>>>>
> > > > >>>>>>>>>> Best,
> > > > >>>>>>>>>> Shengkai
> > > > >>>>>>>>>>
> > > > >>>>>>>>>> Rui Li <lirui.fu...@gmail.com> 于2021年1月29日周五 下午3:18写道:
> > > > >>>>>>>>>>
> > > > >>>>>>>>>>> Thanks Shengkai for bringing up this discussion. I think it
> > > > >>>>>>>>>>> covers a lot of useful features which will dramatically improve
> > > > >>>>>>>>>>> the usability of our SQL Client. I have two questions regarding
> > > > >>>>>>>>>>> the FLIP.
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>>> 1. Do you think we can let users set arbitrary configurations via
> > > > >>>>>>>>>>> the SET command? A connector may have its own configurations and
> > > > >>>>>>>>>>> we don't have a way to dynamically change such configurations in
> > > > >>>>>>>>>>> SQL Client. For example, users may want to be able to change the
> > > > >>>>>>>>>>> hive conf when using the hive connector [1].
> > > > >>>>>>>>>>> 2. Any reason why we have to forbid queries in SQL files specified
> > > > >>>>>>>>>>> with the -f option? Hive supports a similar -f option but allows
> > > > >>>>>>>>>>> queries in the file. And a common use case is to run some query
> > > > >>>>>>>>>>> and redirect the results to a file. So I think maybe flink users
> > > > >>>>>>>>>>> would like to do the same, especially in batch scenarios.
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>>> [1] https://issues.apache.org/jira/browse/FLINK-20590
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>>> On Fri, Jan 29, 2021 at 10:46 AM Sebastian Liu
> > > > >>>>>>>>>>> <liuyang0...@gmail.com> wrote:
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>>>> Hi Shengkai,
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>> Glad to see this improvement. And I have some additional
> > > > >>>>>>>>>>>> suggestions:
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>> #1. Unify the TableEnvironment in ExecutionContext to
> > > > >>>>>>>>>>>> StreamTableEnvironment for both streaming and batch sql.
> > > > >>>>>>>>>>>> #2. Improve the way of results retrieval: the sql client
> > > > >>>>>>>>>>>> currently collects the results locally all at once using
> > > > >>>>>>>>>>>> accumulators, which may cause memory issues in the JM or locally
> > > > >>>>>>>>>>>> for big query results. Accumulators are only suitable for
> > > > >>>>>>>>>>>> testing purposes. We may change to use SelectTableSink, which is
> > > > >>>>>>>>>>>> based on CollectSinkOperatorCoordinator (see the small sketch
> > > > >>>>>>>>>>>> after this list).
> > > > >>>>>>>>>>>> #3. Do we need to consider the Flink SQL gateway which is in
> > > > >>>>>>>>>>>> FLIP-91? It seems that this FLIP has not moved forward for a
> > > > >>>>>>>>>>>> long time. Providing a long-running service out of the box to
> > > > >>>>>>>>>>>> facilitate sql submission is necessary.
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>> What do you think of these?
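> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>> For #2, roughly the shape I have in mind on the client side,
> > > > >>>>>>>>>>>> consuming rows incrementally instead of accumulating everything
> > > > >>>>>>>>>>>> (`tEnv` and `print` are just placeholders):
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>> TableResult result = tEnv.executeSql("SELECT ...");
> > > > >>>>>>>>>>>> try (CloseableIterator<Row> rows = result.collect()) {
> > > > >>>>>>>>>>>>     while (rows.hasNext()) {
> > > > >>>>>>>>>>>>         print(rows.next());  // stream rows to the terminal as they arrive
> > > > >>>>>>>>>>>>     }
> > > > >>>>>>>>>>>> }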
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>> [1]
> > > > >>>>>>>>>>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-91%3A+Support+SQL+Client+Gateway
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>> Shengkai Fang <fskm...@gmail.com> 于2021年1月28日周四 下午8:54写道:
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>>> Hi devs,
> > > > >>>>>>>>>>>>>
> > > > >>>>>>>>>>>>> Jark and I want to start a discussion about FLIP-163: SQL
> > > > >>>>>>>>>>>>> Client Improvements.
> > > > >>>>>>>>>>>>>
> > > > >>>>>>>>>>>>> Many users have complained about the problems of the sql
> > > > >>>>>>>>>>>>> client. For example, users can not register the tables proposed
> > > > >>>>>>>>>>>>> by FLIP-95.
> > > > >>>>>>>>>>>>>
> > > > >>>>>>>>>>>>> The main changes in this FLIP:
> > > > >>>>>>>>>>>>>
> > > > >>>>>>>>>>>>> - use the -i parameter to specify a sql file to initialize the
> > > > >>>>>>>>>>>>>   table environment and deprecate the YAML file;
> > > > >>>>>>>>>>>>> - add -f to submit a sql file and deprecate the '-u' parameter;
> > > > >>>>>>>>>>>>> - add more interactive commands, e.g. ADD JAR;
> > > > >>>>>>>>>>>>> - support the statement set syntax;
> > > > >>>>>>>>>>>>>
> > > > >>>>>>>>>>>>> For more detailed changes, please refer to FLIP-163 [1].
> > > > >>>>>>>>>>>>>
> > > > >>>>>>>>>>>>> Look forward to your feedback.
> > > > >>>>>>>>>>>>>
> > > > >>>>>>>>>>>>> Best,
> > > > >>>>>>>>>>>>> Shengkai
> > > > >>>>>>>>>>>>>
> > > > >>>>>>>>>>>>> [1]
> > > > >>>>>>>>>>>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-163%3A+SQL+Client+Improvements
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>> --
> > > > >>>>>>>>>>>> *With kind regards
> > > > >>>>>>>>>>>> ------------------------------------------------------------
> > > > >>>>>>>>>>>> Sebastian Liu 刘洋
> > > > >>>>>>>>>>>> Institute of Computing Technology, Chinese Academy of Science
> > > > >>>>>>>>>>>> Mobile\WeChat: +86—15201613655
> > > > >>>>>>>>>>>> E-mail: liuyang0...@gmail.com <liuyang0...@gmail.com>
> > > > >>>>>>>>>>>> QQ: 3239559*
> > >
> > > --
> > > Best regards!
> > > Rui Li