Hi Martijn,

Thanks a lot for your reply! I agree that the scope may be a bit confusing; please let me clarify.
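To make the comparison below concrete, here is roughly what the existing jar statements look like (see the docs at [1] for the exact syntax; the path is just a placeholder):

```
-- Register a JAR with the current SQL client session (placeholder path).
ADD JAR '/path/to/udf.jar';

-- List the JARs that have been added to the session.
SHOW JARS;

-- Remove a previously added JAR from the session.
REMOVE JAR '/path/to/udf.jar';
```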
The FLIP aims to add new SQL statements that are supported only in the SQL client, similar to jar statements [1]. Jar statements can be parsed into jar operations, which are used only by CliClient in the sql-client module and cannot be executed by TableEnvironment (so they're not available in a Table API program that contains SQL, which you mentioned).

WRT the unchanged CLI client, I mean CliClient rather than the sql-client module, which currently also contains the gateway code (e.g. Executor). The FLIP mainly extends the gateway part, and barely touches CliClient and the REST server (the REST endpoint in FLIP-91).

WRT the syntax, I don't have much experience with SQL standards, and I'd like to hear more opinions from the community. I prefer Hive-style syntax because I think many users are familiar with Hive, and there are ongoing efforts to improve the Flink-Hive integration [2][3]. But my preference is not strong; I'm okay with other options too. Do you think JOB/TASK is a good choice, or do you have other preferred keywords?

[1] https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/dev/table/sql/jar/
[2] https://cwiki.apache.org/confluence/display/FLINK/FLIP-152%3A+Hive+Query+Syntax+Compatibility
[3] https://cwiki.apache.org/confluence/display/FLINK/FLIP-223%3A+Support+HiveServer2+Endpoint

Best,
Paul Lam

Martijn Visser <martijnvis...@apache.org> 于2022年4月26日周二 20:14写道:

> Hi Paul,
>
> Thanks for creating the FLIP and opening the discussion. I did get a bit
> confused about the title, being "query lifecycle statements in SQL client".
> This sounds like you want to adapt the SQL client, but you want to expand
> the SQL syntax with lifecycle statements, which could be used from the SQL
> client, but of course also in a Table API program that contains SQL. Given
> that you're highlighting the CLI client as unchanged, this adds to the
> confusion.
>
> I am interested if there's anything listed in the SQL 2016 standard on
> these types of lifecycle statements. I did a quick scan for "SHOW QUERIES"
> but couldn't find it. It would be great if we could stay as close as
> possible to such syntax. Overall I'm not in favour of using QUERIES as a
> keyword. I think Flink applications are not queries, but short- or
> long-running applications. Why should we follow Hive's setup and indeed
> not others such as Snowflake, but also Postgres or MySQL?
>
> Best regards,
>
> Martijn Visser
> https://twitter.com/MartijnVisser82
> https://github.com/MartijnVisser
>
>
> On Fri, 22 Apr 2022 at 12:06, Paul Lam <paullin3...@gmail.com> wrote:
>
> > Hi Shengkai,
> >
> > Thanks a lot for your opinions!
> >
> > > 1. I think the keyword QUERY may confuse users because the statement
> > > also works for DML statements.
> >
> > I slightly lean toward QUERY, because:
> >
> > Hive calls DMLs queries. We could be better aligned with Hive using
> > QUERY, especially given that we plan to introduce a Hive endpoint.
> > QUERY is a more SQL-like concept and friendly to SQL users.
> >
> > In general, my preference: QUERY > JOB > TASK. I'm okay with JOB, but
> > not so much with TASK, as it conflicts with the task concept in the
> > Flink runtime.
> >
> > We could wait for more feedback from the community.
> >
> > > 2. STOP/CANCEL is not very straightforward for SQL users to terminate
> > > their jobs.
> >
> > Agreed. I'm okay with DROP. And if we want to align with Hive, KILL
> > might be an alternative.
> >
> > > 3. I think CREATE/DROP SAVEPOINT statements are more SQL-like.
> >
> > Agreed. It's more SQL-like and intuitive.
> > I'm updating the syntax in the FLIP.
> >
> > > 4. SHOW TASKS can just list the job ids, and users can use DESCRIBE to
> > > get more detailed job info.
> >
> > That is a more SQL-like approach, I think. But considering the
> > ClusterClient APIs, we can fetch the names and the statuses together in
> > one request, so it may be more user-friendly to return them all in the
> > SHOW statement?
> >
> > > It's better if we can also get info about the cluster that the job is
> > > running on through the DESCRIBE statement.
> >
> > I think cluster info could be part of the session properties instead.
> > WDYT?
> >
> > Best,
> > Paul Lam
> >
> > > 2022年4月22日 11:14,Shengkai Fang <fskm...@gmail.com> 写道:
> > >
> > > Hi Paul,
> > >
> > > Sorry for the late response. Here are my thoughts.
> > >
> > > 1. I think the keyword QUERY may confuse users because the statement
> > > also works for DML statements. I find that Snowflake [1] supports
> > >
> > > - CREATE TASK
> > > - DROP TASK
> > > - ALTER TASK
> > > - SHOW TASKS
> > > - DESCRIBE TASK
> > >
> > > I think we can follow Snowflake and use `TASK` as the keyword, or use
> > > the keyword `JOB`?
> > >
> > > 2. STOP/CANCEL is not very straightforward for SQL users to terminate
> > > their jobs.
> > >
> > > ```
> > > DROP TASK [IF EXISTS] <job id> PURGE; -- Forcibly stop the job with drain
> > >
> > > DROP TASK [IF EXISTS] <job id>; -- Stop the task with a savepoint
> > > ```
> > >
> > > Oracle [2] uses PURGE to clean up the table so that users can't
> > > recover it. I think it also works for us to terminate the job
> > > permanently.
> > >
> > > 3. I think CREATE/DROP SAVEPOINT statements are more SQL-like. Users
> > > can use:
> > >
> > > ```
> > > SET 'state.savepoints.dir' = '<path_to_savepoint>';
> > > SET 'state.savepoints.format' = 'native';
> > > CREATE SAVEPOINT <job id>;
> > >
> > > DROP SAVEPOINT <path_to_savepoint>;
> > > ```
> > >
> > > 4. SHOW TASKS can just list the job ids, and users can use DESCRIBE to
> > > get more detailed job info.
> > >
> > > ```
> > > SHOW TASKS;
> > >
> > > +----------------------------------+
> > > |              job_id              |
> > > +----------------------------------+
> > > | 0f6413c33757fbe0277897dd94485f04 |
> > > +----------------------------------+
> > >
> > > DESCRIBE TASK <job id>;
> > >
> > > +-------------+---------+
> > > |  job name   | status  |
> > > +-------------+---------+
> > > | insert-sink | running |
> > > +-------------+---------+
> > > ```
> > >
> > > It's better if we can also get info about the cluster that the job is
> > > running on through the DESCRIBE statement.
> > >
> > > [1] https://docs.snowflake.com/en/sql-reference/ddl-pipeline.html#task-management
> > > [2] https://docs.oracle.com/cd/E11882_01/server.112/e41084/statements_9003.htm#SQLRF01806
> > >
> > > Paul Lam <paullin3...@gmail.com> 于2022年4月21日周四 10:36写道:
> > >
> > >> ping @Timo @Jark @Shengkai
> > >>
> > >> Best,
> > >> Paul Lam
> > >>
> > >>> 2022年4月18日 17:12,Paul Lam <paullin3...@gmail.com> 写道:
> > >>>
> > >>> Hi team,
> > >>>
> > >>> I'd like to start a discussion about FLIP-222 [1], which adds query
> > >>> lifecycle statements to the SQL client.
> > >>>
> > >>> Currently, the SQL client supports submitting queries (queries in a
> > >>> broad sense, including DQLs and DMLs) but no further lifecycle
> > >>> statements, like canceling a query or triggering a savepoint. That
> > >>> forces SQL users to rely on the CLI or the REST API to manage their
> > >>> queries.
> > >>>
> > >>> Thus, I propose to introduce the following statements to fill the gap.
> > >>>
> > >>> SHOW QUERIES
> > >>> STOP QUERY <query_id>
> > >>> CANCEL QUERY <query_id>
> > >>> TRIGGER SAVEPOINT <savepoint_path>
> > >>> DISPOSE SAVEPOINT <savepoint_path>
> > >>>
> > >>> These statements would align the SQL client with the CLI, providing
> > >>> full lifecycle management for queries/jobs.
> > >>>
> > >>> Please see the FLIP page [1] for more details. Thanks a lot!
> > >>> (For reference, see the previous discussion thread [2].)
> > >>>
> > >>> [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-222%3A+Support+full+query+lifecycle+statements+in+SQL+client
> > >>> [2] https://lists.apache.org/thread/wr47ng0m2hdybjkrwjlk9ftwg403odqb
> > >>>
> > >>> Best,
> > >>> Paul Lam
> >
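P.S. To make the "align with CLI" point concrete, here's a rough sketch of how the proposed statements could map to the existing bin/flink commands. The statement forms are copied from the proposal above; the CLI mapping is my reading of the intent and is of course still up for discussion:

```
-- List queries in the current session, roughly `flink list`.
SHOW QUERIES;

-- Stop a query gracefully (with a savepoint), roughly `flink stop <job_id>`.
STOP QUERY <query_id>;

-- Cancel a query without a savepoint, roughly `flink cancel <job_id>`.
CANCEL QUERY <query_id>;

-- Trigger a savepoint, roughly `flink savepoint <job_id> [<target_dir>]`.
TRIGGER SAVEPOINT <savepoint_path>;

-- Dispose of a savepoint, roughly `flink savepoint --dispose <savepoint_path>`.
DISPOSE SAVEPOINT <savepoint_path>;
```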