I had a look at SQL:2016 that Martijn mentioned, and found that maybe we could follow the transaction savepoint syntax:

SAVEPOINT <savepoint specifier>
RELEASE SAVEPOINT <savepoint specifier>

These savepoint statements are supported in lots of databases, like Oracle [1], PostgreSQL [2], and MariaDB [3].
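For illustration, here is a minimal sketch of how these statements are used inside a transaction (PostgreSQL-flavored; the table name and values are made up):

```
BEGIN;
INSERT INTO t VALUES (1);     -- hypothetical table t
SAVEPOINT sp1;                -- mark a point inside the open transaction
INSERT INTO t VALUES (2);
ROLLBACK TO SAVEPOINT sp1;    -- undo everything after sp1; the transaction stays open
RELEASE SAVEPOINT sp1;        -- discard the savepoint marker
COMMIT;
```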
They're usually used in the middle of a SQL transaction, so the target would be the current transaction. But if used in a Flink SQL session, we need to add a JOB/QUERY id when creating a savepoint, thus the syntax would be:

SAVEPOINT <job/query id> <savepoint path>
RELEASE SAVEPOINT <savepoint path>

I'm adding it as an alternative in the FLIP.

[1] https://docs.oracle.com/cd/B19306_01/server.102/b14200/statements_10001.htm
[2] https://www.postgresql.org/docs/current/sql-savepoint.html
[3] https://mariadb.com/kb/en/savepoint/

Best,
Paul Lam

> On May 4, 2022, at 16:42, Paul Lam <paullin3...@gmail.com> wrote:
>
> Hi Shengkai,
>
> Thanks a lot for your input!
>
> > I just wonder how the users can get the web ui in the application mode.
> > Therefore, it's better we can list the Web UI using the SHOW statement.
> > WDYT?
>
> I think it's a valid approach. I'm adding it to the FLIP.
>
> > After the investigation, I am fine with QUERY, but the keyword JOB is
> > also okay to me.
>
> In addition, CockroachDB has both SHOW QUERIES [1] and SHOW JOBS [2]:
> the former shows the active running queries, while the latter shows
> background tasks like schema changes. FYI.
>
> WRT the questions:
>
> > 1. Could you add some details about the behaviour with the different
> > execution.target, e.g. session, application mode?
>
> IMHO, the difference between the `execution.target` values is mostly about
> cluster startup, which has little relation to the proposed statements.
> These statements rely on the current ClusterClient/JobClient API,
> which is deployment-mode agnostic. Canceling a job in an application
> cluster is the same as in a session cluster.
>
> BTW, application mode is still under development ATM [3].
>
> > 2. Considering the SQL Client/Gateway is not limited to submitting the job
> > to the specified cluster, is it able to list jobs in the other clusters?
>
> I think multi-cluster support in SQL Client/Gateway should be aligned with
> the CLI, at least in the early phase. We may use SET to set a cluster id for
> a session, so that we have access to that cluster. However, every SHOW
> statement would only involve one cluster.
>
> Best,
> Paul Lam
>
> [1] https://www.cockroachlabs.com/docs/stable/show-statements.html
> [2] https://www.cockroachlabs.com/docs/v21.2/show-jobs
> [3] https://issues.apache.org/jira/browse/FLINK-26541
>
> Shengkai Fang <fskm...@gmail.com> wrote on Fri, Apr 29, 2022 at 15:36:
> Hi.
>
> Thanks for Paul's update.
>
> > It's better we can also get the infos about the cluster where the job is
> > running through the DESCRIBE statement.
>
> I just wonder how the users can get the web ui in the application mode.
> Therefore, it's better we can list the Web UI using the SHOW statement.
> WDYT?
>
> > QUERY or other keywords.
>
> I list the statements used to manage the lifecycle of queries/DMLs in other
> systems:
>
> MySQL [1] allows users to SHOW [FULL] PROCESSLIST and use the KILL command
> to kill a query.
>
> ```
> mysql> SHOW PROCESSLIST;
>
> mysql> KILL 27;
> ```
>
> Postgres uses the following statements to kill queries:
>
> ```
> SELECT pg_cancel_backend(<pid of the process>)
>
> SELECT pg_terminate_backend(<pid of the process>)
> ```
>
> KSQL uses the following commands to control the query lifecycle [4][5].
>
> ```
> SHOW QUERIES;
>
> TERMINATE <query id>;
> ```
>
> [1] https://dev.mysql.com/doc/refman/8.0/en/show-processlist.html
> [2] https://scaledynamix.com/blog/how-to-kill-mysql-queries/
> [3] https://stackoverflow.com/questions/35319597/how-to-stop-kill-a-query-in-postgresql
> [4] https://docs.ksqldb.io/en/latest/developer-guide/ksqldb-reference/show-queries/
> [5] https://docs.ksqldb.io/en/latest/developer-guide/ksqldb-reference/terminate/
>
> After the investigation, I am fine with QUERY, but the keyword JOB is
> also okay to me.
>
> We also have two questions here.
>
> 1. Could you add some details about the behaviour with the different
> execution.target, e.g. session, application mode?
>
> 2. Considering the SQL Client/Gateway is not limited to submitting the job
> to the specified cluster, is it able to list jobs in the other clusters?
>
> Best,
> Shengkai
>
> Paul Lam <paullin3...@gmail.com> wrote on Thu, Apr 28, 2022 at 17:17:
> > Hi Martijn,
> >
> > Thanks a lot for your reply! I agree that the scope may be a bit confusing,
> > please let me clarify.
> >
> > The FLIP aims to add new SQL statements that are supported only in
> > sql-client, similar to jar statements [1]. Jar statements can be parsed
> > into jar operations, which are used only in CliClient in the sql-client
> > module and cannot be executed by TableEnvironment (so they are not
> > available in the Table API program that contains SQL that you mentioned).
> >
> > WRT the unchanged CLI client, I mean CliClient instead of the sql-client
> > module, which currently contains the gateway code (e.g. Executor). The
> > FLIP mainly extends the gateway part, and barely touches CliClient and
> > the REST server (the REST endpoint in FLIP-91).
> >
> > WRT the syntax, I don't have much experience with SQL standards, and I'd
> > like to hear more opinions from the community. I prefer Hive-style syntax
> > because I think many users are familiar with Hive, and there are ongoing
> > efforts to improve Flink-Hive integration [2][3]. But my preference is not
> > strong, I'm okay with other options too. Do you think JOB/TASK is a good
> > choice, or do you have other preferred keywords?
> >
> > [1] https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/dev/table/sql/jar/
> > [2] https://cwiki.apache.org/confluence/display/FLINK/FLIP-152%3A+Hive+Query+Syntax+Compatibility
> > [3] https://cwiki.apache.org/confluence/display/FLINK/FLIP-223%3A+Support+HiveServer2+Endpoint
> >
> > Best,
> > Paul Lam
> >
> > Martijn Visser <martijnvis...@apache.org> wrote on Tue, Apr 26, 2022 at 20:14:
> >
> > > Hi Paul,
> > >
> > > Thanks for creating the FLIP and opening the discussion. I did get a bit
> > > confused about the title, being "query lifecycle statements in SQL client".
> > > This sounds like you want to adapt the SQL client, but you want to expand
> > > the SQL syntax with lifecycle statements, which could be used from the SQL
> > > client, but of course also in a Table API program that contains SQL. Given
> > > that you're highlighting the CLI client as unchanged, this adds to more
> > > confusion.
> > >
> > > I am interested if there's anything listed in the SQL 2016 standard on
> > > these types of lifecycle statements. I did a quick scan for "SHOW QUERIES"
> > > but couldn't find it. It would be great if we could stay as close as
> > > possible to such syntax. Overall I'm not in favour of using QUERIES as a
> > > keyword. I think Flink applications are not queries, but short- or
> > > long-running applications. Why should we follow Hive's setup and indeed
> > > not others such as Snowflake, but also Postgres or MySQL?
> > >
> > > Best regards,
> > >
> > > Martijn Visser
> > > https://twitter.com/MartijnVisser82
> > > https://github.com/MartijnVisser
> > >
> > > On Fri, 22 Apr 2022 at 12:06, Paul Lam <paullin3...@gmail.com> wrote:
> > >
> > > > Hi Shengkai,
> > > >
> > > > Thanks a lot for your opinions!
> > > >
> > > > > 1. I think the keyword QUERY may confuse users because the statement
> > > > > also works for the DML statement.
> > > >
> > > > I slightly lean to QUERY, because:
> > > >
> > > > Hive calls DMLs queries. We could be better aligned with Hive using
> > > > QUERY, especially given that we plan to introduce a Hive endpoint.
> > > > QUERY is a more SQL-like concept and friendly to SQL users.
> > > >
> > > > In general, my preference is: QUERY > JOB > TASK. I'm okay with JOB,
> > > > but not very good with TASK, as it conflicts with the task concept in
> > > > Flink runtime.
> > > >
> > > > We could wait for more feedback from the community.
> > > >
> > > > > 2. STOP/CANCEL is not very straightforward for the SQL users to
> > > > > terminate their jobs.
> > > >
> > > > Agreed. I'm okay with DROP. And if we want to align with Hive, KILL
> > > > might be an alternative.
> > > >
> > > > > 3. I think CREATE/DROP SAVEPOINTS statement is more SQL-like.
> > > >
> > > > Agreed. It's more SQL-like and intuitive. I'm updating the syntax on
> > > > the FLIP.
> > > >
> > > > > 4. SHOW TASKS can just list the job id and use DESCRIBE to get more
> > > > > detailed job infos.
> > > >
> > > > That is a more SQL-like approach, I think. But considering the
> > > > ClusterClient APIs, we can fetch the names and the status along in one
> > > > request, thus it may be more user-friendly to return them all in the
> > > > SHOW statement?
> > > >
> > > > > It's better we can also get the infos about the cluster where the job
> > > > > is running through the DESCRIBE statement.
> > > >
> > > > I think cluster info could be part of session properties instead. WDYT?
> > > >
> > > > Best,
> > > > Paul Lam
> > > >
> > > > > On Apr 22, 2022, at 11:14, Shengkai Fang <fskm...@gmail.com> wrote:
> > > > >
> > > > > Hi Paul
> > > > >
> > > > > Sorry for the late response. I propose my thoughts here.
> > > > >
> > > > > 1. I think the keyword QUERY may confuse users because the statement
> > > > > also works for the DML statement.
> > > > > I find that Snowflake [1] supports:
> > > > >
> > > > > - CREATE TASK
> > > > > - DROP TASK
> > > > > - ALTER TASK
> > > > > - SHOW TASKS
> > > > > - DESCRIBE TASK
> > > > >
> > > > > I think we can follow Snowflake to use `TASK` as the keyword, or use
> > > > > the keyword `JOB`?
> > > > >
> > > > > 2. STOP/CANCEL is not very straightforward for the SQL users to
> > > > > terminate their jobs.
> > > > >
> > > > > ```
> > > > > DROP TASK [IF EXISTS] <job id> PURGE; -- Forcibly stop the job with drain
> > > > >
> > > > > DROP TASK [IF EXISTS] <job id>; -- Stop the task with savepoints
> > > > > ```
> > > > >
> > > > > Oracle [2] uses PURGE to clean up the table so that users can't
> > > > > recover it. I think it also works for us to terminate the job permanently.
> > > > >
> > > > > 3. I think the CREATE/DROP SAVEPOINT statement is more SQL-like. Users
> > > > > can use the following:
> > > > >
> > > > > ```
> > > > > SET 'state.savepoints.dir' = '<path_to_savepoint>';
> > > > > SET 'state.savepoints.format' = 'native';
> > > > > CREATE SAVEPOINT <job id>;
> > > > >
> > > > > DROP SAVEPOINT <path_to_savepoint>;
> > > > > ```
> > > > >
> > > > > 4. SHOW TASKS can just list the job id, and use DESCRIBE to get more
> > > > > detailed job infos.
> > > > >
> > > > > ```
> > > > > SHOW TASKS;
> > > > >
> > > > > +----------------------------------+
> > > > > | job_id                           |
> > > > > +----------------------------------+
> > > > > | 0f6413c33757fbe0277897dd94485f04 |
> > > > > +----------------------------------+
> > > > >
> > > > > DESCRIBE TASK <job id>;
> > > > >
> > > > > +-------------+---------+
> > > > > | job name    | status  |
> > > > > +-------------+---------+
> > > > > | insert-sink | running |
> > > > > +-------------+---------+
> > > > > ```
> > > > >
> > > > > It's better we can also get the infos about the cluster where the job
> > > > > is running through the DESCRIBE statement.
> > > > >
> > > > > [1] https://docs.snowflake.com/en/sql-reference/ddl-pipeline.html#task-management
> > > > > [2] https://docs.oracle.com/cd/E11882_01/server.112/e41084/statements_9003.htm#SQLRF01806
> > > > >
> > > > > Paul Lam <paullin3...@gmail.com> wrote on Thu, Apr 21, 2022 at 10:36:
> > > > >
> > > > >> ping @Timo @Jark @Shengkai
> > > > >>
> > > > >> Best,
> > > > >> Paul Lam
> > > > >>
> > > > >>> On Apr 18, 2022, at 17:12, Paul Lam <paullin3...@gmail.com> wrote:
> > > > >>>
> > > > >>> Hi team,
> > > > >>>
> > > > >>> I'd like to start a discussion about FLIP-222 [1], which adds query
> > > > >>> lifecycle statements to SQL client.
> > > > >>>
> > > > >>> Currently, SQL client supports submitting queries (queries in a broad
> > > > >>> sense, including DQLs and DMLs) but no further lifecycle statements,
> > > > >>> like canceling a query or triggering a savepoint. That makes SQL
> > > > >>> users have to rely on the CLI or REST API to manage their queries.
> > > > >>>
> > > > >>> Thus, I propose to introduce the following statements to fill the gap:
> > > > >>>
> > > > >>> SHOW QUERIES
> > > > >>> STOP QUERY <query_id>
> > > > >>> CANCEL QUERY <query_id>
> > > > >>> TRIGGER SAVEPOINT <savepoint_path>
> > > > >>> DISPOSE SAVEPOINT <savepoint_path>
> > > > >>>
> > > > >>> These statements would align SQL client with the CLI, providing full
> > > > >>> lifecycle management for queries/jobs.
> > > > >>>
> > > > >>> Please see the FLIP page [1] for more details. Thanks a lot!
> > > > >>> (For reference, the previous discussion thread is at [2].)
> > > > >>>
> > > > >>> [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-222%3A+Support+full+query+lifecycle+statements+in+SQL+client
> > > > >>> [2] https://lists.apache.org/thread/wr47ng0m2hdybjkrwjlk9ftwg403odqb
> > > > >>>
> > > > >>> Best,
> > > > >>> Paul Lam
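For illustration, a hypothetical SQL client session using the statements from the original proposal quoted above (the job id, savepoint path, and output layout are made up, and the final syntax in the FLIP may differ):

```
Flink SQL> SHOW QUERIES;
-- assumed output: one row per query
-- +----------------------------------+---------+
-- | query_id                         | status  |
-- +----------------------------------+---------+
-- | 0f6413c33757fbe0277897dd94485f04 | RUNNING |
-- +----------------------------------+---------+

Flink SQL> TRIGGER SAVEPOINT 'hdfs:///flink/savepoints';
Flink SQL> STOP QUERY '0f6413c33757fbe0277897dd94485f04';
Flink SQL> DISPOSE SAVEPOINT 'hdfs:///flink/savepoints/savepoint-0f6413-1a2b3c4d5e6f';
```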