Hi Paul,

Great that you could find something in the SQL standard! I'll try to read the FLIP once more completely next week to see if I have any more concerns.
Best regards,

Martijn

On Fri, 6 May 2022 at 08:21, Paul Lam <paullin3...@gmail.com> wrote:

> I had a look at SQL-2016 that Martijn mentioned, and found that maybe we could follow the transaction savepoint syntax.
>
> - SAVEPOINT <savepoint specifier>
> - RELEASE SAVEPOINT <savepoint specifier>
>
> These savepoint statements are supported in lots of databases, like Oracle[1], PG[2], MariaDB[3].
>
> They're usually used in the middle of a SQL transaction, so the target would be the current transaction. But if used in a Flink SQL session, we would need to add a JOB/QUERY id when creating a savepoint, thus the syntax would be:
>
> - SAVEPOINT <job/query id> <savepoint path>
> - RELEASE SAVEPOINT <savepoint path>
>
> I'm adding it as an alternative in the FLIP.
>
> [1] https://docs.oracle.com/cd/B19306_01/server.102/b14200/statements_10001.htm
> [2] https://www.postgresql.org/docs/current/sql-savepoint.html
> [3] https://mariadb.com/kb/en/savepoint/
>
> Best,
> Paul Lam
>
> On 4 May 2022 at 16:42, Paul Lam <paullin3...@gmail.com> wrote:
>
> Hi Shengkai,
>
> Thanks a lot for your input!
>
> > I just wonder how the users can get the web ui in the application mode. Therefore, it's better we can list the Web UI using the SHOW statement. WDYT?
>
> I think it's a valid approach. I'm adding it to the FLIP.
>
> > After the investigation, I am fine with the QUERY but the keyword JOB is also okay to me.
>
> In addition, CockroachDB has both SHOW QUERIES [1] and SHOW JOBS [2]: the former shows the active running queries, while the latter shows background tasks like schema changes. FYI.
>
> WRT the questions:
>
> > 1. Could you add some details about the behaviour with the different execution.target, e.g. session, application mode?
>
> IMHO, the difference between different `execution.target` values is mostly about cluster startup, which has little relation to the proposed statements.
> These statements rely on the current ClusterClient/JobClient API, which is deployment-mode agnostic. Canceling a job in an application cluster is the same as in a session cluster.
>
> BTW, application mode is still under development ATM [3].
>
> > 2. Considering the SQL Client/Gateway is not limited to submitting the job to the specified cluster, is it able to list jobs in the other clusters?
>
> I think multi-cluster support in SQL Client/Gateway should be aligned with the CLI, at least in the early phase. We may use SET to set a cluster id for a session, and then we have access to that cluster. However, every SHOW statement would involve only one cluster.
>
> Best,
> Paul Lam
>
> [1] https://www.cockroachlabs.com/docs/stable/show-statements.html
> [2] https://www.cockroachlabs.com/docs/v21.2/show-jobs
> [3] https://issues.apache.org/jira/browse/FLINK-26541
>
> On Fri, 29 Apr 2022 at 15:36, Shengkai Fang <fskm...@gmail.com> wrote:
>
>> Hi.
>>
>> Thanks for Paul's update.
>>
>> > It's better we can also get the info about the cluster the job is running on through the DESCRIBE statement.
>>
>> I just wonder how the users can get the web ui in the application mode. Therefore, it's better we can list the Web UI using the SHOW statement. WDYT?
>>
>> > QUERY or other keywords.
>>
>> I list the statements to manage the lifecycle of queries/DMLs in other systems:
>>
>> MySQL[1] allows users to SHOW [FULL] PROCESSLIST and use the KILL command to kill a query.
>>
>> ```
>> mysql> SHOW PROCESSLIST;
>>
>> mysql> KILL 27;
>> ```
>>
>> Postgres uses the following statements to kill queries.
>>
>> ```
>> SELECT pg_cancel_backend(<pid of the process>)
>>
>> SELECT pg_terminate_backend(<pid of the process>)
>> ```
>>
>> KSQL uses the following commands to control the query lifecycle[4].
>>
>> ```
>> SHOW QUERIES;
>>
>> TERMINATE <query id>;
>> ```
>>
>> [1] https://dev.mysql.com/doc/refman/8.0/en/show-processlist.html
>> [2] https://scaledynamix.com/blog/how-to-kill-mysql-queries/
>> [3] https://stackoverflow.com/questions/35319597/how-to-stop-kill-a-query-in-postgresql
>> [4] https://docs.ksqldb.io/en/latest/developer-guide/ksqldb-reference/show-queries/
>> [5] https://docs.ksqldb.io/en/latest/developer-guide/ksqldb-reference/terminate/
>>
>> After the investigation, I am fine with the QUERY but the keyword JOB is also okay to me.
>>
>> We also have two questions here.
>>
>> 1. Could you add some details about the behaviour with the different execution.target, e.g. session, application mode?
>>
>> 2. Considering the SQL Client/Gateway is not limited to submitting the job to the specified cluster, is it able to list jobs in the other clusters?
>>
>> Best,
>> Shengkai
>>
>> On Thu, 28 Apr 2022 at 17:17, Paul Lam <paullin3...@gmail.com> wrote:
>>
>> > Hi Martijn,
>> >
>> > Thanks a lot for your reply! I agree that the scope may be a bit confusing, so let me clarify.
>> >
>> > The FLIP aims to add new SQL statements that are supported only in the sql-client, similar to jar statements [1]. Jar statements can be parsed into jar operations, which are used only in CliClient in the sql-client module and cannot be executed by TableEnvironment (not available in a Table API program that contains SQL, as you mentioned).
>> >
>> > WRT the unchanged CLI client, I mean CliClient instead of the sql-client module, which currently contains the gateway codes (e.g. Executor). The FLIP mainly extends the gateway part, and barely touches CliClient and the REST server (REST endpoint in FLIP-91).
>> >
>> > WRT the syntax, I don't have much experience with SQL standards, and I'd like to hear more opinions from the community.
>> > I prefer Hive-style syntax because I think many users are familiar with Hive, and there are ongoing efforts to improve Flink-Hive integration [2][3]. But my preference is not strong; I'm okay with other options too. Do you think JOB/TASK is a good choice, or do you have other preferred keywords?
>> >
>> > [1] https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/dev/table/sql/jar/
>> > [2] https://cwiki.apache.org/confluence/display/FLINK/FLIP-152%3A+Hive+Query+Syntax+Compatibility
>> > [3] https://cwiki.apache.org/confluence/display/FLINK/FLIP-223%3A+Support+HiveServer2+Endpoint
>> >
>> > Best,
>> > Paul Lam
>> >
>> > On Tue, 26 Apr 2022 at 20:14, Martijn Visser <martijnvis...@apache.org> wrote:
>> >
>> > > Hi Paul,
>> > >
>> > > Thanks for creating the FLIP and opening the discussion. I did get a bit confused about the title, being "query lifecycle statements in SQL client". This sounds like you want to adapt the SQL client, but you want to expand the SQL syntax with lifecycle statements, which could be used from the SQL client, but of course also in a Table API program that contains SQL. Given that you're highlighting the CLI client as unchanged, this adds to the confusion.
>> > >
>> > > I am interested if there's anything listed in the SQL 2016 standard on these types of lifecycle statements. I did a quick scan for "SHOW QUERIES" but couldn't find it. It would be great if we could stay as close as possible to such syntax. Overall I'm not in favour of using QUERIES as a keyword. I think Flink applications are not queries, but short- or long-running applications. Why should we follow Hive's setup and indeed not others such as Snowflake, but also Postgres or MySQL?
>> > >
>> > > Best regards,
>> > >
>> > > Martijn Visser
>> > > https://twitter.com/MartijnVisser82
>> > > https://github.com/MartijnVisser
>> > >
>> > > On Fri, 22 Apr 2022 at 12:06, Paul Lam <paullin3...@gmail.com> wrote:
>> > >
>> > > > Hi Shengkai,
>> > > >
>> > > > Thanks a lot for your opinions!
>> > > >
>> > > > > 1. I think the keyword QUERY may confuse users because the statement also works for the DML statement.
>> > > >
>> > > > I slightly lean toward QUERY, because:
>> > > >
>> > > > - Hive calls DMLs queries. We could be better aligned with Hive using QUERY, especially given that we plan to introduce the Hive endpoint.
>> > > > - QUERY is a more SQL-like concept and friendly to SQL users.
>> > > >
>> > > > In general, my preference is: QUERY > JOB > TASK. I'm okay with JOB, but not very good with TASK, as it conflicts with the task concept in the Flink runtime.
>> > > >
>> > > > We could wait for more feedback from the community.
>> > > >
>> > > > > 2. STOP/CANCEL is not very straightforward for the SQL users to terminate their jobs.
>> > > >
>> > > > Agreed. I'm okay with DROP. And if we want to align with Hive, KILL might be an alternative.
>> > > >
>> > > > > 3. I think the CREATE/DROP SAVEPOINT statements are more SQL-like.
>> > > >
>> > > > Agreed. It's more SQL-like and intuitive. I'm updating the syntax on the FLIP.
>> > > >
>> > > > > 4. SHOW TASKS can just list the job id and use DESCRIBE to get more detailed job info.
>> > > >
>> > > > That is a more SQL-like approach, I think. But considering the ClusterClient APIs, we can fetch the names and the status along in one request, thus it may be more user-friendly to return them all in the SHOW statement?
>> > > >
>> > > > > It's better we can also get the info about the cluster the job is running on through the DESCRIBE statement.
>> > > >
>> > > > I think cluster info could be part of the session properties instead. WDYT?
>> > > >
>> > > > Best,
>> > > > Paul Lam
>> > > >
>> > > > On 22 Apr 2022 at 11:14, Shengkai Fang <fskm...@gmail.com> wrote:
>> > > > >
>> > > > > Hi Paul
>> > > > >
>> > > > > Sorry for the late response. Here are my thoughts.
>> > > > >
>> > > > > 1. I think the keyword QUERY may confuse users because the statement also works for the DML statement. I find that Snowflake[1] supports
>> > > > >
>> > > > > - CREATE TASK
>> > > > > - DROP TASK
>> > > > > - ALTER TASK
>> > > > > - SHOW TASKS
>> > > > > - DESCRIBE TASK
>> > > > >
>> > > > > I think we can follow Snowflake to use `TASK` as the keyword, or use the keyword `JOB`?
>> > > > >
>> > > > > 2. STOP/CANCEL is not very straightforward for the SQL users to terminate their jobs.
>> > > > >
>> > > > > ```
>> > > > > DROP TASK [IF EXISTS] <job id> PURGE; -- Forcibly stop the job with drain
>> > > > >
>> > > > > DROP TASK [IF EXISTS] <job id>; -- Stop the task with savepoints
>> > > > > ```
>> > > > >
>> > > > > Oracle[2] uses PURGE to clean up the table so that users cannot recover it. I think it also works for us to terminate the job permanently.
>> > > > >
>> > > > > 3. I think the CREATE/DROP SAVEPOINT statements are more SQL-like. Users can use the following:
>> > > > >
>> > > > > ```
>> > > > > SET 'state.savepoints.dir' = '<path_to_savepoint>';
>> > > > > SET 'state.savepoints.format' = 'native';
>> > > > > CREATE SAVEPOINT <job id>;
>> > > > >
>> > > > > DROP SAVEPOINT <path_to_savepoint>;
>> > > > > ```
>> > > > >
>> > > > > 4. SHOW TASKS can just list the job id, and use DESCRIBE to get more detailed job info.
>> > > > >
>> > > > > ```
>> > > > > SHOW TASKS;
>> > > > >
>> > > > > +----------------------------------+
>> > > > > |              job_id              |
>> > > > > +----------------------------------+
>> > > > > | 0f6413c33757fbe0277897dd94485f04 |
>> > > > > +----------------------------------+
>> > > > >
>> > > > > DESCRIBE TASK <job id>;
>> > > > >
>> > > > > +-------------+---------+
>> > > > > |  job name   | status  |
>> > > > > +-------------+---------+
>> > > > > | insert-sink | running |
>> > > > > +-------------+---------+
>> > > > > ```
>> > > > >
>> > > > > It's better we can also get the info about the cluster the job is running on through the DESCRIBE statement.
>> > > > >
>> > > > > [1] https://docs.snowflake.com/en/sql-reference/ddl-pipeline.html#task-management
>> > > > > [2] https://docs.oracle.com/cd/E11882_01/server.112/e41084/statements_9003.htm#SQLRF01806
>> > > > >
>> > > > > On Thu, 21 Apr 2022 at 10:36, Paul Lam <paullin3...@gmail.com> wrote:
>> > > > >
>> > > > >> ping @Timo @Jark @Shengkai
>> > > > >>
>> > > > >> Best,
>> > > > >> Paul Lam
>> > > > >>
>> > > > >>> On 18 Apr 2022 at 17:12, Paul Lam <paullin3...@gmail.com> wrote:
>> > > > >>>
>> > > > >>> Hi team,
>> > > > >>>
>> > > > >>> I'd like to start a discussion about FLIP-222 [1], which adds query lifecycle statements to the SQL client.
>> > > > >>>
>> > > > >>> Currently, the SQL client supports submitting queries (queries in a broad sense, including DQLs and DMLs) but no further lifecycle statements, like canceling a query or triggering a savepoint. That makes SQL users have to rely on the CLI or REST API to manage their queries.
>> > > > >>>
>> > > > >>> Thus, I propose to introduce the following statements to fill the gap.
>> > > > >>>
>> > > > >>> - SHOW QUERIES
>> > > > >>> - STOP QUERY <query_id>
>> > > > >>> - CANCEL QUERY <query_id>
>> > > > >>> - TRIGGER SAVEPOINT <savepoint_path>
>> > > > >>> - DISPOSE SAVEPOINT <savepoint_path>
>> > > > >>>
>> > > > >>> These statements would align the SQL client with the CLI, providing full lifecycle management for queries/jobs.
>> > > > >>>
>> > > > >>> Please see the FLIP page [1] for more details. Thanks a lot! (For reference, see the previous discussion thread [2].)
>> > > > >>>
>> > > > >>> [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-222%3A+Support+full+query+lifecycle+statements+in+SQL+client
>> > > > >>> [2] https://lists.apache.org/thread/wr47ng0m2hdybjkrwjlk9ftwg403odqb
>> > > > >>>
>> > > > >>> Best,
>> > > > >>> Paul Lam
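[Editor's note: taken together, the statements discussed in this thread would let a SQL client session manage a job end to end. The sketch below combines the proposals as they stood at this point in the discussion (SHOW/STOP/CANCEL QUERY from the original FLIP draft and CREATE/DROP SAVEPOINT from Shengkai's suggestion); the job id and savepoint paths are hypothetical, and the final syntax was still under discussion.]

```sql
-- List running queries to find the target job id (hypothetical id below).
SHOW QUERIES;

-- Configure where savepoints are written before creating one.
SET 'state.savepoints.dir' = 'hdfs:///flink/savepoints';

-- Trigger a savepoint for a running job (Shengkai's SQL-like variant;
-- Paul's SQL-standard-inspired alternative was
-- SAVEPOINT <job/query id> <savepoint path>).
CREATE SAVEPOINT '0f6413c33757fbe0277897dd94485f04';

-- Gracefully stop the query, or cancel it outright.
STOP QUERY '0f6413c33757fbe0277897dd94485f04';
-- CANCEL QUERY '0f6413c33757fbe0277897dd94485f04';

-- Clean up a savepoint that is no longer needed (hypothetical path).
DROP SAVEPOINT 'hdfs:///flink/savepoints/savepoint-0f6413';
```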