Hi Jark,

Thanks a lot for your opinions and suggestions! Please see my replies inline.
> 1) the display of savepoint_path

Agreed. Adding it to the FLIP.

> 2) Please make a decision on multiple options in the FLIP.

Okay. I'll keep one and move the other to the rejected alternatives section.

> 4) +1 SHOW QUERIES
> Btw, the displayed column "address" is a little confusing to me.
> At first glance, I'm not sure what address it is: JM RPC address? JM REST address? Gateway address?
> If this is a link to the job's web UI URL, how about calling it "web_url" and displaying it in "http://<hostname>:<port>" format?
> Besides, how about displaying "startTime" or "uptime" as well?

I'm good with these changes. Updating the FLIP according to your suggestions.

> 5) STOP/CANCEL QUERY vs DROP QUERY
> I'm +1 to DROP, because it's more compliant with SQL standard naming, i.e., "SHOW/CREATE/DROP".
> Separating STOP and CANCEL confuses users a lot about the differences between them.
> I'm +1 to add the "PURGE" keyword to the DROP QUERY statement, which indicates stopping the query without a savepoint.
> Note that PURGE doesn't mean stop with the --drain flag. The drain flag will flush all the registered timers and windows, which could lead to incorrect results when the job is resumed. I think the drain flag is rarely used (please correct me if I'm wrong), therefore I suggest moving this feature into future work when the needs are clear.

I'm +1 to representing an ungraceful cancel with PURGE. As you said, the --drain flag is not used very often, and we could just add a table config option to enable that flag.

> 7) <query_id> and <savepoint_path> should be quoted
> All the <query_id> and <savepoint_path> should be string literals, otherwise it's hard to parse them.
> For example, STOP QUERY '<query_id>'.

Good point! Adding it to the FLIP.

> 8) Examples
> Could you add an example that consists of all the statements to show how to manage the full lifecycle of queries?
> Including show queries, create savepoint, remove savepoint, stop query with a savepoint, and restart query with savepoint.

Agreed. Adding it to the FLIP as well.

Best,
Paul Lam

> On 7 May 2022, at 18:22, Jark Wu <imj...@gmail.com> wrote:
>
> Hi Paul,
>
> I think this FLIP is already in good shape. I just left some additional thoughts:
>
> 1) the display of savepoint_path
> Could the displayed savepoint_path include the scheme part?
> E.g. `hdfs:///flink-savepoints/savepoint-cca7bc-bb1e257f0dab`
> IIUC, the scheme part is omitted when it's a local filesystem.
> But the behavior would be clearer if the scheme part were included in the design doc.
>
> 2) Please make a decision on multiple options in the FLIP.
> It might give the impression that we will support all the options.
>
> 3) +1 SAVEPOINT and RELEASE SAVEPOINT
> Personally, I also prefer "SAVEPOINT <query_id>" and "RELEASE SAVEPOINT <savepoint_path>" to "CREATE/DROP SAVEPOINT", as they have been used in mature databases.
>
> 4) +1 SHOW QUERIES
> Btw, the displayed column "address" is a little confusing to me.
> At first glance, I'm not sure what address it is: JM RPC address? JM REST address? Gateway address?
> If this is a link to the job's web UI URL, how about calling it "web_url" and displaying it in "http://<hostname>:<port>" format?
> Besides, how about displaying "startTime" or "uptime" as well?
>
> 5) STOP/CANCEL QUERY vs DROP QUERY
> I'm +1 to DROP, because it's more compliant with SQL standard naming, i.e., "SHOW/CREATE/DROP".
> Separating STOP and CANCEL confuses users a lot about the differences between them.
> I'm +1 to add the "PURGE" keyword to the DROP QUERY statement, which indicates stopping the query without a savepoint.
> Note that PURGE doesn't mean stop with the --drain flag. The drain flag will flush all the registered timers and windows, which could lead to incorrect results when the job is resumed. I think the drain flag is rarely used (please correct me if I'm wrong), therefore I suggest moving this feature into future work when the needs are clear.
>
> 6) Table API
> I think it makes sense to support the new statements in the Table API.
> We should try to keep the Gateway and CLI simple and just forward statements to the underlying TableEnvironment.
> JAR statements are being re-implemented in the Table API as well, see FLIP-214 [1].
>
> 7) <query_id> and <savepoint_path> should be quoted
> All the <query_id> and <savepoint_path> should be string literals, otherwise it's hard to parse them.
> For example, STOP QUERY '<query_id>'.
>
> 8) Examples
> Could you add an example that consists of all the statements to show how to manage the full lifecycle of queries?
> Including show queries, create savepoint, remove savepoint, stop query with a savepoint, and restart query with savepoint.
>
> Best,
> Jark
>
> [1]: https://cwiki.apache.org/confluence/display/FLINK/FLIP-214+Support+Advanced+Function+DDL?src=contextnavpagetreemode
>
> On Fri, 6 May 2022 at 19:13, Martijn Visser <martijnvis...@apache.org> wrote:
> Hi Paul,
>
> Great that you could find something in the SQL standard! I'll try to read the FLIP once more completely next week to see if I have any more concerns.
>
> Best regards,
>
> Martijn
>
> On Fri, 6 May 2022 at 08:21, Paul Lam <paullin3...@gmail.com> wrote:
> I had a look at SQL-2016 that Martijn mentioned, and found that maybe we could follow the transaction savepoint syntax.
>
> SAVEPOINT <savepoint specifier>
> RELEASE SAVEPOINT <savepoint specifier>
>
> These savepoint statements are supported in lots of databases, like Oracle [1], PG [2], MariaDB [3].
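For context, the transaction savepoint usage shared by those databases looks roughly like the following minimal sketch, written in the PostgreSQL dialect [2] (the table name `t` and savepoint name `sp1` are illustrative):

```
BEGIN;
INSERT INTO t VALUES (1);
SAVEPOINT sp1;                -- mark a point inside the current transaction
INSERT INTO t VALUES (2);
ROLLBACK TO SAVEPOINT sp1;    -- undo only the work done after sp1
RELEASE SAVEPOINT sp1;        -- discard the savepoint marker
COMMIT;                       -- only the first insert is committed
```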
> They're usually used in the middle of a SQL transaction, so the target would be the current transaction. But if used in a Flink SQL session, we need to add a JOB/QUERY id when creating a savepoint, thus the syntax would be:
>
> SAVEPOINT <job/query id> <savepoint path>
> RELEASE SAVEPOINT <savepoint path>
>
> I'm adding it as an alternative in the FLIP.
>
> [1] https://docs.oracle.com/cd/B19306_01/server.102/b14200/statements_10001.htm
> [2] https://www.postgresql.org/docs/current/sql-savepoint.html
> [3] https://mariadb.com/kb/en/savepoint/
>
> Best,
> Paul Lam
>
>> On 4 May 2022, at 16:42, Paul Lam <paullin3...@gmail.com> wrote:
>>
>> Hi Shengkai,
>>
>> Thanks a lot for your input!
>>
>> > I just wonder how the users can get the web UI in the application mode.
>> > Therefore, it's better we can list the Web UI using the SHOW statement. WDYT?
>>
>> I think it's a valid approach. I'm adding it to the FLIP.
>>
>> > After the investigation, I am fine with QUERY, but the keyword JOB is also okay to me.
>>
>> In addition, CockroachDB has both SHOW QUERIES [1] and SHOW JOBS [2]: the former shows the active running queries, while the latter shows background tasks like schema changes. FYI.
>>
>> WRT the questions:
>>
>> > 1. Could you add some details about the behaviour with the different execution.target, e.g. session, application mode?
>>
>> IMHO, the difference between the `execution.target` values is mostly about cluster startup, which has little relation to the proposed statements. These statements rely on the current ClusterClient/JobClient API, which is deployment mode agnostic. Canceling a job in an application cluster is the same as in a session cluster.
>> BTW, application mode support is still in development ATM [3].
>>
>> > 2. Considering the SQL Client/Gateway is not limited to submitting the job to the specified cluster, is it able to list jobs in the other clusters?
>>
>> I think multi-cluster support in the SQL Client/Gateway should be aligned with the CLI, at least in the early phase. We may use SET to set a cluster id for a session, then we have access to that cluster. However, every SHOW statement would involve only one cluster.
>>
>> Best,
>> Paul Lam
>>
>> [1] https://www.cockroachlabs.com/docs/stable/show-statements.html
>> [2] https://www.cockroachlabs.com/docs/v21.2/show-jobs
>> [3] https://issues.apache.org/jira/browse/FLINK-26541
>>
>> On Fri, 29 Apr 2022 at 15:36, Shengkai Fang <fskm...@gmail.com> wrote:
>> Hi.
>>
>> Thanks for Paul's update.
>>
>> > It's better we can also get the infos about the cluster where the job is running through the DESCRIBE statement.
>>
>> I just wonder how the users can get the web UI in the application mode. Therefore, it's better we can list the Web UI using the SHOW statement. WDYT?
>>
>> > QUERY or other keywords.
>>
>> I list the statements used to manage the lifecycle of queries/DMLs in other systems:
>>
>> MySQL [1] allows users to SHOW [FULL] PROCESSLIST and use the KILL command to kill the query [2].
>>
>> ```
>> mysql> SHOW PROCESSLIST;
>>
>> mysql> KILL 27;
>> ```
>>
>> Postgres uses the following statements to kill queries [3]:
>>
>> ```
>> SELECT pg_cancel_backend(<pid of the process>);
>>
>> SELECT pg_terminate_backend(<pid of the process>);
>> ```
>>
>> KSQL uses the following commands to control the query lifecycle [4][5]:
>> ```
>> SHOW QUERIES;
>>
>> TERMINATE <query id>;
>> ```
>>
>> [1] https://dev.mysql.com/doc/refman/8.0/en/show-processlist.html
>> [2] https://scaledynamix.com/blog/how-to-kill-mysql-queries/
>> [3] https://stackoverflow.com/questions/35319597/how-to-stop-kill-a-query-in-postgresql
>> [4] https://docs.ksqldb.io/en/latest/developer-guide/ksqldb-reference/show-queries/
>> [5] https://docs.ksqldb.io/en/latest/developer-guide/ksqldb-reference/terminate/
>>
>> After the investigation, I am fine with QUERY, but the keyword JOB is also okay to me.
>>
>> We also have two questions here.
>>
>> 1. Could you add some details about the behaviour with the different execution.target, e.g. session, application mode?
>>
>> 2. Considering the SQL Client/Gateway is not limited to submitting the job to the specified cluster, is it able to list jobs in the other clusters?
>>
>> Best,
>> Shengkai
>>
>> On Thu, 28 Apr 2022 at 17:17, Paul Lam <paullin3...@gmail.com> wrote:
>>
>> > Hi Martijn,
>> >
>> > Thanks a lot for your reply! I agree that the scope may be a bit confusing, please let me clarify.
>> >
>> > The FLIP aims to add new SQL statements that are supported only in sql-client, similar to jar statements [1]. Jar statements can be parsed into jar operations, which are used only in CliClient in the sql-client module and cannot be executed by TableEnvironment (not available in the Table API program that contains SQL that you mentioned).
>> > WRT the unchanged CLI client: I mean CliClient instead of the sql-client module, which currently contains the gateway code (e.g. Executor). The FLIP mainly extends the gateway part, and barely touches CliClient and the REST server (REST endpoint in FLIP-91).
>> >
>> > WRT the syntax: I don't have much experience with SQL standards, and I'd like to hear more opinions from the community. I prefer Hive-style syntax because I think many users are familiar with Hive, and there are ongoing efforts to improve the Flink-Hive integration [2][3]. But my preference is not strong; I'm okay with other options too. Do you think JOB/TASK is a good choice, or do you have other preferred keywords?
>> >
>> > [1] https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/dev/table/sql/jar/
>> > [2] https://cwiki.apache.org/confluence/display/FLINK/FLIP-152%3A+Hive+Query+Syntax+Compatibility
>> > [3] https://cwiki.apache.org/confluence/display/FLINK/FLIP-223%3A+Support+HiveServer2+Endpoint
>> >
>> > Best,
>> > Paul Lam
>> >
>> > On Tue, 26 Apr 2022 at 20:14, Martijn Visser <martijnvis...@apache.org> wrote:
>> >
>> > > Hi Paul,
>> > >
>> > > Thanks for creating the FLIP and opening the discussion. I did get a bit confused about the title, being "query lifecycle statements in SQL client". This sounds like you want to adapt the SQL client, but you want to expand the SQL syntax with lifecycle statements, which could be used from the SQL client, but of course also in a Table API program that contains SQL.
>> > > Given that you're highlighting the CLI client as unchanged, this adds to the confusion.
>> > >
>> > > I am interested if there's anything listed in the SQL 2016 standard on these types of lifecycle statements. I did a quick scan for "SHOW QUERIES" but couldn't find it. It would be great if we could stay as close as possible to such syntax. Overall I'm not in favour of using QUERIES as a keyword. I think Flink applications are not queries, but short- or long-running applications. Why should we follow Hive's setup and indeed not others such as Snowflake, but also Postgres or MySQL?
>> > >
>> > > Best regards,
>> > >
>> > > Martijn Visser
>> > > https://twitter.com/MartijnVisser82
>> > > https://github.com/MartijnVisser
>> > >
>> > > On Fri, 22 Apr 2022 at 12:06, Paul Lam <paullin3...@gmail.com> wrote:
>> > >
>> > > > Hi Shengkai,
>> > > >
>> > > > Thanks a lot for your opinions!
>> > > >
>> > > > > 1. I think the keyword QUERY may confuse users because the statement also works for the DML statement.
>> > > >
>> > > > I slightly lean toward QUERY, because Hive calls DMLs queries. We could be better aligned with Hive using QUERY, especially given that we plan to introduce a Hive endpoint. Also, QUERY is a more SQL-like concept and friendly to SQL users.
>> > > >
>> > > > In general, my preference: QUERY > JOB > TASK. I'm okay with JOB, but not very good with TASK, as it conflicts with the task concept in the Flink runtime.
>> > > >
>> > > > We could wait for more feedback from the community.
>> > > >
>> > > > > 2. STOP/CANCEL is not very straightforward for the SQL users to terminate their jobs.
>> > > >
>> > > > Agreed. I'm okay with DROP.
>> > > > And if we want to align with Hive, KILL might be an alternative.
>> > > >
>> > > > > 3. I think CREATE/DROP SAVEPOINT statements are more SQL-like.
>> > > >
>> > > > Agreed. It's more SQL-like and intuitive. I'm updating the syntax in the FLIP.
>> > > >
>> > > > > 4. SHOW TASKS can just list the job id, and use DESCRIBE to get more detailed job infos.
>> > > >
>> > > > That is a more SQL-like approach, I think. But considering the ClusterClient APIs, we can fetch the names and the status together in one request, thus it may be more user-friendly to return them all in the SHOW statement?
>> > > >
>> > > > > It's better we can also get the infos about the cluster where the job is running through the DESCRIBE statement.
>> > > >
>> > > > I think cluster info could be part of the session properties instead. WDYT?
>> > > >
>> > > > Best,
>> > > > Paul Lam
>> > > >
>> > > > > On 22 Apr 2022, at 11:14, Shengkai Fang <fskm...@gmail.com> wrote:
>> > > > >
>> > > > > Hi Paul,
>> > > > >
>> > > > > Sorry for the late response. I propose my thoughts here.
>> > > > >
>> > > > > 1. I think the keyword QUERY may confuse users because the statement also works for the DML statement. I find that Snowflake [1] supports
>> > > > >
>> > > > > - CREATE TASK
>> > > > > - DROP TASK
>> > > > > - ALTER TASK
>> > > > > - SHOW TASKS
>> > > > > - DESCRIBE TASK
>> > > > >
>> > > > > I think we can follow Snowflake to use `TASK` as the keyword, or use the keyword `JOB`?
>> > > > >
>> > > > > 2. STOP/CANCEL is not very straightforward for the SQL users to terminate their jobs.
>> > > > > ```
>> > > > > DROP TASK [IF EXISTS] <job id> PURGE; -- Forcibly stop the job with drain
>> > > > >
>> > > > > DROP TASK [IF EXISTS] <job id>;       -- Stop the task with savepoints
>> > > > > ```
>> > > > >
>> > > > > Oracle [2] uses PURGE to clean up a table such that users can't recover it. I think it also works for us to terminate the job permanently.
>> > > > >
>> > > > > 3. I think CREATE/DROP SAVEPOINT statements are more SQL-like. Users can use the following:
>> > > > >
>> > > > > ```
>> > > > > SET 'state.savepoints.dir' = '<path_to_savepoint>';
>> > > > > SET 'state.savepoints.format' = 'native';
>> > > > > CREATE SAVEPOINT <job id>;
>> > > > >
>> > > > > DROP SAVEPOINT <path_to_savepoint>;
>> > > > > ```
>> > > > >
>> > > > > 4. SHOW TASKS can just list the job id, and use DESCRIBE to get more detailed job infos.
>> > > > >
>> > > > > ```
>> > > > > SHOW TASKS;
>> > > > >
>> > > > > +----------------------------------+
>> > > > > |              job_id              |
>> > > > > +----------------------------------+
>> > > > > | 0f6413c33757fbe0277897dd94485f04 |
>> > > > > +----------------------------------+
>> > > > >
>> > > > > DESCRIBE TASK <job id>;
>> > > > >
>> > > > > +-------------+---------+
>> > > > > |  job name   | status  |
>> > > > > +-------------+---------+
>> > > > > | insert-sink | running |
>> > > > > +-------------+---------+
>> > > > > ```
>> > > > >
>> > > > > It's better we can also get the infos about the cluster where the job is running through the DESCRIBE statement.
>> > > > >
>> > > > > [1] https://docs.snowflake.com/en/sql-reference/ddl-pipeline.html#task-management
>> > > > > [2] https://docs.oracle.com/cd/E11882_01/server.112/e41084/statements_9003.htm#SQLRF01806
>> > > > >
>> > > > > On Thu, 21 Apr 2022 at 10:36, Paul Lam <paullin3...@gmail.com> wrote:
>> > > > >
>> > > > >> ping @Timo @Jark @Shengkai
>> > > > >>
>> > > > >> Best,
>> > > > >> Paul Lam
>> > > > >>
>> > > > >>> On 18 Apr 2022, at 17:12, Paul Lam <paullin3...@gmail.com> wrote:
>> > > > >>>
>> > > > >>> Hi team,
>> > > > >>>
>> > > > >>> I'd like to start a discussion about FLIP-222 [1], which adds query lifecycle statements to the SQL client.
>> > > > >>>
>> > > > >>> Currently, the SQL client supports submitting queries (queries in a broad sense, including DQLs and DMLs) but no further lifecycle statements, like canceling a query or triggering a savepoint. That makes SQL users have to rely on the CLI or REST API to manage their queries.
>> > > > >>>
>> > > > >>> Thus, I propose to introduce the following statements to fill the gap.
>> > > > >>>
>> > > > >>> SHOW QUERIES
>> > > > >>> STOP QUERY <query_id>
>> > > > >>> CANCEL QUERY <query_id>
>> > > > >>> TRIGGER SAVEPOINT <savepoint_path>
>> > > > >>> DISPOSE SAVEPOINT <savepoint_path>
>> > > > >>>
>> > > > >>> These statements would align the SQL client with the CLI, providing full lifecycle management for queries/jobs.
>> > > > >>>
>> > > > >>> Please see the FLIP page [1] for more details. Thanks a lot!
>> > > > >>> (For reference, see the previous discussion thread [2].)
>> > > > >>>
>> > > > >>> [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-222%3A+Support+full+query+lifecycle+statements+in+SQL+client
>> > > > >>> [2] https://lists.apache.org/thread/wr47ng0m2hdybjkrwjlk9ftwg403odqb
>> > > > >>>
>> > > > >>> Best,
>> > > > >>> Paul Lam
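As a companion to the thread, here is a hedged sketch of the full query lifecycle requested in point 8, using the syntax options discussed above (the DROP QUERY/PURGE and CREATE/DROP SAVEPOINT keywords are still under discussion and not final; `sink_table` and `source_table` are illustrative names, and `<query_id>`/`<savepoint_path>` are placeholders as in the FLIP):

```
-- 1. List the running queries and note the id of the one to manage.
SHOW QUERIES;

-- 2. Create a savepoint for a running query (savepoint dir from session config).
SET 'state.savepoints.dir' = 'hdfs:///flink-savepoints';
CREATE SAVEPOINT '<query_id>';

-- 3. Remove a savepoint that is no longer needed.
DROP SAVEPOINT '<savepoint_path>';

-- 4. Stop a query with a final savepoint (the default behavior)...
DROP QUERY '<query_id>';

-- 5. ...or stop it ungracefully, without a savepoint.
DROP QUERY '<query_id>' PURGE;

-- 6. Restart a query from a savepoint.
SET 'execution.savepoint.path' = '<savepoint_path>';
INSERT INTO sink_table SELECT * FROM source_table;
```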