Hi Martijn,

Thanks a lot for your reply! I agree that the scope may be a bit confusing; please let me clarify.
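To make the comparison below concrete, here is roughly what the existing jar statements look like (see the docs at [1] for the exact syntax; the path is just a placeholder):

```
-- Register a JAR with the current SQL client session (placeholder path).
ADD JAR '/path/to/udf.jar';

-- List the JARs that have been added to the session.
SHOW JARS;

-- Remove a previously added JAR from the session.
REMOVE JAR '/path/to/udf.jar';
```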
The FLIP aims to add new SQL statements that are supported only in the SQL client, similar to jar statements [1]. Jar statements can be parsed into jar operations, which are used only by CliClient in the sql-client module and cannot be executed by TableEnvironment (so they're not available in a Table API program that contains SQL, which you mentioned).

WRT the unchanged CLI client, I mean CliClient rather than the sql-client module, which currently also contains the gateway code (e.g. Executor). The FLIP mainly extends the gateway part, and barely touches CliClient and the REST server (the REST endpoint in FLIP-91).

WRT the syntax, I don't have much experience with SQL standards, and I'd like to hear more opinions from the community. I prefer Hive-style syntax because I think many users are familiar with Hive, and there are ongoing efforts to improve the Flink-Hive integration [2][3]. But my preference is not strong; I'm okay with other options too. Do you think JOB/TASK is a good choice, or do you have other preferred keywords?

[1] https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/dev/table/sql/jar/
[2] https://cwiki.apache.org/confluence/display/FLINK/FLIP-152%3A+Hive+Query+Syntax+Compatibility
[3] https://cwiki.apache.org/confluence/display/FLINK/FLIP-223%3A+Support+HiveServer2+Endpoint

Best,
Paul Lam

Martijn Visser <martijnvis...@apache.org> 于2022年4月26日周二 20:14写道:

> Hi Paul,
>
> Thanks for creating the FLIP and opening the discussion. I did get a bit
> confused about the title, being "query lifecycle statements in SQL client".
> This sounds like you want to adapt the SQL client, but you want to expand
> the SQL syntax with lifecycle statements, which could be used from the SQL
> client, but of course also in a Table API program that contains SQL. Given
> that you're highlighting the CLI client as unchanged, this adds to the
> confusion.
>
> I am interested if there's anything listed in the SQL 2016 standard on
> these types of lifecycle statements. I did a quick scan for "SHOW QUERIES"
> but couldn't find it. It would be great if we could stay as close as
> possible to such syntax. Overall I'm not in favour of using QUERIES as a
> keyword. I think Flink applications are not queries, but short- or
> long-running applications. Why should we follow Hive's setup and indeed
> not others such as Snowflake, but also Postgres or MySQL?
>
> Best regards,
>
> Martijn Visser
> https://twitter.com/MartijnVisser82
> https://github.com/MartijnVisser
>
>
> On Fri, 22 Apr 2022 at 12:06, Paul Lam <paullin3...@gmail.com> wrote:
>
> > Hi Shengkai,
> >
> > Thanks a lot for your opinions!
> >
> > > 1. I think the keyword QUERY may confuse users because the statement
> > > also works for DML statements.
> >
> > I slightly lean toward QUERY, because:
> >
> > Hive calls DMLs queries. We could be better aligned with Hive using
> > QUERY, especially given that we plan to introduce a Hive endpoint.
> > QUERY is a more SQL-like concept and friendly to SQL users.
> >
> > In general, my preference: QUERY > JOB > TASK. I'm okay with JOB, but
> > not so much with TASK, as it conflicts with the task concept in the
> > Flink runtime.
> >
> > We could wait for more feedback from the community.
> >
> > > 2. STOP/CANCEL is not very straightforward for SQL users to terminate
> > > their jobs.
> >
> > Agreed. I'm okay with DROP. And if we want to align with Hive, KILL
> > might be an alternative.
> >
> > > 3. I think CREATE/DROP SAVEPOINT statements are more SQL-like.
> >
> > Agreed. It's more SQL-like and intuitive.
> > I'm updating the syntax in the FLIP.
> >
> > > 4. SHOW TASKS can just list the job ids, and users can use DESCRIBE to
> > > get more detailed job info.
> >
> > That is a more SQL-like approach, I think. But considering the
> > ClusterClient APIs, we can fetch the names and the statuses together in
> > one request, so it may be more user-friendly to return them all in the
> > SHOW statement?
> >
> > > It's better if we can also get info about the cluster that the job is
> > > running on through the DESCRIBE statement.
> >
> > I think cluster info could be part of the session properties instead.
> > WDYT?
> >
> > Best,
> > Paul Lam
> >
> > > 2022年4月22日 11:14,Shengkai Fang <fskm...@gmail.com> 写道:
> > >
> > > Hi Paul,
> > >
> > > Sorry for the late response. Here are my thoughts.
> > >
> > > 1. I think the keyword QUERY may confuse users because the statement
> > > also works for DML statements. I find that Snowflake [1] supports
> > >
> > > - CREATE TASK
> > > - DROP TASK
> > > - ALTER TASK
> > > - SHOW TASKS
> > > - DESCRIBE TASK
> > >
> > > I think we can follow Snowflake and use `TASK` as the keyword, or use
> > > the keyword `JOB`?
> > >
> > > 2. STOP/CANCEL is not very straightforward for SQL users to terminate
> > > their jobs.
> > >
> > > ```
> > > DROP TASK [IF EXISTS] <job id> PURGE; -- Forcibly stop the job with drain
> > >
> > > DROP TASK [IF EXISTS] <job id>; -- Stop the task with a savepoint
> > > ```
> > >
> > > Oracle [2] uses PURGE to clean up the table so that users can't
> > > recover it. I think it also works for us to terminate the job
> > > permanently.
> > >
> > > 3. I think CREATE/DROP SAVEPOINT statements are more SQL-like. Users
> > > can use:
> > >
> > > ```
> > > SET 'state.savepoints.dir' = '<path_to_savepoint>';
> > > SET 'state.savepoints.format' = 'native';
> > > CREATE SAVEPOINT <job id>;
> > >
> > > DROP SAVEPOINT <path_to_savepoint>;
> > > ```
> > >
> > > 4. SHOW TASKS can just list the job ids, and users can use DESCRIBE to
> > > get more detailed job info.
> > >
> > > ```
> > > SHOW TASKS;
> > >
> > > +----------------------------------+
> > > |              job_id              |
> > > +----------------------------------+
> > > | 0f6413c33757fbe0277897dd94485f04 |
> > > +----------------------------------+
> > >
> > > DESCRIBE TASK <job id>;
> > >
> > > +-------------+---------+
> > > |  job name   | status  |
> > > +-------------+---------+
> > > | insert-sink | running |
> > > +-------------+---------+
> > > ```
> > >
> > > It's better if we can also get info about the cluster that the job is
> > > running on through the DESCRIBE statement.
> > >
> > > [1] https://docs.snowflake.com/en/sql-reference/ddl-pipeline.html#task-management
> > > [2] https://docs.oracle.com/cd/E11882_01/server.112/e41084/statements_9003.htm#SQLRF01806
> > >
> > > Paul Lam <paullin3...@gmail.com> 于2022年4月21日周四 10:36写道:
> > >
> > >> ping @Timo @Jark @Shengkai
> > >>
> > >> Best,
> > >> Paul Lam
> > >>
> > >>> 2022年4月18日 17:12,Paul Lam <paullin3...@gmail.com> 写道:
> > >>>
> > >>> Hi team,
> > >>>
> > >>> I'd like to start a discussion about FLIP-222 [1], which adds query
> > >>> lifecycle statements to the SQL client.
> > >>>
> > >>> Currently, the SQL client supports submitting queries (queries in a
> > >>> broad sense, including DQLs and DMLs) but no further lifecycle
> > >>> statements, like canceling a query or triggering a savepoint. That
> > >>> forces SQL users to rely on the CLI or the REST API to manage their
> > >>> queries.
> > >>>
> > >>> Thus, I propose to introduce the following statements to fill the gap.
> > >>>
> > >>> SHOW QUERIES
> > >>> STOP QUERY <query_id>
> > >>> CANCEL QUERY <query_id>
> > >>> TRIGGER SAVEPOINT <savepoint_path>
> > >>> DISPOSE SAVEPOINT <savepoint_path>
> > >>>
> > >>> These statements would align the SQL client with the CLI, providing
> > >>> full lifecycle management for queries/jobs.
> > >>>
> > >>> Please see the FLIP page [1] for more details. Thanks a lot!
> > >>> (For reference, see the previous discussion thread [2].)
> > >>>
> > >>> [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-222%3A+Support+full+query+lifecycle+statements+in+SQL+client
> > >>> [2] https://lists.apache.org/thread/wr47ng0m2hdybjkrwjlk9ftwg403odqb
> > >>>
> > >>> Best,
> > >>> Paul Lam
> >
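P.S. To make the "align with CLI" point concrete, here's a rough sketch of how the proposed statements could map to the existing bin/flink commands. The statement forms are copied from the proposal above; the CLI mapping is my reading of the intent and is of course still up for discussion:

```
-- List queries in the current session, roughly `flink list`.
SHOW QUERIES;

-- Stop a query gracefully (with a savepoint), roughly `flink stop <job_id>`.
STOP QUERY <query_id>;

-- Cancel a query without a savepoint, roughly `flink cancel <job_id>`.
CANCEL QUERY <query_id>;

-- Trigger a savepoint, roughly `flink savepoint <job_id> [<target_dir>]`.
TRIGGER SAVEPOINT <savepoint_path>;

-- Dispose of a savepoint, roughly `flink savepoint --dispose <savepoint_path>`.
DISPOSE SAVEPOINT <savepoint_path>;
```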