Hi Martijn,

I think the `DROP SAVEPOINT` statement would not conflict with NO_CLAIM mode, since the statement is triggered by users instead of the Flink runtime.
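As a sketch, the proposed SQL statement would mirror the existing CLI disposal command (the statement below follows the syntax discussed in this thread and is not implemented yet; the savepoint path is a placeholder):

```sql
-- Existing CLI equivalent: bin/flink savepoint -d :savepointPath
-- Proposed SQL counterpart (syntax from this thread; path is illustrative):
DROP SAVEPOINT 'hdfs:///flink/savepoints/savepoint-0000-0123456789ab';
```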
We’re simply providing a tool for users to clean up savepoints, just like `bin/flink savepoint -d :savepointPath` in the Flink CLI [1].

[1] https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/ops/state/savepoints/#disposing-savepoints

Best,
Paul Lam

> On 9 Jun 2022, at 15:41, Martijn Visser <martijnvis...@apache.org> wrote:
>
> Hi all,
>
> I would not include a DROP SAVEPOINT syntax. With the recently introduced
> CLAIM/NO CLAIM mode, I would argue that we've just clarified snapshot
> ownership, and if you have a savepoint established "with NO_CLAIM it creates
> its own copy and leaves the existing one up to the user." [1] We shouldn't
> then make it fuzzy again by making it possible for Flink to remove
> snapshots.
>
> Best regards,
>
> Martijn
>
> [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-193%3A+Snapshots+ownership
>
> On Thu, 9 Jun 2022 at 09:27, Paul Lam <paullin3...@gmail.com> wrote:
>
> Hi team,
>
> It's great to see our opinions are finally converging!
>
>> `STOP JOB <job_id> [WITH SAVEPOINT] [WITH DRAIN]`
>
> LGTM. Adding it to the FLIP.
>
> To Jark,
>
>> We can simplify the statement to "CREATE SAVEPOINT FOR JOB <job_id>"
>
> Good point. The default savepoint dir should be enough for most cases.
>
> To Jing,
>
>> DROP SAVEPOINT ALL
>
> I think it’s valid to have such a statement, but I have two concerns:
> 1. `ALL` is already an SQL keyword, thus it may cause ambiguity.
> 2. The Flink CLI and REST API don’t provide the corresponding functionality,
> and we’d better keep them aligned.
>
> How about making this statement a follow-up task, which would also touch the
> REST API and the Flink CLI?
>
> Best,
> Paul Lam
>
>> On 9 Jun 2022, at 11:53, godfrey he <godfre...@gmail.com> wrote:
>>
>> Hi all,
>>
>> Regarding `PIPELINE`, it comes from the flink-core module; see the
>> `PipelineOptions` class for more details.
>> `JOBS` is a more generic concept than `PIPELINES`. I'm also fine with
>> `JOBS`.
>>
>> +1 to discuss JOBTREE in another FLIP.
>>
>> +1 to `STOP JOB <job_id> [WITH SAVEPOINT] [WITH DRAIN]`
>>
>> +1 to `CREATE SAVEPOINT FOR JOB <job_id>` and `DROP SAVEPOINT
>> <savepoint_path>`
>>
>> Best,
>> Godfrey
>>
>> On Thu, 9 Jun 2022 at 01:48, Jing Ge <j...@ververica.com> wrote:
>>>
>>> Hi Paul, Hi Jark,
>>>
>>> Re JOBTREE: agreed that it is out of the scope of this FLIP.
>>>
>>> Re `RELEASE SAVEPOINT ALL`: if the community prefers `DROP`, then `DROP
>>> SAVEPOINT ALL` could do the housekeeping. WDYT?
>>>
>>> Best regards,
>>> Jing
>>>
>>> On Wed, Jun 8, 2022 at 2:54 PM Jark Wu <imj...@gmail.com> wrote:
>>>>
>>>> Hi Jing,
>>>>
>>>> Regarding JOBTREE (job lineage), I agree with Paul that this is out of
>>>> the scope of this FLIP and can be discussed in another FLIP.
>>>>
>>>> Job lineage is a big topic that may involve many problems:
>>>> 1) how to collect and report job entities, attributes, and lineages?
>>>> 2) how to integrate with data catalogs, e.g. Apache Atlas, DataHub?
>>>> 3) how does the Flink SQL CLI/Gateway know the lineage information and
>>>> show the jobtree?
>>>> 4) ...
>>>>
>>>> Best,
>>>> Jark
>>>>
>>>> On Wed, 8 Jun 2022 at 20:44, Jark Wu <imj...@gmail.com> wrote:
>>>>>
>>>>> Hi Paul,
>>>>>
>>>>> I'm fine with using JOBS. The only concern is that this may conflict
>>>>> with displaying more detailed information for queries (e.g. query
>>>>> content, plan) in the future, e.g. SHOW QUERIES EXTENDED in ksqlDB [1].
>>>>> This is not a big problem, as we can introduce SHOW QUERIES in the
>>>>> future if necessary.
>>>>>
>>>>>> STOP JOBS <job_id> (with options `table.job.stop-with-savepoint` and
>>>>>> `table.job.stop-with-drain`)
>>>>>
>>>>> What about STOP JOB <job_id> [WITH SAVEPOINT] [WITH DRAIN]?
>>>>> It might be tedious and error-prone to set configuration before
>>>>> executing a statement, and the configuration will affect all statements
>>>>> after that.
>>>>>
>>>>>> CREATE SAVEPOINT <savepoint_path> FOR JOB <job_id>
>>>>>
>>>>> We can simplify the statement to "CREATE SAVEPOINT FOR JOB <job_id>",
>>>>> and always use the configuration "state.savepoints.dir" as the default
>>>>> savepoint dir.
>>>>> The concern with using "<savepoint_path>" is that what's specified here
>>>>> should be the savepoint dir, while the savepoint path is the returned
>>>>> value.
>>>>>
>>>>> I'm fine with the other changes.
>>>>>
>>>>> Thanks,
>>>>> Jark
>>>>>
>>>>> [1]: https://docs.ksqldb.io/en/latest/developer-guide/ksqldb-reference/show-queries/
>>>>>
>>>>> On Wed, 8 Jun 2022 at 15:07, Paul Lam <paullin3...@gmail.com> wrote:
>>>>>>
>>>>>> Hi Jing,
>>>>>>
>>>>>> Thank you for your inputs!
>>>>>>
>>>>>> TBH, I hadn’t considered the ETL scenario that you mentioned. I think
>>>>>> such jobs are managed just like other jobs in terms of job lifecycle
>>>>>> (please correct me if I’m wrong).
>>>>>>
>>>>>> WRT the SQL statements about lineage, I think they might be a little
>>>>>> bit out of the scope of the FLIP, since it’s mainly about lifecycles.
>>>>>> By the way, do we have these functionalities in the Flink CLI or REST
>>>>>> API already?
>>>>>>
>>>>>> WRT `RELEASE SAVEPOINT ALL`, I’m sorry for the outdated FLIP docs; the
>>>>>> community is more in favor of `DROP SAVEPOINT <savepoint_path>`. I’m
>>>>>> updating the FLIP according to the latest discussions.
>>>>>>
>>>>>> Best,
>>>>>> Paul Lam
>>>>>>
>>>>>> On 8 Jun 2022, at 07:31, Jing Ge <j...@ververica.com> wrote:
>>>>>>
>>>>>> Hi Paul,
>>>>>>
>>>>>> Sorry that I am a little bit late to join this thread. Thanks for
>>>>>> driving this and starting this informative discussion.
>>>>>> The FLIP looks really interesting. It will help us a lot in managing
>>>>>> Flink SQL jobs.
>>>>>>
>>>>>> Have you considered the ETL scenario with Flink SQL, where multiple SQL
>>>>>> jobs build a DAG, or even many DAGs?
>>>>>>
>>>>>> 1)
>>>>>> +1 for SHOW JOBS. I think sooner or later we will start to discuss how
>>>>>> to support ETL jobs. Briefly speaking, the SQL statements used to build
>>>>>> the DAG are responsible for *producing* data as the result (cube,
>>>>>> materialized view, etc.) for future consumption by queries. The INSERT
>>>>>> INTO SELECT FROM example in the FLIP and CTAS are typical SQL in this
>>>>>> case. I would prefer to call them Jobs instead of Queries.
>>>>>>
>>>>>> 2)
>>>>>> Speaking of the ETL DAG, we might want to see the lineage. Is it
>>>>>> possible to support syntax like:
>>>>>>
>>>>>> SHOW JOBTREE <job_id> // shows the downstream DAG from the given job_id
>>>>>> SHOW JOBTREE <job_id> FULL // shows the whole DAG that contains the
>>>>>> given job_id
>>>>>> SHOW JOBTREES // shows all DAGs
>>>>>> SHOW ANCESTORS <job_id> // shows all parents of the given job_id
>>>>>>
>>>>>> 3)
>>>>>> Could we also support savepoint housekeeping syntax? We ran into the
>>>>>> issue that a lot of savepoints had been created by customers (via their
>>>>>> apps), and it took extra (hacking) effort to clean them up.
>>>>>>
>>>>>> RELEASE SAVEPOINT ALL
>>>>>>
>>>>>> Best regards,
>>>>>> Jing
>>>>>>
>>>>>> On Tue, Jun 7, 2022 at 2:35 PM Martijn Visser <martijnvis...@apache.org> wrote:
>>>>>>>
>>>>>>> Hi Paul,
>>>>>>>
>>>>>>> I'm still doubtful about the keyword for the SQL applications. SHOW
>>>>>>> QUERIES could imply that this will actually show the query, but we're
>>>>>>> returning IDs of the running applications. At first I was also not
>>>>>>> very much in favour of SHOW JOBS, since I prefer calling them 'Flink
>>>>>>> applications' and not 'Flink jobs', but the glossary [1] made me
>>>>>>> reconsider.
>>>>>>> I would +1 SHOW/STOP JOBS.
>>>>>>>
>>>>>>> Also +1 for the CREATE/SHOW/DROP SAVEPOINT syntax.
>>>>>>>
>>>>>>> Best regards,
>>>>>>>
>>>>>>> Martijn
>>>>>>>
>>>>>>> [1] https://nightlies.apache.org/flink/flink-docs-master/docs/concepts/glossary
>>>>>>>
>>>>>>> On Sat, 4 Jun 2022 at 10:38, Paul Lam <paullin3...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hi Godfrey,
>>>>>>>>
>>>>>>>> Sorry for the late reply, I was on vacation.
>>>>>>>>
>>>>>>>> It looks like we have a variety of preferences on the syntax; how
>>>>>>>> about we choose the most acceptable one?
>>>>>>>>
>>>>>>>> WRT the keyword for SQL jobs, we use JOBS, thus the statements
>>>>>>>> related to jobs would be:
>>>>>>>>
>>>>>>>> - SHOW JOBS
>>>>>>>> - STOP JOBS <job_id> (with options `table.job.stop-with-savepoint`
>>>>>>>> and `table.job.stop-with-drain`)
>>>>>>>>
>>>>>>>> WRT savepoints for SQL jobs, we use the `CREATE/DROP` pattern with
>>>>>>>> `FOR JOB`:
>>>>>>>>
>>>>>>>> - CREATE SAVEPOINT <savepoint_path> FOR JOB <job_id>
>>>>>>>> - SHOW SAVEPOINTS FOR JOB <job_id> (shows the savepoints the current
>>>>>>>> job manager remembers)
>>>>>>>> - DROP SAVEPOINT <savepoint_path>
>>>>>>>>
>>>>>>>> cc @Jark @ShengKai @Martijn @Timo .
>>>>>>>>
>>>>>>>> Best,
>>>>>>>> Paul Lam
>>>>>>>>
>>>>>>>> On Mon, 23 May 2022 at 21:34, godfrey he <godfre...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Hi Paul,
>>>>>>>>>
>>>>>>>>> Thanks for the update.
>>>>>>>>>
>>>>>>>>>> 'SHOW QUERIES' lists all jobs in the cluster, no limit on APIs
>>>>>>>>>> (DataStream or SQL) or clients (SQL client or CLI).
>>>>>>>>>
>>>>>>>>> Is a DataStream job a QUERY? I think not.
>>>>>>>>> For a QUERY, the most important concept is the statement. But the
>>>>>>>>> result does not contain this info.
>>>>>>>>> If we need to cover all jobs in the cluster, I think the name
>>>>>>>>> should be JOB or PIPELINE.
>>>>>>>>> I lean towards SHOW PIPELINES and STOP PIPELINE [IF RUNNING] <id>.
>>>>>>>>>
>>>>>>>>>> SHOW SAVEPOINTS
>>>>>>>>>
>>>>>>>>> To list the savepoints for a specific job, we need to specify a
>>>>>>>>> specific pipeline;
>>>>>>>>> the syntax should be SHOW SAVEPOINTS FOR PIPELINE <id>.
>>>>>>>>>
>>>>>>>>> Best,
>>>>>>>>> Godfrey
>>>>>>>>>
>>>>>>>>> On Fri, 20 May 2022 at 11:25, Paul Lam <paullin3...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>> Hi Jark,
>>>>>>>>>>
>>>>>>>>>> WRT “DROP QUERY”, I agree that it’s not very intuitive, and that’s
>>>>>>>>>> part of the reason why I proposed “STOP/CANCEL QUERY” at the
>>>>>>>>>> beginning. The downside is that it’s not ANSI-SQL compatible.
>>>>>>>>>>
>>>>>>>>>> Another question is, what should be the syntax for ungracefully
>>>>>>>>>> canceling a query? As ShengKai pointed out in an offline
>>>>>>>>>> discussion, “STOP QUERY” and “CANCEL QUERY” might confuse SQL
>>>>>>>>>> users. The Flink CLI has both stop and cancel, mostly for
>>>>>>>>>> historical reasons.
>>>>>>>>>>
>>>>>>>>>> WRT “SHOW SAVEPOINTS”, I agree it’s a missing part. My concern is
>>>>>>>>>> that savepoints are owned by users and live beyond the lifecycle
>>>>>>>>>> of a Flink cluster. For example, a user might take a savepoint at
>>>>>>>>>> a custom path that’s different from the default savepoint path; I
>>>>>>>>>> think the jobmanager would not remember that, not to mention that
>>>>>>>>>> the jobmanager may be a fresh new one after a cluster restart.
>>>>>>>>>> Thus if we support “SHOW SAVEPOINTS”, it's probably a best-effort
>>>>>>>>>> one.
>>>>>>>>>>
>>>>>>>>>> WRT the savepoint syntax, I’m thinking of the semantics of
>>>>>>>>>> savepoints. Savepoints are aliases for nested transactions in the
>>>>>>>>>> database area [1], and correspondingly there are global
>>>>>>>>>> transactions.
>>>>>>>>>> If we consider Flink jobs as global transactions and Flink
>>>>>>>>>> checkpoints as nested transactions, then the savepoint semantics
>>>>>>>>>> are close, so I think the SQL-standard savepoint syntax could be
>>>>>>>>>> considered. But again, I don’t have a very strong preference.
>>>>>>>>>>
>>>>>>>>>> Ping @Timo to get more inputs.
>>>>>>>>>>
>>>>>>>>>> [1] https://en.wikipedia.org/wiki/Nested_transaction
>>>>>>>>>>
>>>>>>>>>> Best,
>>>>>>>>>> Paul Lam
>>>>>>>>>>
>>>>>>>>>>> On 18 May 2022, at 17:48, Jark Wu <imj...@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>> Hi Paul,
>>>>>>>>>>>
>>>>>>>>>>> 1) SHOW QUERIES
>>>>>>>>>>> +1 to add the finished time, but it would be better to call it
>>>>>>>>>>> "end_time" to keep it aligned with the names in the Web UI.
>>>>>>>>>>>
>>>>>>>>>>> 2) DROP QUERY
>>>>>>>>>>> I think we shouldn't throw exceptions for batch jobs; otherwise,
>>>>>>>>>>> how would users stop batch queries?
>>>>>>>>>>> At present, I don't think "DROP" is a suitable keyword for this
>>>>>>>>>>> statement. From the perspective of users, "DROP" sounds like the
>>>>>>>>>>> query should be removed from the list of "SHOW QUERIES". However,
>>>>>>>>>>> it isn't. Maybe "STOP QUERY" is more suitable and compliant with
>>>>>>>>>>> the commands of the Flink CLI.
>>>>>>>>>>>
>>>>>>>>>>> 3) SHOW SAVEPOINTS
>>>>>>>>>>> I think this statement is needed; otherwise, savepoints are lost
>>>>>>>>>>> after the SAVEPOINT command is executed. Savepoints can be
>>>>>>>>>>> retrieved from the REST API "/jobs/:jobid/checkpoints" [1] by
>>>>>>>>>>> filtering on "checkpoint_type" = "savepoint". It's also worth
>>>>>>>>>>> considering providing "SHOW CHECKPOINTS" to list all checkpoints.
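As a sketch, the listing statements discussed in this thread might look like the following (hypothetical syntax only, not implemented in Flink; `<job_id>` is a placeholder):

```sql
-- Hypothetical listing statements sketched from this discussion:
SHOW SAVEPOINTS FOR JOB '<job_id>';
SHOW CHECKPOINTS FOR JOB '<job_id>';
```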
>>>>>>>>>>> 4) SAVEPOINT & RELEASE SAVEPOINT
>>>>>>>>>>> I'm a little concerned with the SAVEPOINT and RELEASE SAVEPOINT
>>>>>>>>>>> statements now.
>>>>>>>>>>> Among the vendors, the parameters of SAVEPOINT and RELEASE
>>>>>>>>>>> SAVEPOINT are both the same savepoint id.
>>>>>>>>>>> However, in our syntax, the first one takes a query id, and the
>>>>>>>>>>> second one a savepoint path, which is confusing and inconsistent.
>>>>>>>>>>> When I came across SHOW SAVEPOINTS, I thought maybe they should
>>>>>>>>>>> be in the same syntax set.
>>>>>>>>>>> For example, CREATE SAVEPOINT FOR [QUERY] <query_id> & DROP
>>>>>>>>>>> SAVEPOINT <sp_path>.
>>>>>>>>>>> That means we don't follow the majority of vendors in SAVEPOINT
>>>>>>>>>>> commands. I would say the purpose is different in Flink.
>>>>>>>>>>> What are others' opinions on this?
>>>>>>>>>>>
>>>>>>>>>>> Best,
>>>>>>>>>>> Jark
>>>>>>>>>>>
>>>>>>>>>>> [1]: https://nightlies.apache.org/flink/flink-docs-master/docs/ops/rest_api/#jobs-jobid-checkpoints
>>>>>>>>>>>
>>>>>>>>>>> On Wed, 18 May 2022 at 14:43, Paul Lam <paullin3...@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi Godfrey,
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks a lot for your inputs!
>>>>>>>>>>>>
>>>>>>>>>>>> 'SHOW QUERIES' lists all jobs in the cluster, no limit on APIs
>>>>>>>>>>>> (DataStream or SQL) or clients (SQL client or CLI). Under the
>>>>>>>>>>>> hood, it’s based on ClusterClient#listJobs, the same as the
>>>>>>>>>>>> Flink CLI. I think it’s okay to have non-SQL jobs listed in the
>>>>>>>>>>>> SQL client, because these jobs can be managed via the SQL client
>>>>>>>>>>>> too.
>>>>>>>>>>>>
>>>>>>>>>>>> WRT the finished time, I think you’re right.
>>>>>>>>>>>> Adding it to the FLIP. But I’m a bit afraid that the rows would
>>>>>>>>>>>> be too long.
>>>>>>>>>>>>
>>>>>>>>>>>> WRT ‘DROP QUERY’,
>>>>>>>>>>>>> What's the behavior for batch jobs and the non-running jobs?
>>>>>>>>>>>>
>>>>>>>>>>>> In general, the behavior would be aligned with the Flink CLI.
>>>>>>>>>>>> Triggering a savepoint for a non-running job would cause errors,
>>>>>>>>>>>> and the error message would be printed to the SQL client.
>>>>>>>>>>>> Triggering a savepoint for batch (unbounded) jobs in streaming
>>>>>>>>>>>> execution mode would be the same as for streaming jobs. However,
>>>>>>>>>>>> for batch jobs in batch execution mode, I think there would be
>>>>>>>>>>>> an error, because batch execution doesn’t support checkpoints
>>>>>>>>>>>> currently (please correct me if I’m wrong).
>>>>>>>>>>>>
>>>>>>>>>>>> WRT ’SHOW SAVEPOINTS’, I’ve thought about it, but the Flink
>>>>>>>>>>>> clusterClient/jobClient doesn’t have such functionality at the
>>>>>>>>>>>> moment, and neither does the Flink CLI. Maybe we could make it a
>>>>>>>>>>>> follow-up FLIP, which includes the modifications to the
>>>>>>>>>>>> clusterClient/jobClient and the Flink CLI. WDYT?
>>>>>>>>>>>>
>>>>>>>>>>>> Best,
>>>>>>>>>>>> Paul Lam
>>>>>>>>>>>>
>>>>>>>>>>>>> On 17 May 2022, at 20:34, godfrey he <godfre...@gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Godfrey
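For reference, the statement set the thread appears to converge on could be sketched as follows (proposed syntax assembled from the discussion above, not implemented in any Flink release at the time of this thread; identifiers are placeholders):

```sql
-- Proposed job lifecycle statements (syntax as converged in this thread;
-- <job_id> and <savepoint_path> are placeholders):
SHOW JOBS;
STOP JOB '<job_id>' WITH SAVEPOINT WITH DRAIN;
CREATE SAVEPOINT FOR JOB '<job_id>';
DROP SAVEPOINT '<savepoint_path>';
```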