Hi all,

I would not include a DROP SAVEPOINT syntax. With the recently introduced CLAIM/NO CLAIM mode, I would argue that we've just clarified snapshot ownership: if a savepoint is taken "with NO_CLAIM it creates its own copy and leaves the existing one up to the user." [1] We shouldn't make ownership fuzzy again by making it possible for Flink to remove snapshots.
Best regards,

Martijn

[1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-193%3A+Snapshots+ownership

On Thu, 9 Jun 2022 at 09:27, Paul Lam <paullin3...@gmail.com> wrote:

> Hi team,
>
> It's great to see our opinions are finally converging!
>
> > `STOP JOB <job_id> [WITH SAVEPOINT] [WITH DRAIN]`
>
> LGTM. Adding it to the FLIP.
>
> To Jark,
>
> > We can simplify the statement to "CREATE SAVEPOINT FOR JOB <job_id>"
>
> Good point. The default savepoint dir should be enough for most cases.
>
> To Jing,
>
> > DROP SAVEPOINT ALL
>
> I think it's valid to have such a statement, but I have two concerns:
>
> - `ALL` is already an SQL keyword, thus it may cause ambiguity.
> - Flink CLI and REST API don't provide the corresponding functionalities, and we'd better keep them aligned.
>
> How about making this statement a follow-up task, since it would also touch the REST API and Flink CLI?
>
> Best,
> Paul Lam
>
> On 9 Jun 2022, at 11:53, godfrey he <godfre...@gmail.com> wrote:
>
> Hi all,
>
> Regarding `PIPELINE`, it comes from the flink-core module; see the `PipelineOptions` class for more details. `JOBS` is a more generic concept than `PIPELINES`. I'm also fine with `JOBS`.
>
> +1 to discuss JOBTREE in another FLIP.
>
> +1 to `STOP JOB <job_id> [WITH SAVEPOINT] [WITH DRAIN]`
>
> +1 to `CREATE SAVEPOINT FOR JOB <job_id>` and `DROP SAVEPOINT <savepoint_path>`
>
> Best,
> Godfrey
>
> On Thu, 9 Jun 2022 at 01:48, Jing Ge <j...@ververica.com> wrote:
>
> Hi Paul, Hi Jark,
>
> Re JOBTREE, agree that it is out of the scope of this FLIP.
>
> Re `RELEASE SAVEPOINT ALL`, if the community prefers 'DROP', then 'DROP SAVEPOINT ALL' could do the housekeeping. WDYT?
>
> Best regards,
> Jing
>
> On Wed, Jun 8, 2022 at 2:54 PM Jark Wu <imj...@gmail.com> wrote:
>
> Hi Jing,
>
> Regarding JOBTREE (job lineage), I agree with Paul that this is out of the scope of this FLIP and can be discussed in another FLIP.
>
> Job lineage is a big topic that may involve many problems:
> 1) how to collect and report job entities, attributes, and lineages?
> 2) how to integrate with data catalogs, e.g. Apache Atlas, DataHub?
> 3) how does Flink SQL CLI/Gateway know the lineage information and show the jobtree?
> 4) ...
>
> Best,
> Jark
>
> On Wed, 8 Jun 2022 at 20:44, Jark Wu <imj...@gmail.com> wrote:
>
> Hi Paul,
>
> I'm fine with using JOBS. The only concern is that this may conflict with displaying more detailed information for queries (e.g. query content, plan) in the future, e.g. SHOW QUERIES EXTENDED in ksqldb [1]. This is not a big problem, as we can introduce SHOW QUERIES in the future if necessary.
>
> > STOP JOBS <job_id> (with options `table.job.stop-with-savepoint` and `table.job.stop-with-drain`)
>
> What about STOP JOB <job_id> [WITH SAVEPOINT] [WITH DRAIN]? It might be tedious and error-prone to set configuration before executing a statement, and the configuration will affect all statements after that.
>
> > CREATE SAVEPOINT <savepoint_path> FOR JOB <job_id>
>
> We can simplify the statement to "CREATE SAVEPOINT FOR JOB <job_id>", and always use the configuration "state.savepoints.dir" as the default savepoint dir. The concern with using "<savepoint_path>" is that the value here should be a savepoint dir, while the savepoint path is the returned value.
>
> I'm fine with the other changes.
>
> Thanks,
> Jark
>
> [1]: https://docs.ksqldb.io/en/latest/developer-guide/ksqldb-reference/show-queries/
>
> On Wed, 8 Jun 2022 at 15:07, Paul Lam <paullin3...@gmail.com> wrote:
>
> Hi Jing,
>
> Thank you for your inputs!
>
> TBH, I haven't considered the ETL scenario that you mentioned. I think those jobs are managed just like other jobs in terms of job lifecycles (please correct me if I'm wrong).
>
> WRT the SQL statements about SQL lineages, I think they might be a little bit out of the scope of the FLIP, since it's mainly about lifecycles. By the way, do we have these functionalities in Flink CLI or REST API already?
>
> WRT `RELEASE SAVEPOINT ALL`, I'm sorry for the outdated FLIP docs; the community is more in favor of `DROP SAVEPOINT <savepoint_path>`. I'm updating the FLIP according to the latest discussions.
>
> Best,
> Paul Lam
>
> On 8 Jun 2022, at 07:31, Jing Ge <j...@ververica.com> wrote:
>
> Hi Paul,
>
> Sorry that I am a little bit too late to join this thread. Thanks for driving this and starting this informative discussion. The FLIP looks really interesting. It will help us a lot to manage Flink SQL jobs.
>
> Have you considered the ETL scenario with Flink SQL, where multiple SQLs build a DAG (or many DAGs)?
>
> 1)
> +1 for SHOW JOBS. I think sooner or later we will start to discuss how to support ETL jobs. Briefly speaking, the SQLs that are used to build the DAG are responsible for *producing* data as the result (cube, materialized view, etc.) for future consumption by queries. The INSERT INTO ... SELECT FROM example in the FLIP and CTAS are typical SQLs in this case. I would prefer to call them Jobs instead of Queries.
>
> 2)
> Speaking of the ETL DAG, we might want to see the lineage. Is it possible to support syntax like:
>
> SHOW JOBTREE <job_id> // shows the downstream DAG from the given job_id
> SHOW JOBTREE <job_id> FULL // shows the whole DAG that contains the given job_id
> SHOW JOBTREES // shows all DAGs
> SHOW ANCESTORS <job_id> // shows all parents of the given job_id
>
> 3)
> Could we also support savepoint housekeeping syntax? We ran into the issue that a lot of savepoints had been created by customers (via their apps), and it takes extra (hacking) effort to clean them up.
>
> RELEASE SAVEPOINT ALL
>
> Best regards,
> Jing
>
> On Tue, Jun 7, 2022 at 2:35 PM Martijn Visser <martijnvis...@apache.org> wrote:
>
> Hi Paul,
>
> I'm still doubting the keyword for the SQL applications. SHOW QUERIES could imply that this will actually show the query, but we're returning IDs of the running applications. At first I was also not very much in favour of SHOW JOBS, since I prefer calling them 'Flink applications' and not 'Flink jobs', but the glossary [1] made me reconsider. I would +1 SHOW/STOP JOBS.
>
> Also +1 for the CREATE/SHOW/DROP SAVEPOINT syntax.
>
> Best regards,
>
> Martijn
>
> [1] https://nightlies.apache.org/flink/flink-docs-master/docs/concepts/glossary
>
> On Sat, 4 Jun 2022 at 10:38, Paul Lam <paullin3...@gmail.com> wrote:
>
> Hi Godfrey,
>
> Sorry for the late reply, I was on vacation.
>
> It looks like we have a variety of preferences on the syntax; how about we choose the most acceptable one?
>
> WRT the keyword for SQL jobs, we'd use JOBS, thus the statements related to jobs would be:
>
> - SHOW JOBS
> - STOP JOBS <job_id> (with options `table.job.stop-with-savepoint` and `table.job.stop-with-drain`)
>
> WRT savepoints for SQL jobs, we'd use the `CREATE/DROP` pattern with `FOR JOB`:
>
> - CREATE SAVEPOINT <savepoint_path> FOR JOB <job_id>
> - SHOW SAVEPOINTS FOR JOB <job_id> (shows the savepoints the current job manager remembers)
> - DROP SAVEPOINT <savepoint_path>
>
> cc @Jark @ShengKai @Martijn @Timo .
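>
> For illustration only, a possible SQL client session combining the statements above might look like the following (the job id and savepoint path are made-up values, and the exact quoting would depend on the final grammar):
>
>   SHOW JOBS;
>   CREATE SAVEPOINT '/flink/savepoints' FOR JOB '228d70913eab60dda85c5e7f78b5782c';
>   SHOW SAVEPOINTS FOR JOB '228d70913eab60dda85c5e7f78b5782c';
>   SET 'table.job.stop-with-savepoint' = 'true';
>   STOP JOBS '228d70913eab60dda85c5e7f78b5782c';
>   DROP SAVEPOINT '/flink/savepoints/savepoint-228d70-5f20d6e2c1a1';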
>
> Best,
> Paul Lam
>
> On Mon, 23 May 2022 at 21:34, godfrey he <godfre...@gmail.com> wrote:
>
> Hi Paul,
>
> Thanks for the update.
>
> > 'SHOW QUERIES' lists all jobs in the cluster, no limit on APIs (DataStream or SQL) or clients (SQL client or CLI).
>
> Is a DataStream job a QUERY? I think not.
> For a QUERY, the most important concept is the statement, but the result does not contain this info.
> If we need to cover all jobs in the cluster, I think the name should be JOB or PIPELINE.
> I lean towards SHOW PIPELINES and STOP PIPELINE [IF RUNNING] <id>.
>
> > SHOW SAVEPOINTS
>
> To list the savepoints for a specific job, we need to specify a specific pipeline; the syntax should be SHOW SAVEPOINTS FOR PIPELINE <id>.
>
> Best,
> Godfrey
>
> On Fri, 20 May 2022 at 11:25, Paul Lam <paullin3...@gmail.com> wrote:
>
> Hi Jark,
>
> WRT "DROP QUERY", I agree that it's not very intuitive, and that's part of the reason why I proposed "STOP/CANCEL QUERY" at the beginning. The downside of it is that it's not ANSI-SQL compatible.
>
> Another question is, what should be the syntax for ungracefully canceling a query? As ShengKai pointed out in an offline discussion, "STOP QUERY" and "CANCEL QUERY" might confuse SQL users. Flink CLI has both stop and cancel, mostly due to historical problems.
>
> WRT "SHOW SAVEPOINT", I agree it's a missing part. My concern is that savepoints are owned by users and live beyond the lifecycle of a Flink cluster. For example, a user might take a savepoint at a custom path that's different from the default savepoint path; I think the jobmanager would not remember that, not to mention that the jobmanager may be a brand-new one after a cluster restart. Thus if we support "SHOW SAVEPOINT", it's probably a best-effort one.
>
> WRT savepoint syntax, I'm thinking of the semantics of savepoints. Savepoints are an alias for nested transactions in the database area [1], and there are correspondingly global transactions. If we consider Flink jobs as global transactions and Flink checkpoints as nested transactions, then the savepoint semantics are close, thus I think the SQL-standard savepoint syntax could be considered. But again, I don't have a very strong preference.
>
> Ping @Timo to get more inputs.
>
> [1] https://en.wikipedia.org/wiki/Nested_transaction
>
> Best,
> Paul Lam
>
> On 18 May 2022, at 17:48, Jark Wu <imj...@gmail.com> wrote:
>
> Hi Paul,
>
> 1) SHOW QUERIES
> +1 to add the finished time, but it would be better to call it "end_time" to keep it aligned with the names in the Web UI.
>
> 2) DROP QUERY
> I think we shouldn't throw exceptions for batch jobs; otherwise, how do we stop batch queries?
> At present, I don't think "DROP" is a suitable keyword for this statement. From the perspective of users, "DROP" sounds like the query should be removed from the list of "SHOW QUERIES". However, it doesn't. Maybe "STOP QUERY" is more suitable and compliant with the commands of the Flink CLI.
>
> 3) SHOW SAVEPOINTS
> I think this statement is needed; otherwise, savepoints are lost after the SAVEPOINT command is executed. Savepoints can be retrieved from the REST API "/jobs/:jobid/checkpoints" with the filter "checkpoint_type"="savepoint". It's also worth considering providing "SHOW CHECKPOINTS" to list all checkpoints.
>
> 4) SAVEPOINT & RELEASE SAVEPOINT
> I'm a little concerned with the SAVEPOINT and RELEASE SAVEPOINT statements now.
> In other vendors, the parameters of SAVEPOINT and RELEASE SAVEPOINT are both the same savepoint id.
> However, in our syntax, the first one is a query id, and the second one is a savepoint path, which is confusing and not consistent. When I came across SHOW SAVEPOINT, I thought maybe they should be in the same syntax set.
> For example, CREATE SAVEPOINT FOR [QUERY] <query_id> & DROP SAVEPOINT <sp_path>.
> That means we don't follow the majority of vendors in the SAVEPOINT commands. I would say the purpose is different in Flink.
> What are others' opinions on this?
>
> Best,
> Jark
>
> [1]: https://nightlies.apache.org/flink/flink-docs-master/docs/ops/rest_api/#jobs-jobid-checkpoints
>
> On Wed, 18 May 2022 at 14:43, Paul Lam <paullin3...@gmail.com> wrote:
>
> Hi Godfrey,
>
> Thanks a lot for your inputs!
>
> 'SHOW QUERIES' lists all jobs in the cluster, with no limit on APIs (DataStream or SQL) or clients (SQL client or CLI). Under the hood, it's based on ClusterClient#listJobs, the same as the Flink CLI. I think it's okay to have non-SQL jobs listed in the SQL client, because these jobs can be managed via the SQL client too.
>
> WRT the finished time, I think you're right. Adding it to the FLIP. But I'm a bit afraid that the rows would be too long.
>
> WRT 'DROP QUERY',
>
> > What's the behavior for batch jobs and the non-running jobs?
>
> In general, the behavior would be aligned with the Flink CLI. Triggering a savepoint for a non-running job would cause errors, and the error message would be printed to the SQL client. Triggering a savepoint for batch (bounded) jobs in streaming execution mode would be the same as for streaming jobs. However, for batch jobs in batch execution mode, I think there would be an error, because batch execution doesn't support checkpoints currently (please correct me if I'm wrong).
>
> WRT 'SHOW SAVEPOINTS', I've thought about it, but Flink's ClusterClient/JobClient doesn't provide such functionality at the moment, and neither does the Flink CLI. Maybe we could make it a follow-up FLIP, which would include the modifications to ClusterClient/JobClient and the Flink CLI. WDYT?
>
> Best,
> Paul Lam
>
> On 17 May 2022, at 20:34, godfrey he <godfre...@gmail.com> wrote:
>
> Godfrey