Hi all,

Regarding `PIPELINE`, it comes from the flink-core module; see the `PipelineOptions` class for more details. `JOBS` is a more generic concept than `PIPELINES`. I'm also fine with `JOBS`.
+1 to discussing JOBTREE in another FLIP.
+1 to `STOP JOB <job_id> [WITH SAVEPOINT] [WITH DRAIN]`
+1 to `CREATE SAVEPOINT FOR JOB <job_id>` and `DROP SAVEPOINT <savepoint_path>`

Best,
Godfrey

Jing Ge <j...@ververica.com> wrote on Thu, Jun 9, 2022 at 01:48:
>
> Hi Paul, Hi Jark,
>
> Re JOBTREE: agreed that it is out of the scope of this FLIP.
>
> Re `RELEASE SAVEPOINT ALL`: if the community prefers `DROP`, then `DROP SAVEPOINT ALL` for housekeeping. WDYT?
>
> Best regards,
> Jing
>
>
> On Wed, Jun 8, 2022 at 2:54 PM Jark Wu <imj...@gmail.com> wrote:
>>
>> Hi Jing,
>>
>> Regarding JOBTREE (job lineage), I agree with Paul that this is out of
>> the scope of this FLIP and can be discussed in another FLIP.
>>
>> Job lineage is a big topic that may involve many problems:
>> 1) how to collect and report job entities, attributes, and lineages?
>> 2) how to integrate with data catalogs, e.g. Apache Atlas, DataHub?
>> 3) how does Flink SQL CLI/Gateway know the lineage information and show the jobtree?
>> 4) ...
>>
>> Best,
>> Jark
>>
>> On Wed, 8 Jun 2022 at 20:44, Jark Wu <imj...@gmail.com> wrote:
>>>
>>> Hi Paul,
>>>
>>> I'm fine with using JOBS. The only concern is that this may conflict
>>> with displaying more detailed information for queries (e.g. query
>>> content, plan) in the future, e.g. SHOW QUERIES EXTENDED in ksqlDB [1].
>>> This is not a big problem, as we can introduce SHOW QUERIES in the
>>> future if necessary.
>>>
>>> > STOP JOBS <job_id> (with options `table.job.stop-with-savepoint` and `table.job.stop-with-drain`)
>>> What about STOP JOB <job_id> [WITH SAVEPOINT] [WITH DRAIN]?
>>> It might be trivial and error-prone to set configuration before
>>> executing a statement, and the configuration will affect all statements
>>> after that.
>>>
>>> > CREATE SAVEPOINT <savepoint_path> FOR JOB <job_id>
>>> We can simplify the statement to "CREATE SAVEPOINT FOR JOB <job_id>",
>>> and always use the configuration "state.savepoints.dir" as the default
>>> savepoint dir.
>>> The concern with using "<savepoint_path>" is that it should really be
>>> the savepoint dir, while savepoint_path is the returned value.
>>>
>>> I'm fine with the other changes.
>>>
>>> Thanks,
>>> Jark
>>>
>>> [1]:
>>> https://docs.ksqldb.io/en/latest/developer-guide/ksqldb-reference/show-queries/
>>>
>>>
>>>
>>> On Wed, 8 Jun 2022 at 15:07, Paul Lam <paullin3...@gmail.com> wrote:
>>>>
>>>> Hi Jing,
>>>>
>>>> Thank you for your inputs!
>>>>
>>>> TBH, I haven't considered the ETL scenario that you mentioned. I think
>>>> they're managed just like other jobs in terms of job lifecycles (please
>>>> correct me if I'm wrong).
>>>>
>>>> WRT the SQL statements about SQL lineage, I think it might be a little
>>>> bit out of the scope of this FLIP, since it's mainly about lifecycles. By
>>>> the way, do we have these functionalities in the Flink CLI or REST API already?
>>>>
>>>> WRT `RELEASE SAVEPOINT ALL`, I'm sorry for the outdated FLIP docs; the
>>>> community is more in favor of `DROP SAVEPOINT <savepoint_path>`. I'm
>>>> updating the FLIP according to the latest discussions.
>>>>
>>>> Best,
>>>> Paul Lam
>>>>
>>>> On Wed, Jun 8, 2022 at 07:31, Jing Ge <j...@ververica.com> wrote:
>>>>
>>>> Hi Paul,
>>>>
>>>> Sorry that I am a little bit too late to join this thread. Thanks for
>>>> driving this and starting this informative discussion. The FLIP looks
>>>> really interesting. It will help us a lot to manage Flink SQL jobs.
>>>>
>>>> Have you considered the ETL scenario with Flink SQL, where multiple SQL
>>>> statements build a DAG, or even many DAGs?
>>>>
>>>> 1)
>>>> +1 for SHOW JOBS. I think sooner or later we will start to discuss how
>>>> to support ETL jobs. Briefly speaking, the SQL statements used to build
>>>> the DAG are responsible for *producing* data as the result (cube,
>>>> materialized view, etc.) for future consumption by queries. The
>>>> INSERT INTO SELECT FROM example in the FLIP and CTAS are typical SQL
>>>> statements in this case. I would prefer to call them Jobs instead of
>>>> Queries.
>>>>
>>>> 2)
>>>> Speaking of the ETL DAG, we might want to see the lineage. Is it
>>>> possible to support syntax like:
>>>>
>>>> SHOW JOBTREE <job_id> // shows the downstream DAG from the given job_id
>>>> SHOW JOBTREE <job_id> FULL // shows the whole DAG that contains the given job_id
>>>> SHOW JOBTREES // shows all DAGs
>>>> SHOW ANCESTORS <job_id> // shows all parents of the given job_id
>>>>
>>>> 3)
>>>> Could we also support savepoint housekeeping syntax? We ran into the
>>>> issue that a lot of savepoints had been created by customers (via their
>>>> apps). It takes extra (hacking) effort to clean them up.
>>>>
>>>> RELEASE SAVEPOINT ALL
>>>>
>>>> Best regards,
>>>> Jing
>>>>
>>>> On Tue, Jun 7, 2022 at 2:35 PM Martijn Visser <martijnvis...@apache.org> wrote:
>>>>>
>>>>> Hi Paul,
>>>>>
>>>>> I'm still doubting the keyword for the SQL applications. SHOW QUERIES
>>>>> could imply that this will actually show the query, but we're returning
>>>>> IDs of the running applications. At first I was also not very much in
>>>>> favour of SHOW JOBS, since I prefer calling it 'Flink applications' and
>>>>> not 'Flink jobs', but the glossary [1] made me reconsider. I would
>>>>> +1 SHOW/STOP JOBS.
>>>>>
>>>>> Also +1 for the CREATE/SHOW/DROP SAVEPOINT syntax.
>>>>>
>>>>> Best regards,
>>>>>
>>>>> Martijn
>>>>>
>>>>> [1]
>>>>> https://nightlies.apache.org/flink/flink-docs-master/docs/concepts/glossary
>>>>>
>>>>> On Sat, Jun 4, 2022 at 10:38, Paul Lam <paullin3...@gmail.com> wrote:
>>>>>
>>>>> > Hi Godfrey,
>>>>> >
>>>>> > Sorry for the late reply, I was on vacation.
>>>>> >
>>>>> > It looks like we have a variety of preferences on the syntax; how
>>>>> > about we choose the most acceptable one?
>>>>> >
>>>>> > WRT the keyword for SQL jobs, we use JOBS, thus the statements
>>>>> > related to jobs would be:
>>>>> >
>>>>> > - SHOW JOBS
>>>>> > - STOP JOBS <job_id> (with options `table.job.stop-with-savepoint` and
>>>>> > `table.job.stop-with-drain`)
>>>>> >
>>>>> > WRT savepoints for SQL jobs, we use the `CREATE/DROP` pattern with `FOR JOB`:
>>>>> >
>>>>> > - CREATE SAVEPOINT <savepoint_path> FOR JOB <job_id>
>>>>> > - SHOW SAVEPOINTS FOR JOB <job_id> (show the savepoints the current
>>>>> > job manager remembers)
>>>>> > - DROP SAVEPOINT <savepoint_path>
>>>>> >
>>>>> > cc @Jark @ShengKai @Martijn @Timo.
>>>>> >
>>>>> > Best,
>>>>> > Paul Lam
>>>>> >
>>>>> >
>>>>> > godfrey he <godfre...@gmail.com> wrote on Mon, May 23, 2022 at 21:34:
>>>>> >
>>>>> >> Hi Paul,
>>>>> >>
>>>>> >> Thanks for the update.
>>>>> >>
>>>>> >> > 'SHOW QUERIES' lists all jobs in the cluster, no limit on APIs
>>>>> >> (DataStream or SQL) or clients (SQL client or CLI).
>>>>> >>
>>>>> >> Is a DataStream job a QUERY? I think not.
>>>>> >> For a QUERY, the most important concept is the statement, but the
>>>>> >> result does not contain this info.
>>>>> >> If we need to cover all jobs in the cluster, I think the name should
>>>>> >> be JOB or PIPELINE.
>>>>> >> I lean towards SHOW PIPELINES and STOP PIPELINE [IF RUNNING] <id>.
>>>>> >>
>>>>> >> > SHOW SAVEPOINTS
>>>>> >> To list the savepoints for a specific job, we need to specify a
>>>>> >> specific pipeline; the syntax should be SHOW SAVEPOINTS FOR PIPELINE <id>.
>>>>> >>
>>>>> >> Best,
>>>>> >> Godfrey
>>>>> >>
>>>>> >> Paul Lam <paullin3...@gmail.com> wrote on Fri, May 20, 2022 at 11:25:
>>>>> >> >
>>>>> >> > Hi Jark,
>>>>> >> >
>>>>> >> > WRT "DROP QUERY", I agree that it's not very intuitive, and that's
>>>>> >> > part of the reason why I proposed "STOP/CANCEL QUERY" at the
>>>>> >> > beginning. The downside of it is that it's not ANSI-SQL compatible.
>>>>> >> >
>>>>> >> > Another question is, what should be the syntax for ungracefully
>>>>> >> > canceling a query? As ShengKai pointed out in an offline discussion,
>>>>> >> > "STOP QUERY" and "CANCEL QUERY" might confuse SQL users.
>>>>> >> > The Flink CLI has both stop and cancel, mostly due to historical
>>>>> >> > reasons.
>>>>> >> >
>>>>> >> > WRT "SHOW SAVEPOINT", I agree it's a missing part. My concern is
>>>>> >> > that savepoints are owned by users and live beyond the lifecycle of
>>>>> >> > a Flink cluster. For example, a user might take a savepoint at a
>>>>> >> > custom path that's different from the default savepoint path; I
>>>>> >> > think the jobmanager would not remember that, not to mention that
>>>>> >> > the jobmanager may be a brand-new one after a cluster restart. Thus
>>>>> >> > if we support "SHOW SAVEPOINT", it would probably be best-effort.
>>>>> >> >
>>>>> >> > WRT the savepoint syntax, I'm thinking of the semantics of
>>>>> >> > savepoints. Savepoints are aliases for nested transactions in the
>>>>> >> > database area [1], and correspondingly there are global
>>>>> >> > transactions. If we consider Flink jobs as global transactions and
>>>>> >> > Flink checkpoints as nested transactions, then the savepoint
>>>>> >> > semantics are close, thus I think the SQL-standard savepoint syntax
>>>>> >> > could be considered. But again, I don't have a very strong
>>>>> >> > preference.
>>>>> >> >
>>>>> >> > Ping @Timo to get more inputs.
>>>>> >> >
>>>>> >> > [1] https://en.wikipedia.org/wiki/Nested_transaction
>>>>> >> >
>>>>> >> > Best,
>>>>> >> > Paul Lam
>>>>> >> >
>>>>> >> > > On Wed, May 18, 2022 at 17:48, Jark Wu <imj...@gmail.com> wrote:
>>>>> >> > >
>>>>> >> > > Hi Paul,
>>>>> >> > >
>>>>> >> > > 1) SHOW QUERIES
>>>>> >> > > +1 to add the finished time, but it would be better to call it
>>>>> >> > > "end_time" to keep it aligned with the names in the Web UI.
>>>>> >> > >
>>>>> >> > > 2) DROP QUERY
>>>>> >> > > I think we shouldn't throw exceptions for batch jobs; otherwise,
>>>>> >> > > how would we stop batch queries?
>>>>> >> > > At present, I don't think "DROP" is a suitable keyword for this
>>>>> >> > > statement. From the perspective of users, "DROP" sounds like the
>>>>> >> > > query should be removed from the list of "SHOW QUERIES".
>>>>> >> > > However, it isn't. Maybe "STOP QUERY" is more suitable and
>>>>> >> > > compliant with the commands of the Flink CLI.
>>>>> >> > >
>>>>> >> > > 3) SHOW SAVEPOINTS
>>>>> >> > > I think this statement is needed; otherwise, savepoints are lost
>>>>> >> > > after the SAVEPOINT command is executed. Savepoints can be
>>>>> >> > > retrieved from the REST API "/jobs/:jobid/checkpoints" [1]
>>>>> >> > > by filtering on "checkpoint_type"="savepoint". It's also worth
>>>>> >> > > considering providing "SHOW CHECKPOINTS" to list all checkpoints.
>>>>> >> > >
>>>>> >> > > 4) SAVEPOINT & RELEASE SAVEPOINT
>>>>> >> > > I'm a little concerned with the SAVEPOINT and RELEASE SAVEPOINT
>>>>> >> > > statements now.
>>>>> >> > > In other vendors, the parameters of SAVEPOINT and RELEASE
>>>>> >> > > SAVEPOINT are both the same savepoint id.
>>>>> >> > > However, in our syntax, the first one is a query id, and the
>>>>> >> > > second one is a savepoint path, which is confusing and
>>>>> >> > > inconsistent. When I came across SHOW SAVEPOINT, I thought maybe
>>>>> >> > > they should be in the same syntax set.
>>>>> >> > > For example, CREATE SAVEPOINT FOR [QUERY] <query_id> & DROP
>>>>> >> > > SAVEPOINT <sp_path>.
>>>>> >> > > That means we wouldn't follow the majority of vendors in the
>>>>> >> > > SAVEPOINT commands. I would say the purpose is different in Flink.
>>>>> >> > > What are others' opinions on this?
>>>>> >> > >
>>>>> >> > > Best,
>>>>> >> > > Jark
>>>>> >> > >
>>>>> >> > > [1]:
>>>>> >> > > https://nightlies.apache.org/flink/flink-docs-master/docs/ops/rest_api/#jobs-jobid-checkpoints
>>>>> >> > >
>>>>> >> > >
>>>>> >> > > On Wed, 18 May 2022 at 14:43, Paul Lam <paullin3...@gmail.com> wrote:
>>>>> >> > >
>>>>> >> > >> Hi Godfrey,
>>>>> >> > >>
>>>>> >> > >> Thanks a lot for your inputs!
>>>>> >> > >>
>>>>> >> > >> 'SHOW QUERIES' lists all jobs in the cluster, with no limit on
>>>>> >> > >> APIs (DataStream or SQL) or clients (SQL client or CLI). Under
>>>>> >> > >> the hood, it's based on ClusterClient#listJobs, the same as the
>>>>> >> > >> Flink CLI. I think it's okay to have non-SQL jobs listed in the
>>>>> >> > >> SQL client, because these jobs can be managed via the SQL client
>>>>> >> > >> too.
>>>>> >> > >>
>>>>> >> > >> WRT the finished time, I think you're right. Adding it to the
>>>>> >> > >> FLIP. But I'm a bit afraid that the rows would be too long.
>>>>> >> > >>
>>>>> >> > >> WRT 'DROP QUERY',
>>>>> >> > >>> What's the behavior for batch jobs and the non-running jobs?
>>>>> >> > >>
>>>>> >> > >> In general, the behavior would be aligned with the Flink CLI.
>>>>> >> > >> Triggering a savepoint for a non-running job would cause errors,
>>>>> >> > >> and the error message would be printed to the SQL client.
>>>>> >> > >> Triggering a savepoint for batch (bounded) jobs in streaming
>>>>> >> > >> execution mode would be the same as for streaming jobs. However,
>>>>> >> > >> for batch jobs in batch execution mode, I think there would be
>>>>> >> > >> an error, because batch execution doesn't support checkpoints
>>>>> >> > >> currently (please correct me if I'm wrong).
>>>>> >> > >>
>>>>> >> > >> WRT 'SHOW SAVEPOINTS', I've thought about it, but Flink's
>>>>> >> > >> clusterClient/jobClient doesn't have such functionality at the
>>>>> >> > >> moment, and neither does the Flink CLI.
>>>>> >> > >> Maybe we could make it a follow-up FLIP, which includes the
>>>>> >> > >> modifications to clusterClient/jobClient and the Flink CLI. WDYT?
>>>>> >> > >>
>>>>> >> > >> Best,
>>>>> >> > >> Paul Lam
>>>>> >> > >>
>>>>> >> > >>> godfrey he <godfre...@gmail.com> wrote on Tue, May 17, 2022 at 20:34:
>>>>> >> > >>>
>>>>> >> > >>> Godfrey
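---

For reference, the statement set this thread appears to be converging on would read as follows. This is only a sketch of the proposed syntax as discussed above, not the final FLIP wording; the job id and savepoint path are placeholders.

```sql
-- List all jobs in the cluster (SQL and DataStream alike)
SHOW JOBS;

-- Stop a job, optionally taking a savepoint and draining in-flight data
STOP JOB '228d70913eab60dda85c5e7f78b5782c' WITH SAVEPOINT WITH DRAIN;

-- Trigger a savepoint for a job; the target directory defaults to the
-- configured state.savepoints.dir, so no path argument is needed
CREATE SAVEPOINT FOR JOB '228d70913eab60dda85c5e7f78b5782c';

-- List the savepoints the current jobmanager remembers for a job
SHOW SAVEPOINTS FOR JOB '228d70913eab60dda85c5e7f78b5782c';

-- Dispose of a savepoint by the path returned from CREATE SAVEPOINT
DROP SAVEPOINT '/tmp/savepoints/savepoint-228d70-599412d6cc2d';
```

Note the asymmetry Jark raises: CREATE SAVEPOINT takes a job id while DROP SAVEPOINT takes a savepoint path, which deliberately diverges from the vendor convention where both SAVEPOINT and RELEASE SAVEPOINT take the same savepoint id.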