Hi Jing,

Regarding JOBTREE (job lineage), I agree with Paul that this is out of the scope of this FLIP and can be discussed in another FLIP.
Job lineage is a big topic that may involve many problems:
1) how to collect and report job entities, attributes, and lineages?
2) how to integrate with data catalogs, e.g. Apache Atlas, DataHub?
3) how does Flink SQL CLI/Gateway know the lineage information and show jobtree?
4) ...

Best,
Jark

On Wed, 8 Jun 2022 at 20:44, Jark Wu <imj...@gmail.com> wrote:

> Hi Paul,
>
> I'm fine with using JOBS. The only concern is that this may conflict with
> displaying more detailed information for queries (e.g. query content, plan)
> in the future, e.g. SHOW QUERIES EXTENDED in ksqldb [1].
> This is not a big problem as we can introduce SHOW QUERIES in the future
> if necessary.
>
> > STOP JOBS <job_id> (with options `table.job.stop-with-savepoint` and
> > `table.job.stop-with-drain`)
> What about STOP JOB <job_id> [WITH SAVEPOINT] [WITH DRAIN] ?
> It might be tedious and error-prone to set configuration before executing
> a statement, and the configuration will affect all statements after that.
>
> > CREATE SAVEPOINT <savepoint_path> FOR JOB <job_id>
> We can simplify the statement to "CREATE SAVEPOINT FOR JOB <job_id>",
> and always use the configuration "state.savepoints.dir" as the default
> savepoint dir.
> The concern with using "<savepoint_path>" is that the argument here should
> be the savepoint dir, while savepoint_path is the returned value.
>
> I'm fine with other changes.
>
> Thanks,
> Jark
>
> [1]:
> https://docs.ksqldb.io/en/latest/developer-guide/ksqldb-reference/show-queries/
>
> On Wed, 8 Jun 2022 at 15:07, Paul Lam <paullin3...@gmail.com> wrote:
>
>> Hi Jing,
>>
>> Thank you for your inputs!
>>
>> TBH, I haven’t considered the ETL scenario that you mentioned. I think
>> they’re managed just like other jobs in terms of job lifecycles (please
>> correct me if I’m wrong).
>>
>> WRT the SQL statements about SQL lineages, I think it might be a
>> little bit out of the scope of the FLIP, since it’s mainly about
>> lifecycles.
>> By the way, do we have these functionalities in Flink CLI or
>> REST API already?
>>
>> WRT `RELEASE SAVEPOINT ALL`, I’m sorry for the outdated FLIP docs; the
>> community is more in favor of `DROP SAVEPOINT <savepoint_path>`. I’m
>> updating the FLIP according to the latest discussions.
>>
>> Best,
>> Paul Lam
>>
>> On 8 Jun 2022, at 07:31, Jing Ge <j...@ververica.com> wrote:
>>
>> Hi Paul,
>>
>> Sorry that I am a little late to join this thread. Thanks for
>> driving this and starting this informative discussion. The FLIP looks
>> really interesting. It will help us a lot to manage Flink SQL jobs.
>>
>> Have you considered the ETL scenario with Flink SQL, where multiple SQLs
>> build a DAG or many DAGs?
>>
>> 1)
>> +1 for SHOW JOBS. I think sooner or later we will start to discuss how to
>> support ETL jobs. Briefly speaking, SQLs that are used to build the DAG
>> are responsible for *producing* data as the result (cube, materialized
>> view, etc.) for future consumption by queries. The INSERT INTO SELECT
>> FROM example in the FLIP and CTAS are typical SQL in this case. I would
>> prefer to call them Jobs instead of Queries.
>>
>> 2)
>> Speaking of ETL DAGs, we might want to see the lineage. Is it possible to
>> support syntax like:
>>
>> SHOW JOBTREE <job_id> // shows the downstream DAG from the given job_id
>> SHOW JOBTREE <job_id> FULL // shows the whole DAG that contains the given job_id
>> SHOW JOBTREES // shows all DAGs
>> SHOW ANCIENTS <job_id> // shows all parents of the given job_id
>>
>> 3)
>> Could we also support savepoint housekeeping syntax? We ran into the
>> issue that a lot of savepoints have been created by customers (via their
>> apps). It takes extra (hacky) effort to clean them up.
>>
>> RELEASE SAVEPOINT ALL
>>
>> Best regards,
>> Jing
>>
>> On Tue, Jun 7, 2022 at 2:35 PM Martijn Visser <martijnvis...@apache.org>
>> wrote:
>>
>>> Hi Paul,
>>>
>>> I'm still doubting the keyword for the SQL applications.
>>> SHOW QUERIES could imply that this will actually show the query, but
>>> we're returning IDs of the running application. At first I was also not
>>> very much in favour of SHOW JOBS since I prefer calling it 'Flink
>>> applications' and not 'Flink jobs', but the glossary [1] made me
>>> reconsider. I would +1 SHOW/STOP JOBS.
>>>
>>> Also +1 for the CREATE/SHOW/DROP SAVEPOINT syntax.
>>>
>>> Best regards,
>>>
>>> Martijn
>>>
>>> [1]
>>> https://nightlies.apache.org/flink/flink-docs-master/docs/concepts/glossary
>>>
>>> On Sat, 4 Jun 2022 at 10:38, Paul Lam <paullin3...@gmail.com> wrote:
>>>
>>> > Hi Godfrey,
>>> >
>>> > Sorry for the late reply, I was on vacation.
>>> >
>>> > It looks like we have a variety of preferences on the syntax, how
>>> > about we choose the most acceptable one?
>>> >
>>> > WRT keyword for SQL jobs, we use JOBS, thus the statements related to
>>> > jobs would be:
>>> >
>>> > - SHOW JOBS
>>> > - STOP JOBS <job_id> (with options `table.job.stop-with-savepoint` and
>>> >   `table.job.stop-with-drain`)
>>> >
>>> > WRT savepoint for SQL jobs, we use the `CREATE/DROP` pattern with
>>> > `FOR JOB`:
>>> >
>>> > - CREATE SAVEPOINT <savepoint_path> FOR JOB <job_id>
>>> > - SHOW SAVEPOINTS FOR JOB <job_id> (show savepoints the current job
>>> >   manager remembers)
>>> > - DROP SAVEPOINT <savepoint_path>
>>> >
>>> > cc @Jark @ShengKai @Martijn @Timo .
>>> >
>>> > Best,
>>> > Paul Lam
>>> >
>>> > On Mon, 23 May 2022 at 21:34, godfrey he <godfre...@gmail.com> wrote:
>>> >
>>> >> Hi Paul,
>>> >>
>>> >> Thanks for the update.
>>> >>
>>> >> > 'SHOW QUERIES' lists all jobs in the cluster, no limit on APIs
>>> >> > (DataStream or SQL) or clients (SQL client or CLI).
>>> >>
>>> >> Is a DataStream job a QUERY? I think not.
>>> >> For a QUERY, the most important concept is the statement. But the
>>> >> result does not contain this info.
>>> >> If we need to include all jobs in the cluster, I think the name should
>>> >> be JOB or PIPELINE.
>>> >> I lean towards SHOW PIPELINES and STOP PIPELINE [IF RUNNING] id.
>>> >>
>>> >> > SHOW SAVEPOINTS
>>> >> To list the savepoints for a specific job, we need to specify a
>>> >> specific pipeline; the syntax should be SHOW SAVEPOINTS FOR PIPELINE id
>>> >>
>>> >> Best,
>>> >> Godfrey
>>> >>
>>> >> On Fri, 20 May 2022 at 11:25, Paul Lam <paullin3...@gmail.com> wrote:
>>> >> >
>>> >> > Hi Jark,
>>> >> >
>>> >> > WRT “DROP QUERY”, I agree that it’s not very intuitive, and that’s
>>> >> > part of the reason why I proposed “STOP/CANCEL QUERY” at the
>>> >> > beginning. The downside of it is that it’s not ANSI-SQL compatible.
>>> >> >
>>> >> > Another question is, what should be the syntax for ungracefully
>>> >> > canceling a query? As ShengKai pointed out in an offline discussion,
>>> >> > “STOP QUERY” and “CANCEL QUERY” might confuse SQL users.
>>> >> > Flink CLI has both stop and cancel, mostly for historical reasons.
>>> >> >
>>> >> > WRT “SHOW SAVEPOINT”, I agree it’s a missing part. My concern is
>>> >> > that savepoints are owned by users and live beyond the lifecycle of
>>> >> > a Flink cluster. For example, a user might take a savepoint at a
>>> >> > custom path that’s different from the default savepoint path. I
>>> >> > think the jobmanager would not remember that, not to mention that
>>> >> > the jobmanager may be a fresh new one after a cluster restart. Thus
>>> >> > if we support “SHOW SAVEPOINT”, it's probably a best-effort one.
>>> >> >
>>> >> > WRT savepoint syntax, I’m thinking of the semantics of the savepoint.
>>> >> > Savepoints are an alias for nested transactions in the database
>>> >> > world [1], and correspondingly there are global transactions. If we
>>> >> > consider Flink jobs as global transactions and Flink checkpoints as
>>> >> > nested transactions, then the savepoint semantics are close, thus I
>>> >> > think the savepoint syntax in the SQL standard could be considered.
>>> >> > But again, I don’t have a very strong preference.
>>> >> >
>>> >> > Ping @Timo to get more inputs.
>>> >> >
>>> >> > [1] https://en.wikipedia.org/wiki/Nested_transaction
>>> >> >
>>> >> > Best,
>>> >> > Paul Lam
>>> >> >
>>> >> > > On 18 May 2022, at 17:48, Jark Wu <imj...@gmail.com> wrote:
>>> >> > >
>>> >> > > Hi Paul,
>>> >> > >
>>> >> > > 1) SHOW QUERIES
>>> >> > > +1 to add finished time, but it would be better to call it
>>> >> > > "end_time" to keep it aligned with the names in the Web UI.
>>> >> > >
>>> >> > > 2) DROP QUERY
>>> >> > > I think we shouldn't throw exceptions for batch jobs; otherwise,
>>> >> > > how would users stop batch queries?
>>> >> > > At present, I don't think "DROP" is a suitable keyword for this
>>> >> > > statement. From the perspective of users, "DROP" sounds like the
>>> >> > > query should be removed from the list of "SHOW QUERIES". However,
>>> >> > > it isn't. Maybe "STOP QUERY" is more suitable and consistent with
>>> >> > > the commands of Flink CLI.
>>> >> > >
>>> >> > > 3) SHOW SAVEPOINTS
>>> >> > > I think this statement is needed; otherwise, savepoints are lost
>>> >> > > after the SAVEPOINT command is executed. Savepoints can be
>>> >> > > retrieved from the REST API "/jobs/:jobid/checkpoints" [1] by
>>> >> > > filtering on "checkpoint_type"="savepoint". It's also worth
>>> >> > > considering providing "SHOW CHECKPOINTS" to list all checkpoints.
>>> >> > >
>>> >> > > 4) SAVEPOINT & RELEASE SAVEPOINT
>>> >> > > I'm a little concerned with the SAVEPOINT and RELEASE SAVEPOINT
>>> >> > > statements now.
>>> >> > > In other vendors' syntax, the parameters of SAVEPOINT and RELEASE
>>> >> > > SAVEPOINT are both the same savepoint id.
>>> >> > > However, in our syntax, the first one is a query id and the second
>>> >> > > one is a savepoint path, which is confusing and not consistent.
>>> >> > > When I came across SHOW SAVEPOINT, I thought maybe they should be
>>> >> > > in the same syntax set.
>>> >> > > For example, CREATE SAVEPOINT FOR [QUERY] <query_id> & DROP
>>> >> > > SAVEPOINT <sp_path>.
>>> >> > > That means we don't follow the majority of vendors in SAVEPOINT
>>> >> > > commands. I would say the purpose is different in Flink.
>>> >> > > What are others' opinions on this?
>>> >> > >
>>> >> > > Best,
>>> >> > > Jark
>>> >> > >
>>> >> > > [1]:
>>> >> > > https://nightlies.apache.org/flink/flink-docs-master/docs/ops/rest_api/#jobs-jobid-checkpoints
>>> >> > >
>>> >> > > On Wed, 18 May 2022 at 14:43, Paul Lam <paullin3...@gmail.com> wrote:
>>> >> > >
>>> >> > >> Hi Godfrey,
>>> >> > >>
>>> >> > >> Thanks a lot for your inputs!
>>> >> > >>
>>> >> > >> 'SHOW QUERIES' lists all jobs in the cluster, no limit on APIs
>>> >> > >> (DataStream or SQL) or clients (SQL client or CLI). Under the
>>> >> > >> hood, it’s based on ClusterClient#listJobs, the same as Flink
>>> >> > >> CLI. I think it’s okay to have non-SQL jobs listed in SQL client,
>>> >> > >> because these jobs can be managed via SQL client too.
>>> >> > >>
>>> >> > >> WRT finished time, I think you’re right. Adding it to the FLIP.
>>> >> > >> But I’m a bit afraid that the rows would be too long.
>>> >> > >>
>>> >> > >> WRT ‘DROP QUERY’,
>>> >> > >>> What's the behavior for batch jobs and the non-running jobs?
>>> >> > >>
>>> >> > >> In general, the behavior would be aligned with Flink CLI.
>>> >> > >> Triggering a savepoint for a non-running job would cause errors,
>>> >> > >> and the error message would be printed to the SQL client.
>>> >> > >> Triggering a savepoint for batch (bounded) jobs in streaming
>>> >> > >> execution mode would be the same as for streaming jobs.
>>> >> > >> However, for batch jobs in batch execution mode, I think there
>>> >> > >> would be an error, because batch execution doesn’t support
>>> >> > >> checkpoints currently (please correct me if I’m wrong).
>>> >> > >>
>>> >> > >> WRT ‘SHOW SAVEPOINTS’, I’ve thought about it, but Flink
>>> >> > >> clusterClient/jobClient doesn’t have such a functionality at the
>>> >> > >> moment, and neither does Flink CLI.
>>> >> > >> Maybe we could make it a follow-up FLIP, which includes the
>>> >> > >> modifications to clusterClient/jobClient and Flink CLI. WDYT?
>>> >> > >>
>>> >> > >> Best,
>>> >> > >> Paul Lam
>>> >> > >>
>>> >> > >>> On 17 May 2022, at 20:34, godfrey he <godfre...@gmail.com> wrote:
>>> >> > >>>
>>> >> > >>> Godfrey
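
For reference, point 3) of Jark's 18 May message in the quoted thread suggests that savepoints can be retrieved from the REST endpoint "/jobs/:jobid/checkpoints" by filtering on the checkpoint type. Below is a minimal Python sketch of that filtering step only. It assumes the response JSON has a `history` list whose entries carry `checkpoint_type`, `is_savepoint` and `external_path` fields; these field names and the sample payload are assumptions for illustration, not something confirmed in this thread, and should be checked against the REST API docs of the Flink version in use.

```python
from typing import Any, Dict, List


def list_savepoints(checkpoint_stats: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Keep only the history entries that are savepoints.

    `checkpoint_stats` is the parsed JSON body of GET /jobs/:jobid/checkpoints.
    The field names used here are assumptions, not confirmed by this thread.
    """
    savepoints = []
    for entry in checkpoint_stats.get("history", []):
        ctype = str(entry.get("checkpoint_type", "")).upper()
        # Treat any "*SAVEPOINT" type, or an explicit is_savepoint flag, as a savepoint.
        if ctype.endswith("SAVEPOINT") or entry.get("is_savepoint") is True:
            savepoints.append(entry)
    return savepoints


# Hypothetical payload, trimmed to the fields used above.
sample_stats = {
    "history": [
        {"id": 3, "checkpoint_type": "SAVEPOINT", "external_path": "/tmp/sp-3"},
        {"id": 2, "checkpoint_type": "CHECKPOINT", "external_path": "/tmp/chk-2"},
        {"id": 1, "is_savepoint": True, "external_path": "/tmp/sp-1"},
    ]
}

for sp in list_savepoints(sample_stats):
    print(sp["id"], sp["external_path"])
```

As Paul points out in the thread, the job manager may not remember savepoints taken at custom paths or before a cluster restart, so a listing built this way would be best-effort.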