Hi Martijn,

I think the `DROP SAVEPOINT` statement would not conflict with NO_CLAIM mode, since the statement is triggered by users instead of the Flink runtime.
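As a sketch, the proposed SQL statement would mirror the existing CLI disposal command (the statement below follows the syntax discussed in this thread and is not implemented yet; the savepoint path is a placeholder):

```sql
-- Existing CLI equivalent: bin/flink savepoint -d :savepointPath
-- Proposed SQL counterpart (syntax from this thread; path is illustrative):
DROP SAVEPOINT 'hdfs:///flink/savepoints/savepoint-0000-0123456789ab';
```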
We’re simply providing a tool for users to clean up savepoints, just like `bin/flink savepoint -d :savepointPath` in the Flink CLI [1].

[1] https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/ops/state/savepoints/#disposing-savepoints

Best,
Paul Lam

> On 9 Jun 2022, at 15:41, Martijn Visser <martijnvis...@apache.org> wrote:
>
> Hi all,
>
> I would not include a DROP SAVEPOINT syntax. With the recently introduced
> CLAIM/NO CLAIM mode, I would argue that we've just clarified snapshot
> ownership, and if you have a savepoint established "with NO_CLAIM it creates
> its own copy and leaves the existing one up to the user." [1] We shouldn't
> then make it fuzzy again by making it possible for Flink to remove
> snapshots.
>
> Best regards,
>
> Martijn
>
> [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-193%3A+Snapshots+ownership
>
> On Thu, 9 Jun 2022 at 09:27, Paul Lam <paullin3...@gmail.com> wrote:
>
> Hi team,
>
> It's great to see our opinions are finally converging!
>
>> `STOP JOB <job_id> [WITH SAVEPOINT] [WITH DRAIN]`
>
> LGTM. Adding it to the FLIP.
>
> To Jark,
>
>> We can simplify the statement to "CREATE SAVEPOINT FOR JOB <job_id>"
>
> Good point. The default savepoint dir should be enough for most cases.
>
> To Jing,
>
>> DROP SAVEPOINT ALL
>
> I think it’s valid to have such a statement, but I have two concerns:
> 1. `ALL` is already an SQL keyword, thus it may cause ambiguity.
> 2. The Flink CLI and REST API don’t provide the corresponding functionality,
> and we’d better keep them aligned.
>
> How about making this statement a follow-up task, which would also touch the
> REST API and the Flink CLI?
>
> Best,
> Paul Lam
>
>> On 9 Jun 2022, at 11:53, godfrey he <godfre...@gmail.com> wrote:
>>
>> Hi all,
>>
>> Regarding `PIPELINE`, it comes from the flink-core module; see the
>> `PipelineOptions` class for more details.
>> `JOBS` is a more generic concept than `PIPELINES`. I'm also fine with
>> `JOBS`.
>>
>> +1 to discuss JOBTREE in another FLIP.
>>
>> +1 to `STOP JOB <job_id> [WITH SAVEPOINT] [WITH DRAIN]`
>>
>> +1 to `CREATE SAVEPOINT FOR JOB <job_id>` and `DROP SAVEPOINT
>> <savepoint_path>`
>>
>> Best,
>> Godfrey
>>
>> On Thu, 9 Jun 2022 at 01:48, Jing Ge <j...@ververica.com> wrote:
>>>
>>> Hi Paul, Hi Jark,
>>>
>>> Re JOBTREE: agreed that it is out of the scope of this FLIP.
>>>
>>> Re `RELEASE SAVEPOINT ALL`: if the community prefers `DROP`, then `DROP
>>> SAVEPOINT ALL` could do the housekeeping. WDYT?
>>>
>>> Best regards,
>>> Jing
>>>
>>> On Wed, Jun 8, 2022 at 2:54 PM Jark Wu <imj...@gmail.com> wrote:
>>>>
>>>> Hi Jing,
>>>>
>>>> Regarding JOBTREE (job lineage), I agree with Paul that this is out of
>>>> the scope of this FLIP and can be discussed in another FLIP.
>>>>
>>>> Job lineage is a big topic that may involve many problems:
>>>> 1) how to collect and report job entities, attributes, and lineages?
>>>> 2) how to integrate with data catalogs, e.g. Apache Atlas, DataHub?
>>>> 3) how does the Flink SQL CLI/Gateway know the lineage information and
>>>> show the jobtree?
>>>> 4) ...
>>>>
>>>> Best,
>>>> Jark
>>>>
>>>> On Wed, 8 Jun 2022 at 20:44, Jark Wu <imj...@gmail.com> wrote:
>>>>>
>>>>> Hi Paul,
>>>>>
>>>>> I'm fine with using JOBS. The only concern is that this may conflict
>>>>> with displaying more detailed information for queries (e.g. query
>>>>> content, plan) in the future, e.g. SHOW QUERIES EXTENDED in ksqlDB [1].
>>>>> This is not a big problem, as we can introduce SHOW QUERIES in the
>>>>> future if necessary.
>>>>>
>>>>>> STOP JOBS <job_id> (with options `table.job.stop-with-savepoint` and
>>>>>> `table.job.stop-with-drain`)
>>>>>
>>>>> What about STOP JOB <job_id> [WITH SAVEPOINT] [WITH DRAIN]?
>>>>> It might be tedious and error-prone to set configuration before
>>>>> executing a statement, and the configuration will affect all statements
>>>>> after that.
>>>>>
>>>>>> CREATE SAVEPOINT <savepoint_path> FOR JOB <job_id>
>>>>>
>>>>> We can simplify the statement to "CREATE SAVEPOINT FOR JOB <job_id>",
>>>>> and always use the configuration "state.savepoints.dir" as the default
>>>>> savepoint dir.
>>>>> The concern with using "<savepoint_path>" is that what's specified here
>>>>> should be the savepoint dir, while the savepoint path is the returned
>>>>> value.
>>>>>
>>>>> I'm fine with the other changes.
>>>>>
>>>>> Thanks,
>>>>> Jark
>>>>>
>>>>> [1]: https://docs.ksqldb.io/en/latest/developer-guide/ksqldb-reference/show-queries/
>>>>>
>>>>> On Wed, 8 Jun 2022 at 15:07, Paul Lam <paullin3...@gmail.com> wrote:
>>>>>>
>>>>>> Hi Jing,
>>>>>>
>>>>>> Thank you for your inputs!
>>>>>>
>>>>>> TBH, I hadn’t considered the ETL scenario that you mentioned. I think
>>>>>> such jobs are managed just like other jobs in terms of job lifecycle
>>>>>> (please correct me if I’m wrong).
>>>>>>
>>>>>> WRT the SQL statements about lineage, I think they might be a little
>>>>>> bit out of the scope of the FLIP, since it’s mainly about lifecycles.
>>>>>> By the way, do we have these functionalities in the Flink CLI or REST
>>>>>> API already?
>>>>>>
>>>>>> WRT `RELEASE SAVEPOINT ALL`, I’m sorry for the outdated FLIP docs; the
>>>>>> community is more in favor of `DROP SAVEPOINT <savepoint_path>`. I’m
>>>>>> updating the FLIP according to the latest discussions.
>>>>>>
>>>>>> Best,
>>>>>> Paul Lam
>>>>>>
>>>>>> On 8 Jun 2022, at 07:31, Jing Ge <j...@ververica.com> wrote:
>>>>>>
>>>>>> Hi Paul,
>>>>>>
>>>>>> Sorry that I am a little bit late to join this thread. Thanks for
>>>>>> driving this and starting this informative discussion.
>>>>>> The FLIP looks really interesting. It will help us a lot in managing
>>>>>> Flink SQL jobs.
>>>>>>
>>>>>> Have you considered the ETL scenario with Flink SQL, where multiple SQL
>>>>>> jobs build a DAG, or even many DAGs?
>>>>>>
>>>>>> 1)
>>>>>> +1 for SHOW JOBS. I think sooner or later we will start to discuss how
>>>>>> to support ETL jobs. Briefly speaking, the SQL statements used to build
>>>>>> the DAG are responsible for *producing* data as the result (cube,
>>>>>> materialized view, etc.) for future consumption by queries. The INSERT
>>>>>> INTO SELECT FROM example in the FLIP and CTAS are typical SQL in this
>>>>>> case. I would prefer to call them Jobs instead of Queries.
>>>>>>
>>>>>> 2)
>>>>>> Speaking of the ETL DAG, we might want to see the lineage. Is it
>>>>>> possible to support syntax like:
>>>>>>
>>>>>> SHOW JOBTREE <job_id> // shows the downstream DAG from the given job_id
>>>>>> SHOW JOBTREE <job_id> FULL // shows the whole DAG that contains the
>>>>>> given job_id
>>>>>> SHOW JOBTREES // shows all DAGs
>>>>>> SHOW ANCESTORS <job_id> // shows all parents of the given job_id
>>>>>>
>>>>>> 3)
>>>>>> Could we also support savepoint housekeeping syntax? We ran into the
>>>>>> issue that a lot of savepoints had been created by customers (via their
>>>>>> apps), and it took extra (hacking) effort to clean them up.
>>>>>>
>>>>>> RELEASE SAVEPOINT ALL
>>>>>>
>>>>>> Best regards,
>>>>>> Jing
>>>>>>
>>>>>> On Tue, Jun 7, 2022 at 2:35 PM Martijn Visser <martijnvis...@apache.org> wrote:
>>>>>>>
>>>>>>> Hi Paul,
>>>>>>>
>>>>>>> I'm still doubtful about the keyword for the SQL applications. SHOW
>>>>>>> QUERIES could imply that this will actually show the query, but we're
>>>>>>> returning IDs of the running applications. At first I was also not
>>>>>>> very much in favour of SHOW JOBS, since I prefer calling them 'Flink
>>>>>>> applications' and not 'Flink jobs', but the glossary [1] made me
>>>>>>> reconsider.
>>>>>>> I would +1 SHOW/STOP JOBS.
>>>>>>>
>>>>>>> Also +1 for the CREATE/SHOW/DROP SAVEPOINT syntax.
>>>>>>>
>>>>>>> Best regards,
>>>>>>>
>>>>>>> Martijn
>>>>>>>
>>>>>>> [1] https://nightlies.apache.org/flink/flink-docs-master/docs/concepts/glossary
>>>>>>>
>>>>>>> On Sat, 4 Jun 2022 at 10:38, Paul Lam <paullin3...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hi Godfrey,
>>>>>>>>
>>>>>>>> Sorry for the late reply, I was on vacation.
>>>>>>>>
>>>>>>>> It looks like we have a variety of preferences on the syntax; how
>>>>>>>> about we choose the most acceptable one?
>>>>>>>>
>>>>>>>> WRT the keyword for SQL jobs, we use JOBS, thus the statements
>>>>>>>> related to jobs would be:
>>>>>>>>
>>>>>>>> - SHOW JOBS
>>>>>>>> - STOP JOBS <job_id> (with options `table.job.stop-with-savepoint`
>>>>>>>> and `table.job.stop-with-drain`)
>>>>>>>>
>>>>>>>> WRT savepoints for SQL jobs, we use the `CREATE/DROP` pattern with
>>>>>>>> `FOR JOB`:
>>>>>>>>
>>>>>>>> - CREATE SAVEPOINT <savepoint_path> FOR JOB <job_id>
>>>>>>>> - SHOW SAVEPOINTS FOR JOB <job_id> (shows the savepoints the current
>>>>>>>> job manager remembers)
>>>>>>>> - DROP SAVEPOINT <savepoint_path>
>>>>>>>>
>>>>>>>> cc @Jark @ShengKai @Martijn @Timo .
>>>>>>>>
>>>>>>>> Best,
>>>>>>>> Paul Lam
>>>>>>>>
>>>>>>>> On Mon, 23 May 2022 at 21:34, godfrey he <godfre...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Hi Paul,
>>>>>>>>>
>>>>>>>>> Thanks for the update.
>>>>>>>>>
>>>>>>>>>> 'SHOW QUERIES' lists all jobs in the cluster, no limit on APIs
>>>>>>>>>> (DataStream or SQL) or clients (SQL client or CLI).
>>>>>>>>>
>>>>>>>>> Is a DataStream job a QUERY? I think not.
>>>>>>>>> For a QUERY, the most important concept is the statement. But the
>>>>>>>>> result does not contain this info.
>>>>>>>>> If we need to cover all jobs in the cluster, I think the name
>>>>>>>>> should be JOB or PIPELINE.
>>>>>>>>> I lean towards SHOW PIPELINES and STOP PIPELINE [IF RUNNING] <id>.
>>>>>>>>>
>>>>>>>>>> SHOW SAVEPOINTS
>>>>>>>>>
>>>>>>>>> To list the savepoints for a specific job, we need to specify a
>>>>>>>>> specific pipeline;
>>>>>>>>> the syntax should be SHOW SAVEPOINTS FOR PIPELINE <id>.
>>>>>>>>>
>>>>>>>>> Best,
>>>>>>>>> Godfrey
>>>>>>>>>
>>>>>>>>> On Fri, 20 May 2022 at 11:25, Paul Lam <paullin3...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>> Hi Jark,
>>>>>>>>>>
>>>>>>>>>> WRT “DROP QUERY”, I agree that it’s not very intuitive, and that’s
>>>>>>>>>> part of the reason why I proposed “STOP/CANCEL QUERY” at the
>>>>>>>>>> beginning. The downside is that it’s not ANSI-SQL compatible.
>>>>>>>>>>
>>>>>>>>>> Another question is, what should be the syntax for ungracefully
>>>>>>>>>> canceling a query? As ShengKai pointed out in an offline
>>>>>>>>>> discussion, “STOP QUERY” and “CANCEL QUERY” might confuse SQL
>>>>>>>>>> users. The Flink CLI has both stop and cancel, mostly for
>>>>>>>>>> historical reasons.
>>>>>>>>>>
>>>>>>>>>> WRT “SHOW SAVEPOINTS”, I agree it’s a missing part. My concern is
>>>>>>>>>> that savepoints are owned by users and live beyond the lifecycle
>>>>>>>>>> of a Flink cluster. For example, a user might take a savepoint at
>>>>>>>>>> a custom path that’s different from the default savepoint path; I
>>>>>>>>>> think the jobmanager would not remember that, not to mention that
>>>>>>>>>> the jobmanager may be a fresh new one after a cluster restart.
>>>>>>>>>> Thus if we support “SHOW SAVEPOINTS”, it's probably a best-effort
>>>>>>>>>> one.
>>>>>>>>>>
>>>>>>>>>> WRT the savepoint syntax, I’m thinking of the semantics of
>>>>>>>>>> savepoints. Savepoints are aliases for nested transactions in the
>>>>>>>>>> database area [1], and correspondingly there are global
>>>>>>>>>> transactions.
>>>>>>>>>> If we consider Flink jobs as global transactions and Flink
>>>>>>>>>> checkpoints as nested transactions, then the savepoint semantics
>>>>>>>>>> are close, so I think the SQL-standard savepoint syntax could be
>>>>>>>>>> considered. But again, I don’t have a very strong preference.
>>>>>>>>>>
>>>>>>>>>> Ping @Timo to get more inputs.
>>>>>>>>>>
>>>>>>>>>> [1] https://en.wikipedia.org/wiki/Nested_transaction
>>>>>>>>>>
>>>>>>>>>> Best,
>>>>>>>>>> Paul Lam
>>>>>>>>>>
>>>>>>>>>>> On 18 May 2022, at 17:48, Jark Wu <imj...@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>> Hi Paul,
>>>>>>>>>>>
>>>>>>>>>>> 1) SHOW QUERIES
>>>>>>>>>>> +1 to add the finished time, but it would be better to call it
>>>>>>>>>>> "end_time" to keep it aligned with the names in the Web UI.
>>>>>>>>>>>
>>>>>>>>>>> 2) DROP QUERY
>>>>>>>>>>> I think we shouldn't throw exceptions for batch jobs; otherwise,
>>>>>>>>>>> how would users stop batch queries?
>>>>>>>>>>> At present, I don't think "DROP" is a suitable keyword for this
>>>>>>>>>>> statement. From the perspective of users, "DROP" sounds like the
>>>>>>>>>>> query should be removed from the list of "SHOW QUERIES". However,
>>>>>>>>>>> it isn't. Maybe "STOP QUERY" is more suitable and compliant with
>>>>>>>>>>> the commands of the Flink CLI.
>>>>>>>>>>>
>>>>>>>>>>> 3) SHOW SAVEPOINTS
>>>>>>>>>>> I think this statement is needed; otherwise, savepoints are lost
>>>>>>>>>>> after the SAVEPOINT command is executed. Savepoints can be
>>>>>>>>>>> retrieved from the REST API "/jobs/:jobid/checkpoints" [1] by
>>>>>>>>>>> filtering on "checkpoint_type" = "savepoint". It's also worth
>>>>>>>>>>> considering providing "SHOW CHECKPOINTS" to list all checkpoints.
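As a sketch, the listing statements discussed in this thread might look like the following (hypothetical syntax only, not implemented in Flink; `<job_id>` is a placeholder):

```sql
-- Hypothetical listing statements sketched from this discussion:
SHOW SAVEPOINTS FOR JOB '<job_id>';
SHOW CHECKPOINTS FOR JOB '<job_id>';
```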
>>>>>>>>>>> 4) SAVEPOINT & RELEASE SAVEPOINT
>>>>>>>>>>> I'm a little concerned with the SAVEPOINT and RELEASE SAVEPOINT
>>>>>>>>>>> statements now.
>>>>>>>>>>> Among the vendors, the parameters of SAVEPOINT and RELEASE
>>>>>>>>>>> SAVEPOINT are both the same savepoint id.
>>>>>>>>>>> However, in our syntax, the first one takes a query id, and the
>>>>>>>>>>> second one a savepoint path, which is confusing and inconsistent.
>>>>>>>>>>> When I came across SHOW SAVEPOINTS, I thought maybe they should
>>>>>>>>>>> be in the same syntax set.
>>>>>>>>>>> For example, CREATE SAVEPOINT FOR [QUERY] <query_id> & DROP
>>>>>>>>>>> SAVEPOINT <sp_path>.
>>>>>>>>>>> That means we don't follow the majority of vendors in SAVEPOINT
>>>>>>>>>>> commands. I would say the purpose is different in Flink.
>>>>>>>>>>> What are others' opinions on this?
>>>>>>>>>>>
>>>>>>>>>>> Best,
>>>>>>>>>>> Jark
>>>>>>>>>>>
>>>>>>>>>>> [1]: https://nightlies.apache.org/flink/flink-docs-master/docs/ops/rest_api/#jobs-jobid-checkpoints
>>>>>>>>>>>
>>>>>>>>>>> On Wed, 18 May 2022 at 14:43, Paul Lam <paullin3...@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi Godfrey,
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks a lot for your inputs!
>>>>>>>>>>>>
>>>>>>>>>>>> 'SHOW QUERIES' lists all jobs in the cluster, no limit on APIs
>>>>>>>>>>>> (DataStream or SQL) or clients (SQL client or CLI). Under the
>>>>>>>>>>>> hood, it’s based on ClusterClient#listJobs, the same as the
>>>>>>>>>>>> Flink CLI. I think it’s okay to have non-SQL jobs listed in the
>>>>>>>>>>>> SQL client, because these jobs can be managed via the SQL client
>>>>>>>>>>>> too.
>>>>>>>>>>>>
>>>>>>>>>>>> WRT the finished time, I think you’re right.
>>>>>>>>>>>> Adding it to the FLIP. But I’m a bit afraid that the rows would
>>>>>>>>>>>> be too long.
>>>>>>>>>>>>
>>>>>>>>>>>> WRT ‘DROP QUERY’,
>>>>>>>>>>>>> What's the behavior for batch jobs and the non-running jobs?
>>>>>>>>>>>>
>>>>>>>>>>>> In general, the behavior would be aligned with the Flink CLI.
>>>>>>>>>>>> Triggering a savepoint for a non-running job would cause errors,
>>>>>>>>>>>> and the error message would be printed to the SQL client.
>>>>>>>>>>>> Triggering a savepoint for batch (unbounded) jobs in streaming
>>>>>>>>>>>> execution mode would be the same as for streaming jobs. However,
>>>>>>>>>>>> for batch jobs in batch execution mode, I think there would be
>>>>>>>>>>>> an error, because batch execution doesn’t support checkpoints
>>>>>>>>>>>> currently (please correct me if I’m wrong).
>>>>>>>>>>>>
>>>>>>>>>>>> WRT ’SHOW SAVEPOINTS’, I’ve thought about it, but the Flink
>>>>>>>>>>>> clusterClient/jobClient doesn’t have such functionality at the
>>>>>>>>>>>> moment, and neither does the Flink CLI. Maybe we could make it a
>>>>>>>>>>>> follow-up FLIP, which includes the modifications to the
>>>>>>>>>>>> clusterClient/jobClient and the Flink CLI. WDYT?
>>>>>>>>>>>>
>>>>>>>>>>>> Best,
>>>>>>>>>>>> Paul Lam
>>>>>>>>>>>>
>>>>>>>>>>>>> On 17 May 2022, at 20:34, godfrey he <godfre...@gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Godfrey
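For reference, the statement set the thread appears to converge on could be sketched as follows (proposed syntax assembled from the discussion above, not implemented in any Flink release at the time of this thread; identifiers are placeholders):

```sql
-- Proposed job lifecycle statements (syntax as converged in this thread;
-- <job_id> and <savepoint_path> are placeholders):
SHOW JOBS;
STOP JOB '<job_id>' WITH SAVEPOINT WITH DRAIN;
CREATE SAVEPOINT FOR JOB '<job_id>';
DROP SAVEPOINT '<savepoint_path>';
```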