Re: [DISCUSS] FLIP-222: Support full query lifecycle statements in SQL client

godfrey he Mon, 23 May 2022 06:34:42 -0700

Hi Paul,

Thanks for the update.


>'SHOW QUERIES' lists all jobs in the cluster, no limit on APIs (DataStream or 
>SQL) or
clients (SQL client or CLI).

Is DataStream job a QUERY? I think not.
For a QUERY, the most important concept is the statement. But the
result does not contain this info.
If we need to contain all jobs in the cluster, I think the name should
be JOB or PIPELINE.
I learn to SHOW PIPELINES and STOP PIPELINE [IF RUNNING] id.

> SHOW SAVEPOINTS
To list the savepoint for a specific job, we need to specify a
specific pipeline,
the syntax should be SHOW SAVEPOINTS FOR PIPELINE id

Best,
Godfrey

Paul Lam <[email protected]> 于2022年5月20日周五 11:25写道：
>
> Hi Jark,
>
> WRT “DROP QUERY”, I agree that it’s not very intuitive, and that’s
> part of the reason why I proposed “STOP/CANCEL QUERY” at the
> beginning. The downside of it is that it’s not ANSI-SQL compatible.
>
> Another question is, what should be the syntax for ungracefully
> canceling a query? As ShengKai pointed out in a offline discussion,
> “STOP QUERY” and “CANCEL QUERY” might confuse SQL users.
> Flink CLI has both stop and cancel, mostly due to historical problems.
>
> WRT “SHOW SAVEPOINT”, I agree it’s a missing part. My concern is
> that savepoints are owned by users and beyond the lifecycle of a Flink
> cluster. For example, a user might take a savepoint at a custom path
> that’s different than the default savepoint path, I think jobmanager would
> not remember that, not to mention the jobmanager may be a fresh new
> one after a cluster restart. Thus if we support “SHOW SAVEPOINT”, it's
> probably a best-effort one.
>
> WRT savepoint syntax, I’m thinking of the semantic of the savepoint.
> Savepoints are alias for nested transactions in DB area[1], and there’s
> correspondingly global transactions. If we consider Flink jobs as
> global transactions and Flink checkpoints as nested transactions,
> then the savepoint semantics are close, thus I think savepoint syntax
> in SQL-standard could be considered. But again, I’m don’t have very
> strong preference.
>
> Ping @Timo to get more inputs.
>
> [1] https://en.wikipedia.org/wiki/Nested_transaction 
> <https://en.wikipedia.org/wiki/Nested_transaction>
>
> Best,
> Paul Lam
>
> > 2022年5月18日 17:48，Jark Wu <[email protected]> 写道：
> >
> > Hi Paul,
> >
> > 1) SHOW QUERIES
> > +1 to add finished time, but it would be better to call it "end_time" to
> > keep aligned with names in Web UI.
> >
> > 2) DROP QUERY
> > I think we shouldn't throw exceptions for batch jobs, otherwise, how to
> > stop batch queries?
> > At present, I don't think "DROP" is a suitable keyword for this statement.
> > From the perspective of users, "DROP" sounds like the query should be
> > removed from the
> > list of "SHOW QUERIES". However, it doesn't. Maybe "STOP QUERY" is more
> > suitable and
> > compliant with commands of Flink CLI.
> >
> > 3) SHOW SAVEPOINTS
> > I think this statement is needed, otherwise, savepoints are lost after the
> > SAVEPOINT
> > command is executed. Savepoints can be retrieved from REST API
> > "/jobs/:jobid/checkpoints"
> > with filtering "checkpoint_type"="savepoint". It's also worth considering
> > providing "SHOW CHECKPOINTS"
> > to list all checkpoints.
> >
> > 4) SAVEPOINT & RELEASE SAVEPOINT
> > I'm a little concerned with the SAVEPOINT and RELEASE SAVEPOINT statements
> > now.
> > In the vendors, the parameters of SAVEPOINT and RELEASE SAVEPOINT are both
> > the same savepoint id.
> > However, in our syntax, the first one is query id, and the second one is
> > savepoint path, which is confusing and
> > not consistent. When I came across SHOW SAVEPOINT, I thought maybe they
> > should be in the same syntax set.
> > For example, CREATE SAVEPOINT FOR [QUERY] <query_id> & DROP SAVEPOINT
> > <sp_path>.
> > That means we don't follow the majority of vendors in SAVEPOINT commands. I
> > would say the purpose is different in Flink.
> > What other's opinion on this?
> >
> > Best,
> > Jark
> >
> > [1]:
> > https://nightlies.apache.org/flink/flink-docs-master/docs/ops/rest_api/#jobs-jobid-checkpoints
> >
> >
> > On Wed, 18 May 2022 at 14:43, Paul Lam <[email protected]> wrote:
> >
> >> Hi Godfrey,
> >>
> >> Thanks a lot for your inputs!
> >>
> >> 'SHOW QUERIES' lists all jobs in the cluster, no limit on APIs (DataStream
> >> or SQL) or
> >> clients (SQL client or CLI). Under the hook, it’s based on
> >> ClusterClient#listJobs, the
> >> same with Flink CLI. I think it’s okay to have non-SQL jobs listed in SQL
> >> client, because
> >> these jobs can be managed via SQL client too.
> >>
> >> WRT finished time, I think you’re right. Adding it to the FLIP. But I’m a
> >> bit afraid that the
> >> rows would be too long.
> >>
> >> WRT ‘DROP QUERY’,
> >>> What's the behavior for batch jobs and the non-running jobs?
> >>
> >>
> >> In general, the behavior would be aligned with Flink CLI. Triggering a
> >> savepoint for
> >> a non-running job would cause errors, and the error message would be
> >> printed to
> >> the SQL client. Triggering a savepoint for batch(unbounded) jobs in
> >> streaming
> >> execution mode would be the same with streaming jobs. However, for batch
> >> jobs in
> >> batch execution mode, I think there would be an error, because batch
> >> execution
> >> doesn’t support checkpoints currently (please correct me if I’m wrong).
> >>
> >> WRT ’SHOW SAVEPOINTS’, I’ve thought about it, but Flink clusterClient/
> >> jobClient doesn’t have such a functionality at the moment, neither do
> >> Flink CLI.
> >> Maybe we could make it a follow-up FLIP, which includes the modifications
> >> to
> >> clusterClient/jobClient and Flink CLI. WDYT?
> >>
> >> Best,
> >> Paul Lam
> >>
> >>> 2022年5月17日 20:34，godfrey he <[email protected]> 写道：
> >>>
> >>> Godfrey
> >>
> >>
>

Re: [DISCUSS] FLIP-222: Support full query lifecycle statements in SQL client

Reply via email to