Re: [DISCUSS] FLIP-222: Support full query lifecycle statements in SQL client

Martijn Visser Tue, 26 Apr 2022 05:14:25 -0700

Hi Paul,

Thanks for creating the FLIP and opening the discussion. I did get a bit
confused about the title, being "query lifecycle statements in SQL client".
This sounds like you want to adopt the SQL client, but you want to expand
the SQL syntax with lifecycle statements, which could be used from the SQL
client, but of course also in a Table API program that contains SQL. GIven
that you're highlighting the CLI client as unchanged, this adds to more
confusion.


I am interested if there's anything listed in the SQL 2016 standard on
these types of lifecycle statements. I did a quick scan for "SHOW QUERIES"
but couldn't find it. It would be great if we could stay as close as
possible to such syntax. Overall I'm not in favour of using QUERIES as a
keyword. I think Flink applications are not queries, but short- or long
running applications. Why should we follow Hive's setup and indeed not
others such as Snowflake, but also Postgres or MySQL?

Best regards,

Martijn Visser
https://twitter.com/MartijnVisser82
https://github.com/MartijnVisser


On Fri, 22 Apr 2022 at 12:06, Paul Lam <paullin3...@gmail.com> wrote:

> Hi Shengkai,
>
> Thanks a lot for your opinions!
>
> > 1. I think the keyword QUERY may confuse users because the statement also
> > works for the DML statement.
>
> I slightly lean to QUERY, because:
>
> Hive calls DMLs queries. We could be better aligned with Hive using QUERY,
> especially given that we plan to introduce Hive endpoint.
> QUERY is a more SQL-like concept and friendly to SQL users.
>
> In general, my preference: QUERY > JOB > TASK. I’m okay with JOB, but not
> very good with TASK, as it conflicts with the task concept in Flink runtime.
>
> We could wait for more feedbacks from the community.
>
> > 2. STOP/CANCEL is not very straightforward for the SQL users to terminate
> > their jobs.
>
> Agreed. I’m okay with DROP. And if we want to align with Hive, KILL might
> an alternative.
>
> > 3. I think CREATE/DROP SAVEPOINTS statement is more SQL-like.
>
> Agreed. It’s more SQL-like and intuitive. I’m updating the syntax on the
> FLIP.
>
> > 4. SHOW TASKS can just list the job id and use the DESCRIPE to get more
> > detailed job infos.
>
> That is a more SQL-like approach I think. But considering the
> ClusterClient APIs, we can fetch the names and the status along in one
> request,
> thus it may be more user friendly to return them all in the SHOW
> statement?
>
> > It's better we can also get the infos about the cluster where the job is
> > running on through the DESCRIBE statement.
>
> I think cluster info could be part of session properties instead. WDYT?
>
> Best,
> Paul Lam
>
> > 2022年4月22日 11:14，Shengkai Fang <fskm...@gmail.com> 写道：
> >
> > Hi Paul
> >
> > Sorry for the late response. I propose my thoughts here.
> >
> > 1. I think the keyword QUERY may confuse users because the statement also
> > works for the DML statement. I find the Snowflakes[1] supports
> >
> > - CREATE TASK
> > - DROP TASK
> > - ALTER TASK
> > - SHOW TASKS
> > - DESCRIPE TASK
> >
> > I think we can follow snowflake to use `TASK` as the keyword or use the
> > keyword `JOB`?
> >
> > 2. STOP/CANCEL is not very straightforward for the SQL users to terminate
> > their jobs.
> >
> > ```
> > DROP TASK [IF EXISTS] <job id> PURGE; -- Forcely stop the job with drain
> >
> > DROP TASK [IF EXISTS] <job id>; -- Stop the task with savepoints
> > ```
> >
> > Oracle[2] uses the PURGE to clean up the table and users can't not
> recover.
> > I think it also works for us to terminate the job permanently.
> >
> > 3. I think CREATE/DROP SAVEPOINTS statement is more SQL-like. Users can
> use
> > the
> >
> > ```
> >  SET 'state.savepoints.dir' = '<path_to_savepoint>';
> >  SET 'state.savepoints.fomat' = 'native';
> >  CREATE SAVEPOINT <job id>;
> >
> >  DROP SAVEPOINT <path_to_savepoint>;
> > ```
> >
> > 4. SHOW TASKS can just list the job id and use the DESCRIPE to get more
> > detailed job infos.
> >
> > ```
> >
> > SHOW TASKS;
> >
> >
> > +----------------------------------+
> > |            job_id                |
> > +----------------------------------+
> > | 0f6413c33757fbe0277897dd94485f04 |
> > +----------------------------------+
> >
> > DESCRIPE TASK <job id>;
> >
> > +------------------------
> > |  job name   | status  |
> > +------------------------
> > | insert-sink | running |
> > +------------------------
> >
> > ```
> > It's better we can also get the infos about the cluster where the job is
> > running on through the DESCRIBE statement.
> >
> >
> > [1]
> >
> https://docs.snowflake.com/en/sql-reference/ddl-pipeline.html#task-management
> <
> https://docs.snowflake.com/en/sql-reference/ddl-pipeline.html#task-management
> >
> > [2]
> >
> https://docs.oracle.com/cd/E11882_01/server.112/e41084/statements_9003.htm#SQLRF01806
> <
> https://docs.oracle.com/cd/E11882_01/server.112/e41084/statements_9003.htm#SQLRF01806
> >
> >
> > Paul Lam <paullin3...@gmail.com <mailto:paullin3...@gmail.com>>
> 于2022年4月21日周四 10:36写道：
> >
> >> ping @Timo @Jark @Shengkai
> >>
> >> Best,
> >> Paul Lam
> >>
> >>> 2022年4月18日 17:12，Paul Lam <paullin3...@gmail.com> 写道：
> >>>
> >>> Hi team,
> >>>
> >>> I’d like to start a discussion about FLIP-222 [1], which adds query
> >> lifecycle
> >>> statements to SQL client.
> >>>
> >>> Currently, SQL client supports submitting queries (queries in a broad
> >> sense,
> >>> including DQLs and DMLs) but no further lifecycle statements, like
> >> canceling
> >>> a query or triggering a savepoint. That makes SQL users have to rely on
> >>> CLI or REST API to manage theirs queries.
> >>>
> >>> Thus, I propose to introduce the following statements to fill the gap.
> >>> SHOW QUERIES
> >>> STOP QUERY <query_id>
> >>> CANCEL QUERY <query_id>
> >>> TRIGGER SAVEPOINT <savepoint_path>
> >>> DISPOSE SAVEPOINT <savepoint_path>
> >>> These statement would align SQL client with CLI, providing the full
> >> lifecycle
> >>> management for queries/jobs.
> >>>
> >>> Please see the FLIP page[1] for more details. Thanks a lot!
> >>> (For reference, the previous discussion thread see [2].)
> >>>
> >>> [1]
> >>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-222%3A+Support+full+query+lifecycle+statements+in+SQL+client
> >> <
> >>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-222:+Support+full+query+lifecycle+statements+in+SQL+client
> <
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-222:+Support+full+query+lifecycle+statements+in+SQL+client
> >
> >>>
> >>> [2] https://lists.apache.org/thread/wr47ng0m2hdybjkrwjlk9ftwg403odqb <
> https://lists.apache.org/thread/wr47ng0m2hdybjkrwjlk9ftwg403odqb> <
> >> https://lists.apache.org/thread/wr47ng0m2hdybjkrwjlk9ftwg403odqb <
> https://lists.apache.org/thread/wr47ng0m2hdybjkrwjlk9ftwg403odqb>>
> >>>
> >>> Best,
> >>> Paul Lam
>
>

Re: [DISCUSS] FLIP-222: Support full query lifecycle statements in SQL client

Reply via email to