Hi Jark,

Thanks a lot!

I’m thinking of the 2nd approach. With this approach, the query lifecycle
statements (show/stop/savepoint, etc.) are basically equivalent alternatives
to the Flink CLI from the user’s point of view.

BTW, completed jobs might be missing from `SHOW QUERIES`, because in
application/per-job modes, the cluster shuts down when the job terminates.

WDYT?

Best,
Paul Lam

> On 11 Apr 2022, at 14:17, Jark Wu <imj...@gmail.com> wrote:
> 
> Hi Paul, I grant the permission to you.
> 
> Regarding the "SHOW QUERIES", how will you bookkeep and persist the running
> and completed queries?
> Or will you retrieve the query information from the cluster every time
> you receive the command?
> 
> 
> Best,
> Jark
> 
> 
> On Wed, 6 Apr 2022 at 11:23, Paul Lam <paullin3...@gmail.com> wrote:
> 
>> Hi Timo,
>> 
>> Thanks for your reply!
>> 
>>> It would be great to further investigate which other commands are
>> required that would usually be executed via CLI commands. I would like to
>> avoid a large number of FLIPs, each adding a special job lifecycle command.
>> 
>> Okay. For simplicity, I listed only the job/query commands that are
>> required for savepoints. I will come up with a complete set of commands
>> covering the full lifecycle of jobs.
>> 
>>> I guess job lifecycle commands don't make much sense in Table API? Or
>> are you planning to support those also via TableEnvironment.executeSql and
>> integrate them into the SQL parser?
>> 
>> Yes, I’m thinking of adding job lifecycle management in the SQL Client. The
>> SQL Client could execute queries via TableEnvironment.executeSql and bookkeep
>> the IDs, which is similar to the ResultStore in LocalExecutor.
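The bookkeeping idea above can be sketched roughly as follows. This is a minimal illustration, not actual Flink code: the class and method names (`QueryBookkeeper`, `register`, etc.) are hypothetical, standing in for a session-scoped registry that maps a generated query ID to its SQL statement so that `SHOW QUERIES` can be answered from the client side.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.UUID;

// Hypothetical sketch: a session-scoped registry of submitted queries,
// in the spirit of the ResultStore in LocalExecutor.
public class QueryBookkeeper {
    // Insertion order preserved so SHOW QUERIES lists queries in submission order.
    private final Map<String, String> runningQueries = new LinkedHashMap<>();

    // Record a submitted query (e.g. after TableEnvironment.executeSql)
    // and return its generated ID.
    public String register(String statement) {
        String queryId = UUID.randomUUID().toString();
        runningQueries.put(queryId, statement);
        return queryId;
    }

    // Backs a client-side SHOW QUERIES: query ID -> SQL statement.
    public Map<String, String> showQueries() {
        return new LinkedHashMap<>(runningQueries);
    }

    // Drop a query once it terminates or is stopped.
    public void deregister(String queryId) {
        runningQueries.remove(queryId);
    }
}
```

A real implementation would of course tie the IDs to the actual Flink job IDs returned on submission rather than generating them locally.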
>> 
>> BTW, may I ask for the permission on Confluence to create a FLIP?
>> 
>> Best,
>> Paul Lam
>> 
>>> On 4 Apr 2022, at 15:36, Timo Walther <twal...@apache.org> wrote:
>>> 
>>> Hi Paul,
>>> 
>>> thanks for proposing this. I think in general it makes sense to have
>> those commands in SQL Client.
>>> 
>>> However, this will be a big shift because we start adding job lifecycle
>> SQL syntax. It would be great to further investigate which other commands
>> are required that would usually be executed via CLI commands. I would
>> like to avoid a large number of FLIPs, each adding a special job lifecycle
>> command.
>>> 
>>> I guess job lifecycle commands don't make much sense in Table API? Or
>> are you planning to support those also via TableEnvironment.executeSql and
>> integrate them into the SQL parser?
>>> 
>>> Thanks,
>>> Timo
>>> 
>>> 
>>> On 01.04.22 at 12:28, Paul Lam wrote:
>>>> Hi Martijn,
>>>> 
>>>>> For any extension of the SQL syntax, there should be a FLIP. I would
>> like
>>>>> to understand how this works for both bounded and unbounded jobs, and
>> how
>>>>> this works with the SQL upgrade story. Could you create one?
>>>> Sure. I’m preparing one. Please give me the permission if possible.
>>>> 
>>>> My Confluence user name is `paulin3280`, and the full name is `Paul
>> Lam`.
>>>> 
>>>>> I'm also copying in @Timo Walther <twal...@apache.org> and @Jark Wu
>>>>> <imj...@gmail.com> for their opinion on this.
>>>> Looking forward to your opinions @Timo @Jark :)
>>>> 
>>>> Best,
>>>> Paul Lam
>>>> 
>>>>> On 1 Apr 2022, at 18:10, Martijn Visser <martijnvis...@apache.org> wrote:
>>>>> 
>>>>> Hi Paul,
>>>>> 
>>>>> For any extension of the SQL syntax, there should be a FLIP. I would
>> like
>>>>> to understand how this works for both bounded and unbounded jobs, and
>> how
>>>>> this works with the SQL upgrade story. Could you create one?
>>>>> 
>>>>> I'm also copying in @Timo Walther <twal...@apache.org> and @Jark Wu
>>>>> <imj...@gmail.com> for their opinion on this.
>>>>> 
>>>>> Best regards,
>>>>> 
>>>>> Martijn
>>>>> 
>>>>> On Fri, 1 Apr 2022 at 12:01, Paul Lam <paullin3...@gmail.com> wrote:
>>>>> 
>>>>>> Hi Martijn,
>>>>>> 
>>>>>> Thanks a lot for your input.
>>>>>> 
>>>>>>> Have you already thought on how you would implement this in Flink?
>>>>>> Yes, I roughly thought about the implementation:
>>>>>> 
>>>>>> 1. Extending the Executor to support listing jobs via ClusterClient.
>>>>>> 2. Extending the Executor to support savepoint trigger/cancel/removal via
>>>>>> JobClient.
>>>>>> 3. Extending the SQL parser to support the new statements via regex
>>>>>> (AbstractRegexParseStrategy) or Calcite.
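Step 3 above can be sketched as follows. This is an illustrative assumption, not the actual Flink implementation: the class name and patterns below are hypothetical, showing how a regex-based strategy (in the spirit of AbstractRegexParseStrategy) could recognize the proposed statements and extract their arguments.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of regex-based recognition of the proposed
// lifecycle statements; a real parser would plug into Flink's
// parse-strategy mechanism or a Calcite grammar extension instead.
public class LifecycleStatementParser {
    private static final Pattern SHOW_QUERIES =
        Pattern.compile("SHOW\\s+QUERIES", Pattern.CASE_INSENSITIVE);
    private static final Pattern TRIGGER_SAVEPOINT =
        Pattern.compile("TRIGGER\\s+SAVEPOINT\\s+(\\S+)", Pattern.CASE_INSENSITIVE);

    // True if the statement is SHOW QUERIES.
    public static boolean matchShowQueries(String statement) {
        return SHOW_QUERIES.matcher(statement.trim()).matches();
    }

    // Returns the query ID if the statement is TRIGGER SAVEPOINT <query_id>.
    public static Optional<String> matchTriggerSavepoint(String statement) {
        Matcher m = TRIGGER_SAVEPOINT.matcher(statement.trim());
        return m.matches() ? Optional.of(m.group(1)) : Optional.empty();
    }
}
```

The extracted query ID would then be handed to the Executor, which resolves it to a JobClient to trigger the savepoint.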
>>>>>> 
>>>>>> IMHO, the implementation is not very complicated and barely touches
>> the
>>>>>> architecture of FLIP-91.
>>>>>> (BTW, FLIP-91 might be a little bit outdated and doesn’t fully
>> reflect
>>>>>> the current status of the Flink SQL Client/Gateway.)
>>>>>> 
>>>>>> WDYT?
>>>>>> 
>>>>>> Best,
>>>>>> Paul Lam
>>>>>> 
>>>>>>> On 1 Apr 2022, at 17:33, Martijn Visser <mart...@ververica.com> wrote:
>>>>>>> 
>>>>>>> Hi Paul,
>>>>>>> 
>>>>>>> Thanks for opening the discussion. I agree that there are
>> opportunities
>>>>>> in
>>>>>>> this area to increase user value.
>>>>>>> 
>>>>>>> I would say that the syntax should be part of a proposal in a FLIP,
>>>>>> because
>>>>>>> the implementation would actually be the complex part, not so much
>> the
>>>>>>> syntax :) Especially since this also touches on FLIP-91 [1]
>>>>>>> 
>>>>>>> Have you already thought on how you would implement this in Flink?
>>>>>>> 
>>>>>>> Best regards,
>>>>>>> 
>>>>>>> Martijn Visser
>>>>>>> https://twitter.com/MartijnVisser82
>>>>>>> https://github.com/MartijnVisser
>>>>>>> 
>>>>>>> [1]
>>>>>>> 
>>>>>> 
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-91%3A+Support+SQL+Client+Gateway
>>>>>>> 
>>>>>>> On Fri, 1 Apr 2022 at 11:25, Paul Lam <paullin3...@gmail.com> wrote:
>>>>>>> 
>>>>>>>> Hi team,
>>>>>>>> 
>>>>>>>> Greetings from the Apache Kyuubi (incubating) community. We’re
>> integrating
>>>>>>>> Flink as a SQL engine and aiming to make it production-ready.
>>>>>>>> 
>>>>>>>> However, query/savepoint management is a crucial but missing part in
>>>>>> Flink
>>>>>>>> SQL, thus we are reaching out to discuss the SQL syntax with the Flink
>> community.
>>>>>>>> 
>>>>>>>> We propose to introduce the following statements:
>>>>>>>> 
>>>>>>>> SHOW QUERIES: shows the running queries in the current session,
>> which
>>>>>>>> mainly returns query (namely Flink job) IDs and SQL statements.
>>>>>>>> TRIGGER SAVEPOINT <query_id>: triggers a savepoint for the specified
>>>>>>>> query, which returns the stored path of the savepoint.
>>>>>>>> SHOW SAVEPOINTS <query_id>: shows the savepoints for the specified
>>>>>> query,
>>>>>>>> which returns the stored paths of the savepoints.
>>>>>>>> REMOVE SAVEPOINT <savepoint_path>: removes the specified savepoint.
>>>>>>>> 
>>>>>>>> WRT keywords, `TRIGGER` and `SAVEPOINT` are already reserved
>> keywords
>>>>>>>> in Flink SQL [1], so the only new keyword is `QUERIES`.
>>>>>>>> 
>>>>>>>> If we reach a consensus on the syntax, we could either implement it
>> in
>>>>>>>> Kyuubi and contribute back to Flink, or directly implement it in
>> Flink.
>>>>>>>> 
>>>>>>>> Looking forward to your feedback ;)
>>>>>>>> 
>>>>>>>> [1]
>>>>>>>> 
>>>>>> 
>> https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/dev/table/sql/overview/#reserved-keywords
>>>>>>>> Best,
>>>>>>>> Paul Lam
>>>>>>>> 
>>>>>>>> 
>>>>>> 
>>>> 
>>> 
>> 
>> 
