Re: [DISCUSS] FLIP-311: Support Call Stored Procedure

Jark Wu Thu, 08 Jun 2023 01:31:16 -0700

Thank you for the proposal, yuxia! The FLIP looks good to me. 

Best,
Jark


> 2023年6月8日 11:39，yuxia <luoyu...@alumni.sjtu.edu.cn> 写道：
> 
> Hi, all.
> Thanks everyone for the valuable input. If there are are no further concerns 
> about this FLIP[1], I would like to start voting next monday (6/12).
> 
> [1] 
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-311%3A+Support+Call+Stored+Procedure
> 
> 
> Best regards,
> Yuxia
> 
> ----- 原始邮件 -----
> 发件人: "Martijn Visser" <martijnvis...@apache.org>
> 收件人: "dev" <dev@flink.apache.org>
> 发送时间: 星期二, 2023年 6 月 06日 下午 3:57:56
> 主题: Re: [DISCUSS] FLIP-311: Support Call Stored Procedure
> 
> Hi Yuxia,
> 
> Thanks for the clarification. I would be +0 overall, because I think
> without actually allowing creation/customization of stored procedures, the
> value for the majority of Flink users will be minimal.
> 
> Best regards,
> 
> Martijn
> 
> On Tue, Jun 6, 2023 at 3:52 AM yuxia <luoyu...@alumni.sjtu.edu.cn> wrote:
> 
>> Hi, Martijn.
>> Thanks for you feedback.
>> 1: In this FLIP we don't intend to allow users to customize their own
>> stored procedure for we don't want to expose too much to users too early as
>> the FLIP said.
>> The procedures are supposed to be provided only by Catalog. Catalog devs
>> can write their build-in procedures, and return the procedure in method
>> Catalog.getProcedure(ObjectPath procedurePath);
>> So, there won't be SQL syntax to create/save a stored procedure in this
>> FLIP. If we find we do need it, we can propse the SQL syntax to create a
>> stored procedure in another dedicated FLIP.
>> 
>> 2: The syntax `Call procedure_name(xx)` proposed in this FLIP is the
>> default syntax in Calcite for call stored procedures. Actaully, we don't
>> need to do any modifcation in flink-sql-parser module for syntax of calling
>> a procedure. MySQL[1], Postgres[2], Oracle[3] also use the syntax to call a
>> stored procedure.
>> 
>> 
>> [1] https://dev.mysql.com/doc/refman/8.0/en/call.html
>> [2] https://www.postgresql.org/docs/15/sql-call.html
>> [3] https://docs.oracle.com/javadb/10.8.3.0/ref/rrefcallprocedure.html
>> 
>> Best regards,
>> Yuxia
>> 
>> ----- 原始邮件 -----
>> 发件人: "Martijn Visser" <martijnvis...@apache.org>
>> 收件人: "dev" <dev@flink.apache.org>
>> 发送时间: 星期一, 2023年 6 月 05日 下午 8:35:44
>> 主题: Re: [DISCUSS] FLIP-311: Support Call Stored Procedure
>> 
>> Hi Yuxia,
>> 
>> Thanks for the FLIP. I have a couple of questions:
>> 
>> 1. The syntax talks about how to CALL or SHOW the available stored
>> procedures, but not on how to create one. Will there not be a SQL syntax to
>> create/save a stored procedure?
>> 2. Is there a default syntax in Calcite for stored procedures? What do
>> other databases do, do they use CALL/SHOW or something like EXEC, USE?
>> 
>> Best regards,
>> 
>> Martijn
>> 
>> On Mon, Jun 5, 2023 at 3:23 AM yuxia <luoyu...@alumni.sjtu.edu.cn> wrote:
>> 
>>> Hi, Jane.
>>> Thanks for you input. I think we can add the auxiliary command show
>>> procedures in this FLIP.
>>> Following the syntax for show functions proposed in FLIP-297.
>>> The syntax will be
>>> SHOW PROCEDURES [ ( FROM | IN ) [catalog_name.]database_name ] [ [NOT]
>>> (LIKE | ILIKE) <sql_like_pattern> ].
>>> I have updated to this FLIP.
>>> 
>>> The other auxiliary commands maybe not suitable currently or need a
>>> further/dedicated dicussion. Let's keep this FLIP focus.
>>> 
>>> [1]
>>> 
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-297%3A+Improve+Auxiliary+Sql+Statements
>>> 
>>> Best regards,
>>> Yuxia
>>> 
>>> ----- 原始邮件 -----
>>> 发件人: "Jane Chan" <qingyue....@gmail.com>
>>> 收件人: "dev" <dev@flink.apache.org>
>>> 发送时间: 星期六, 2023年 6 月 03日 下午 7:04:39
>>> 主题: Re: [DISCUSS] FLIP-311: Support Call Stored Procedure
>>> 
>>> Hi Yuxia,
>>> 
>>> Thanks for bringing this to the discussion. The call procedure is a
>> widely
>>> used feature and will be very useful for users.
>>> 
>>> I just have one question regarding the usage. The FLIP mentioned that
>>> 
>>> Flink will allow connector developers to develop their own built-in
>> stored
>>>> procedures, and then enables users to call these predefiend stored
>>>> procedures.
>>>> 
>>> In this FLIP, we don't intend to allow users to customize their own
>> stored
>>>> procedure  for we don't want to expose too much to users too early.
>>> 
>>> 
>>> If I understand correctly, we might need to provide some auxiliary
>> commands
>>> to inform users what built-in procedures are provided and how to use
>> them.
>>> For example, Snowflake provides commands like [1] [2], and MySQL provides
>>> commands like [3] [4].
>>> 
>>> [1] SHOW PROCEDURES,
>>> https://docs.snowflake.com/en/sql-reference/sql/show-procedures
>>> [2] DESCRIBE PROCEDURE <procedure_name>,
>>> https://docs.snowflake.com/en/sql-reference/sql/desc-procedure
>>> [3] SHOW PROCEDURE CODE，
>>> https://dev.mysql.com/doc/refman/5.7/en/show-procedure-code.html
>>> [4] SHOW PROCEDURE STATUS,
>>> https://dev.mysql.com/doc/refman/5.7/en/show-procedure-status.html
>>> 
>>> Best,
>>> Jane
>>> 
>>> On Sat, Jun 3, 2023 at 3:20 PM Benchao Li <libenc...@apache.org> wrote:
>>> 
>>>> Thanks Yuxia for the explanation, it makes sense to me. It would be
>> great
>>>> if you also add this to the FLIP doc.
>>>> 
>>>> yuxia <luoyu...@alumni.sjtu.edu.cn> 于2023年6月1日周四 17:11写道：
>>>> 
>>>>> Hi, Benchao.
>>>>> Thanks for your attention.
>>>>> 
>>>>> Initially, I also want to pass `TableEnvironment` to procedure. But
>>>>> according my investegation and offline discussion with Jingson, the
>>> real
>>>>> important thing for procedure devs is the ability to build Flink
>>>>> datastream. But we can't get the `StreamExecutionEnvironment` which
>> is
>>>> the
>>>>> entrypoint to build datastream. That's to say we will lost the
>> ability
>>> to
>>>>> build a datastream if we just pass `TableEnvironment`.
>>>>> 
>>>>> Of course, we can also pass `TableEnvironment` along with
>>>>> `StreamExecutionEnvironment` to Procedure. But I'm intend to be
>>> cautious
>>>>> about exposing too much too early to procedure devs. If someday we
>> find
>>>> we
>>>>> will need `TableEnvironment` to custom a procedure, we can then add a
>>>>> method like `getTableEnvironment()` in `ProcedureContext`.
>>>>> 
>>>>> Best regards,
>>>>> Yuxia
>>>>> 
>>>>> ----- 原始邮件 -----
>>>>> 发件人: "Benchao Li" <libenc...@apache.org>
>>>>> 收件人: "dev" <dev@flink.apache.org>
>>>>> 发送时间: 星期四, 2023年 6 月 01日 下午 12:58:08
>>>>> 主题: Re: [DISCUSS] FLIP-311: Support Call Stored Procedure
>>>>> 
>>>>> Thanks Yuxia for opening this discussion,
>>>>> 
>>>>> The general idea looks good to me, I only have one question about the
>>>>> `ProcedureContext#getExecutionEnvironment`. Why are you proposing to
>>>> return
>>>>> a `StreamExecutionEnvironment` instead of `TableEnvironment`, could
>> you
>>>>> elaborate a little more on this?
>>>>> 
>>>>> Jingsong Li <jingsongl...@gmail.com> 于2023年5月30日周二 17:58写道：
>>>>> 
>>>>>> Thanks for your explanation.
>>>>>> 
>>>>>> We can support Iterable in future. Current design looks good to me.
>>>>>> 
>>>>>> Best,
>>>>>> Jingsong
>>>>>> 
>>>>>> On Tue, May 30, 2023 at 4:56 PM yuxia <luoyu...@alumni.sjtu.edu.cn
>>> 
>>>>> wrote:
>>>>>>> 
>>>>>>> Hi, Jingsong.
>>>>>>> Thanks for your feedback.
>>>>>>> 
>>>>>>>> Does this need to be a function call? Do you have some example?
>>>>>>> I think it'll be useful to support function call when user call
>>>>>> procedure.
>>>>>>> The following example is from iceberg:[1]
>>>>>>> CALL catalog_name.system.migrate('spark_catalog.db.sample',
>>>> map('foo',
>>>>>> 'bar'));
>>>>>>> 
>>>>>>> It allows user to use `map('foo', 'bar')` to pass a map data to
>>>>>> procedure.
>>>>>>> 
>>>>>>> Another case that I can imagine may be rollback a table to the
>>>> snapshot
>>>>>> of one week ago.
>>>>>>> Then, with function call, user may call `rollback(table_name,
>>> now() -
>>>>>> INTERVAL '7' DAY)` to acheive such purpose.
>>>>>>> 
>>>>>>> Although it can be function call, the eventual parameter got by
>> the
>>>>>> procedure will always be the literal evaluated.
>>>>>>> 
>>>>>>> 
>>>>>>>> Procedure looks like a TableFunction, do you consider using
>>>> Collector
>>>>>>> something like TableFunction? (Supports large amount of data)
>>>>>>> 
>>>>>>> Yes, I had considered it. But returns T[] is for simpility,
>>>>>>> 
>>>>>>> First, regarding how to return the calling result of a procedure,
>>> it
>>>>>> looks more intuitive to me to use the return result of the `call`
>>>> method
>>>>>> instead of by calling something like collector#collect.
>>>>>>> Introduce a collector will increase necessary complexity.
>>>>>>> 
>>>>>>> Second, regarding supporting large amount of data,  acoording my
>>>>>> investagtion, I haven't seen the requirement that supports
>> returning
>>>>> large
>>>>>> amount of data.
>>>>>>> Iceberg also return an array.[2] If you do think we should
>> support
>>>>> large
>>>>>> amount of data, I think we can change to return type from T[] to
>>>>> Iterable<T>
>>>>>>> 
>>>>>>> [1]:
>>>> https://iceberg.apache.org/docs/latest/spark-procedures/#migrate
>>>>>>> [2]:
>>>>>> 
>>>>> 
>>>> 
>>> 
>> https://github.com/apache/iceberg/blob/601c5af9b6abded79dabeba177331310d5487f43/spark/v3.2/spark/src/main/java/org/apache/spark/sql/connector/iceberg/catalog/Procedure.java#L44
>>>>>>> 
>>>>>>> Best regards,
>>>>>>> Yuxia
>>>>>>> 
>>>>>>> ----- 原始邮件 -----
>>>>>>> 发件人: "Jingsong Li" <jingsongl...@gmail.com>
>>>>>>> 收件人: "dev" <dev@flink.apache.org>
>>>>>>> 发送时间: 星期一, 2023年 5 月 29日 下午 2:42:04
>>>>>>> 主题: Re: [DISCUSS] FLIP-311: Support Call Stored Procedure
>>>>>>> 
>>>>>>> Thanks Yuxia for the proposal.
>>>>>>> 
>>>>>>>> CALL [catalog_name.][database_name.]procedure_name ([
>> expression
>>> [,
>>>>>> expression]* ] )
>>>>>>> 
>>>>>>> The expression can be a function call. Does this need to be a
>>>> function
>>>>>>> call? Do you have some example?
>>>>>>> 
>>>>>>>> Procedure returns T[]
>>>>>>> 
>>>>>>> Procedure looks like a TableFunction, do you consider using
>>> Collector
>>>>>>> something like TableFunction? (Supports large amount of data)
>>>>>>> 
>>>>>>> Best,
>>>>>>> Jingsong
>>>>>>> 
>>>>>>> On Mon, May 29, 2023 at 2:33 PM yuxia <
>> luoyu...@alumni.sjtu.edu.cn
>>>> 
>>>>>> wrote:
>>>>>>>> 
>>>>>>>> Hi, everyone.
>>>>>>>> 
>>>>>>>> I’d like to start a discussion about FLIP-311: Support Call
>>> Stored
>>>>>> Procedure [1]
>>>>>>>> 
>>>>>>>> Stored procedure provides a convenient way to encapsulate
>> complex
>>>>>> logic to perform data manipulation or administrative tasks in
>>> external
>>>>>> storage systems. It's widely used in traditional databases and
>>> popular
>>>>>> compute engines like Trino for it's convenience. Therefore, we
>>> propose
>>>>>> adding support for call stored procedure in Flink to enable better
>>>>>> integration with external storage systems.
>>>>>>>> 
>>>>>>>> With this FLIP, Flink will allow connector developers to
>> develop
>>>>> their
>>>>>> own built-in stored procedures, and then enables users to call
>> these
>>>>>> predefiend stored procedures.
>>>>>>>> 
>>>>>>>> Looking forward to your feedbacks.
>>>>>>>> 
>>>>>>>> [1]:
>>>>>> 
>>>>> 
>>>> 
>>> 
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-311%3A+Support+Call+Stored+Procedure
>>>>>>>> 
>>>>>>>> Best regards,
>>>>>>>> Yuxia
>>>>>> 
>>>>> 
>>>>> 
>>>>> --
>>>>> 
>>>>> Best,
>>>>> Benchao Li
>>>>> 
>>>> 
>>>> 
>>>> --
>>>> 
>>>> Best,
>>>> Benchao Li
>>>> 
>>> 
>>

Re: [DISCUSS] FLIP-311: Support Call Stored Procedure

Reply via email to