Hi Ajantha,

I have left some comments. It is an interesting direction, but there might
be some details that need to be fine-tuned.

The doc is here [1] for others who might be interested. Resharing since I
do not think it was directly linked in the thread.

[1]
https://docs.google.com/document/d/1BDvOfhrH0ZQiQv9eLBqeAu8k8Vjfmeql9VzIiW1F0vc/edit

Thanks,
Walaa.

On Mon, Jul 15, 2024 at 11:09 PM Ajantha Bhat <ajanthab...@gmail.com> wrote:

> Hi, just another reminder since we didn't get any review on the proposal.
> Initially proposed on June 4.
>
> - Ajantha
>
> On Mon, Jun 24, 2024 at 4:21 PM Ajantha Bhat <ajanthab...@gmail.com>
> wrote:
>
>> Hi everyone,
>>
>> We've only received one review so far (from Benny).
>>
>> We would appreciate more eyes on this.
>>
>> - Ajantha
>>
>> On Tue, Jun 4, 2024 at 7:25 AM Ajantha Bhat <ajanthab...@gmail.com>
>> wrote:
>>
>>> Hi All,
>>> Please find the proposal link
>>> https://github.com/apache/iceberg/issues/10432
>>>
>>> The Google doc link is attached in the proposal.
>>> And thanks to Stephen Lin <https://github.com/sxlin> for working on it.
>>>
>>> Hope it gives more clarity for making decisions and on how we want to
>>> implement it.
>>>
>>> - Ajantha
>>>
>>> On Wed, May 29, 2024 at 4:01 AM Walaa Eldin Moustafa <
>>> wa.moust...@gmail.com> wrote:
>>>
>>>> Thanks Jack. I actually meant scalar/aggregate/table user defined
>>>> functions. Here are some examples of what I meant in (2):
>>>>
>>>> Hive GenericUDF:
>>>> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDF.java
>>>> Trino user defined functions:
>>>> https://trino.io/docs/current/develop/functions.html
>>>> Flink user defined functions:
>>>> https://nightlies.apache.org/flink/flink-docs-release-1.19/docs/dev/table/functions/udfs/
>>>>
>>>> Probably what you referred to is a variation of (1) where the API is
>>>> data flow/data pipeline API instead of SQL (e.g., Spark Scala). Yes, that
>>>> is also possible in the very long run :)
>>>>
>>>> Thanks,
>>>> Walaa.
>>>>
>>>>
>>>>
>>>>
>>>> On Tue, May 28, 2024 at 2:57 PM Jack Ye <yezhao...@gmail.com> wrote:
>>>>
>>>>> > (2) Custom code written in imperative function according to a
>>>>> Java/Scala/Python API, etc.
>>>>>
>>>>> I think we could still explore some long term opportunities in this
>>>>> case. Consider you register a Spark temp view as some sort of data frame
>>>>> read, then it could still be resolved to a Spark plan that is
>>>>> representable by an intermediate representation. But I agree this gets
>>>>> very complicated very soon, and just having the case (1) covered would
>>>>> already be a huge step forward.
>>>>>
>>>>> -Jack
>>>>>
>>>>>
>>>>> On Tue, May 28, 2024 at 1:40 PM Benny Chow <btc...@gmail.com> wrote:
>>>>>
>>>>>> It's interesting to note that a tabular SQL UDF can be used to build
>>>>>> a *parameterized* view.  So, there's definitely a lot in common
>>>>>> between UDFs and views.
>>>>>>
>>>>>> Thanks
>>>>>>
>>>>>> On Tue, May 28, 2024 at 9:53 AM Walaa Eldin Moustafa <
>>>>>> wa.moust...@gmail.com> wrote:
>>>>>>
>>>>>>> I think there is a disconnect about what is perceived as a "UDF".
>>>>>>> There are 2 flavors:
>>>>>>>
>>>>>>> (1) Functions that are defined by the user whose definition is a
>>>>>>> composition of other built-in functions/SQL expressions.
>>>>>>> (2) Custom code written in imperative function according to a
>>>>>>> Java/Scala/Python API, etc.
>>>>>>>
>>>>>>> All the examples in Ajantha's references are pretty much from (1)
>>>>>>> and I think those are more analogous to views due to their SQL
>>>>>>> nature. I agree (2) is not practical for Iceberg to maintain, but I
>>>>>>> think Ajantha's use cases are around (1), and may be worth evaluating.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Walaa.
>>>>>>>
>>>>>>>
>>>>>>> On Tue, May 28, 2024 at 9:45 AM Ajantha Bhat <ajanthab...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> I guess we'll know more when you post the proposal, but I think
>>>>>>>>> this would be a very difficult area to tackle across engines,
>>>>>>>>> languages, and memory models without having a huge performance
>>>>>>>>> penalty.
>>>>>>>>
>>>>>>>> Assuming Iceberg initially supports SQL representations of UDFs
>>>>>>>> (similar to views as shared by the reference links above), the
>>>>>>>> complexity involved will be similar to managing views.
>>>>>>>>
>>>>>>>> Thanks, Ryan, Robert, and Jack, for your input.
>>>>>>>> We will work on publishing the draft spec (inspired by the view
>>>>>>>> spec) this week to facilitate further discussions.
>>>>>>>>
>>>>>>>> - Ajantha
>>>>>>>>
>>>>>>>> On Tue, May 28, 2024 at 7:33 PM Jack Ye <yezhao...@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> > While it would be great to have a common set of functions across
>>>>>>>>> engines, I don't see how that is practical when those engines are
>>>>>>>>> implemented so differently. Plugging in code -- and especially custom
>>>>>>>>> user-supplied code -- seems inherently specialized to me and should
>>>>>>>>> be part of the engines' design.
>>>>>>>>>
>>>>>>>>> How is this different from the views? I feel we can say exactly
>>>>>>>>> the same thing for Iceberg views, yet we have Iceberg multi-dialect
>>>>>>>>> views implemented. Maybe it sounds like we are trying to draw a line
>>>>>>>>> between SQL and other programming languages as "code"? But I think
>>>>>>>>> SQL is just another type of code, and we are already talking about
>>>>>>>>> compiling all these different code dialects to an intermediate
>>>>>>>>> representation (using projects like Coral, Substrait), which will be
>>>>>>>>> stored as another type of representation of an Iceberg view. I think
>>>>>>>>> the same functionality can be used for UDFs if developed.
>>>>>>>>>
>>>>>>>>> I actually think adding UDF support is a good idea, even just a
>>>>>>>>> multi-dialect one like views. That would allow engines to, for
>>>>>>>>> example, parse a view SQL and, when a referenced function cannot be
>>>>>>>>> resolved, try to look up a multi-dialect UDF definition.
>>>>>>>>>
>>>>>>>>> I guess we can discuss more when we have the actual proposal
>>>>>>>>> published.
>>>>>>>>>
>>>>>>>>> Best,
>>>>>>>>> Jack Ye
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Tue, May 28, 2024 at 1:32 AM Robert Stupp <sn...@snazy.de>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> UDFs are as engine specific and portable and "non-centralized" as
>>>>>>>>>> views are. The same performance concerns apply to views as well.
>>>>>>>>>> Iceberg should define a common base upon which engines can build,
>>>>>>>>>> so the argument that UDFs aren't practical, because engines are
>>>>>>>>>> different, is probably only a temporary concern.
>>>>>>>>>>
>>>>>>>>>> In the long term, Iceberg should also try to tackle the idea to
>>>>>>>>>> make views portable, which is conceptually not that much different
>>>>>>>>>> from portable UDFs.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> PS: I'm not a fan of adding a negative touch to the idea of
>>>>>>>>>> having UDFs in Iceberg, especially not in this early stage.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 24.05.24 20:53, Ryan Blue wrote:
>>>>>>>>>>
>>>>>>>>>> Thanks, Ajantha.
>>>>>>>>>>
>>>>>>>>>> I'm skeptical about whether it's a good idea to add UDFs tracked
>>>>>>>>>> by Iceberg catalogs. I think that Iceberg primarily deals with
>>>>>>>>>> things that are centralized, like tables of data. While it would be
>>>>>>>>>> great to have a common set of functions across engines, I don't see
>>>>>>>>>> how that is practical when those engines are implemented so
>>>>>>>>>> differently. Plugging in code -- and especially custom
>>>>>>>>>> user-supplied code -- seems inherently specialized to me and should
>>>>>>>>>> be part of the engines' design.
>>>>>>>>>>
>>>>>>>>>> I guess we'll know more when you post the proposal, but I think
>>>>>>>>>> this would be a very difficult area to tackle across engines,
>>>>>>>>>> languages, and memory models without having a huge performance
>>>>>>>>>> penalty.
>>>>>>>>>>
>>>>>>>>>> Ryan
>>>>>>>>>>
>>>>>>>>>> On Fri, May 24, 2024 at 8:10 AM Ajantha Bhat <
>>>>>>>>>> ajanthab...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi Everyone,
>>>>>>>>>>>
>>>>>>>>>>> This is a discussion to gauge the community's interest in storing
>>>>>>>>>>> versioned SQL UDFs in Iceberg.
>>>>>>>>>>> We want to propose a spec addition for storing versioned UDFs in
>>>>>>>>>>> Iceberg (inspired by the view spec).
>>>>>>>>>>>
>>>>>>>>>>> These UDFs can operate similarly to views in that they are
>>>>>>>>>>> associated with tables, but they can accept arguments and produce
>>>>>>>>>>> return values, or even function as inline expressions.
>>>>>>>>>>> Many query engines, such as Dremio, Trino, Snowflake, and
>>>>>>>>>>> Databricks Spark, support SQL UDFs at the catalog level [1].
>>>>>>>>>>> But storing them in Iceberg can enable:
>>>>>>>>>>> - Versioning of these UDFs.
>>>>>>>>>>> - Interoperability between the engines. Potentially, engines can
>>>>>>>>>>> understand the UDFs written by other engines (with a translation
>>>>>>>>>>> layer).
>>>>>>>>>>>
>>>>>>>>>>> We believe that integrating this feature into Iceberg would be a
>>>>>>>>>>> valuable addition, and we're eager to collaborate with the
>>>>>>>>>>> community to develop a UDF specification.
>>>>>>>>>>> Stephen <stephen....@dremio.com> has already begun drafting a
>>>>>>>>>>> specification to propose to the community.
>>>>>>>>>>>
>>>>>>>>>>> Let us know your thoughts on this.
>>>>>>>>>>>
>>>>>>>>>>> [1]
>>>>>>>>>>> Dremio -
>>>>>>>>>>> https://docs.dremio.com/current/reference/sql/commands/functions#creating-a-function
>>>>>>>>>>> Trino - https://trino.io/docs/current/sql/create-function.html
>>>>>>>>>>> Snowflake -
>>>>>>>>>>> https://docs.snowflake.com/en/developer-guide/udf/sql/udf-sql-scalar-functions
>>>>>>>>>>> Databricks -
>>>>>>>>>>> https://docs.databricks.com/en/sql/language-manual/sql-ref-syntax-ddl-create-sql-function.html
>>>>>>>>>>>
>>>>>>>>>>> - Ajantha
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Ryan Blue
>>>>>>>>>> Tabular
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Robert Stupp
>>>>>>>>>> @snazy
>>>>>>>>>>
>>>>>>>>>>
