> (2) Custom code written as imperative functions against a
> Java/Scala/Python API, etc.

I think we could still explore some long-term opportunities in this case.
Suppose you register a Spark temp view backed by some kind of DataFrame read;
it could still be resolved to a Spark plan that can be expressed in an
intermediate representation. But I agree this gets very complicated very
quickly, and just covering case (1) would already be a huge step forward.
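To make case (1) a bit more concrete, here is a minimal Python sketch. It
assumes a hypothetical metadata layout loosely modeled on Iceberg view
versions and dialect representations; UdfMetadata, UdfVersion,
SqlRepresentation, and resolve_function are illustrative names, not part of
any Iceberg spec or API. It shows a versioned SQL UDF holding one SQL body per
dialect, plus the fallback lookup discussed in this thread: resolve built-ins
first, then look for a catalog UDF with a representation in the engine's
dialect.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class SqlRepresentation:
    """One dialect-specific SQL body for a UDF (hypothetical layout)."""
    dialect: str  # engine dialect name, e.g. "spark" or "trino"
    sql: str      # function body written as a SQL expression in that dialect


@dataclass
class UdfVersion:
    """An immutable UDF version, analogous to a view version."""
    version_id: int
    params: List[str]
    representations: List[SqlRepresentation]


@dataclass
class UdfMetadata:
    """Versioned UDF metadata; only the current version is used for resolution."""
    name: str
    versions: List[UdfVersion] = field(default_factory=list)
    current_version_id: Optional[int] = None

    def add_version(self, version: UdfVersion) -> None:
        # Old versions are kept; adding a version only moves the current pointer.
        self.versions.append(version)
        self.current_version_id = version.version_id

    def current(self) -> UdfVersion:
        for version in self.versions:
            if version.version_id == self.current_version_id:
                return version
        raise LookupError(f"{self.name} has no current version")

    def body_for(self, dialect: str) -> Optional[str]:
        # Return the SQL body stored for the requested dialect, if any.
        for rep in self.current().representations:
            if rep.dialect == dialect:
                return rep.sql
        return None


def resolve_function(
    name: str,
    dialect: str,
    builtins: Set[str],
    catalog: Dict[str, UdfMetadata],
) -> str:
    """Resolve a function referenced in view SQL: built-ins first, then catalog UDFs."""
    if name in builtins:
        return f"builtin:{name}"
    udf = catalog.get(name)
    if udf is None:
        raise LookupError(f"unknown function {name}")
    body = udf.body_for(dialect)
    if body is None:
        raise LookupError(f"{name} has no representation for dialect {dialect}")
    return body


# Usage: a hypothetical discounted_price(price, rate) UDF with two versions.
udf = UdfMetadata(name="discounted_price")
udf.add_version(UdfVersion(1, ["price", "rate"],
                           [SqlRepresentation("spark", "price * (1 - rate)")]))
udf.add_version(UdfVersion(2, ["price", "rate"], [
    SqlRepresentation("spark", "round(price * (1 - rate), 2)"),
    SqlRepresentation("trino", "round(price * (1 - rate), 2)"),
]))
catalog = {"discounted_price": udf}

print(resolve_function("discounted_price", "trino", {"upper"}, catalog))
# prints: round(price * (1 - rate), 2)
```

Keeping one body per dialect (rather than one canonical body) mirrors how
multi-dialect views work today; an IR such as Substrait could later be added
as just another representation.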

-Jack


On Tue, May 28, 2024 at 1:40 PM Benny Chow <btc...@gmail.com> wrote:

> It's interesting to note that a tabular SQL UDF can be used to build a
> *parameterized* view. So, there's definitely a lot in common between UDFs
> and views.
>
> Thanks
>
> On Tue, May 28, 2024 at 9:53 AM Walaa Eldin Moustafa <
> wa.moust...@gmail.com> wrote:
>
>> I think there is a disconnect about what is perceived as a "UDF". There
>> are 2 flavors:
>>
>> (1) Functions that are defined by the user whose definition is a
>> composition of other built-in functions/SQL expressions.
>> (2) Custom code written as imperative functions against a
>> Java/Scala/Python API, etc.
>>
>> All the examples in Ajantha's references are pretty much of type (1), and I
>> think those are more analogous to views due to their SQL nature. I agree
>> that (2) is not practical for Iceberg to maintain, but I think Ajantha's use
>> cases are around (1) and may be worth evaluating.
>>
>> Thanks,
>> Walaa.
>>
>>
>> On Tue, May 28, 2024 at 9:45 AM Ajantha Bhat <ajanthab...@gmail.com>
>> wrote:
>>
>>>> I guess we'll know more when you post the proposal, but I think this
>>>> would be a very difficult area to tackle across engines, languages, and
>>>> memory models without having a huge performance penalty.
>>>
>>> Assuming Iceberg initially supports SQL representations of UDFs (similar
>>> to views, as in the reference links above), the complexity involved
>>> should be similar to that of managing views.
>>>
>>> Thanks, Ryan, Robert, and Jack, for your input.
>>> We will work on publishing the draft spec (inspired by the view spec)
>>> this week to facilitate further discussions.
>>>
>>> - Ajantha
>>>
>>> On Tue, May 28, 2024 at 7:33 PM Jack Ye <yezhao...@gmail.com> wrote:
>>>
>>>> > While it would be great to have a common set of functions across
>>>> engines, I don't see how that is practical when those engines are
>>>> implemented so differently. Plugging in code -- and especially custom
>>>> user-supplied code -- seems inherently specialized to me and should be part
>>>> of the engines' design.
>>>>
>>>> How is this different from views? I feel we could say exactly the
>>>> same thing about Iceberg views, yet we have implemented multi-dialect
>>>> Iceberg views. It sounds like we may be drawing a line between SQL and
>>>> other programming languages as "code", but I think SQL is just another
>>>> type of code, and we are already talking about compiling these
>>>> different code dialects to an intermediate representation (using projects
>>>> like Coral or Substrait), which would be stored as another type of
>>>> representation of an Iceberg view. I think the same functionality could be
>>>> used for UDFs once developed.
>>>>
>>>> I actually think adding UDF support is a good idea, even just a
>>>> multi-dialect one like views. That would allow engines, for example, to
>>>> parse a view's SQL and, when a referenced function cannot be resolved,
>>>> look for a multi-dialect UDF definition.
>>>>
>>>> I guess we can discuss more when we have the actual proposal published.
>>>>
>>>> Best,
>>>> Jack Ye
>>>>
>>>>
>>>>
>>>>
>>>> On Tue, May 28, 2024 at 1:32 AM Robert Stupp <sn...@snazy.de> wrote:
>>>>
>>>>> UDFs are as engine-specific, as portable, and as "non-centralized" as
>>>>> views are. The same performance concerns apply to views as well.
>>>>> Iceberg should define a common base upon which engines can build, so
>>>>> the argument that UDFs aren't practical because engines differ is
>>>>> probably only a temporary concern.
>>>>>
>>>>> In the long term, Iceberg should also try to tackle making views
>>>>> portable, which is conceptually not that different from making UDFs
>>>>> portable.
>>>>>
>>>>>
>>>>> PS: I'm not a fan of casting the idea of having UDFs in Iceberg in a
>>>>> negative light, especially at this early stage.
>>>>>
>>>>>
>>>>> On 24.05.24 20:53, Ryan Blue wrote:
>>>>>
>>>>> Thanks, Ajantha.
>>>>>
>>>>> I'm skeptical about whether it's a good idea to add UDFs tracked by
>>>>> Iceberg catalogs. I think that Iceberg primarily deals with things that
>>>>> are centralized, like tables of data. While it would be great to have a
>>>>> common set of functions across engines, I don't see how that is practical
>>>>> when those engines are implemented so differently. Plugging in code -- and
>>>>> especially custom user-supplied code -- seems inherently specialized to me
>>>>> and should be part of the engines' design.
>>>>>
>>>>> I guess we'll know more when you post the proposal, but I think this
>>>>> would be a very difficult area to tackle across engines, languages, and
>>>>> memory models without having a huge performance penalty.
>>>>>
>>>>> Ryan
>>>>>
>>>>> On Fri, May 24, 2024 at 8:10 AM Ajantha Bhat <ajanthab...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hi Everyone,
>>>>>>
>>>>>> This is a discussion to gauge community interest in storing
>>>>>> versioned SQL UDFs in Iceberg.
>>>>>> We want to propose a spec addition for storing versioned UDFs
>>>>>> in Iceberg (inspired by the view spec).
>>>>>>
>>>>>> These UDFs can operate similarly to views in that they are associated
>>>>>> with tables, but they can accept arguments and produce return values, or
>>>>>> even function as inline expressions.
>>>>>> Many query engines, such as Dremio, Trino, Snowflake, and Databricks
>>>>>> Spark, support SQL UDFs at the catalog level [1].
>>>>>> But storing them in Iceberg could enable:
>>>>>> - Versioning of these UDFs.
>>>>>> - Interoperability between engines: engines could potentially
>>>>>> understand UDFs written by other engines (via a translation layer).
>>>>>>
>>>>>> We believe that integrating this feature into Iceberg would be a
>>>>>> valuable addition, and we're eager to collaborate with the community to
>>>>>> develop a UDF specification.
>>>>>> Stephen <stephen....@dremio.com> has already begun drafting a
>>>>>> specification to propose to the community.
>>>>>>
>>>>>> Let us know your thoughts on this.
>>>>>>
>>>>>> [1]
>>>>>> Dremio - https://docs.dremio.com/current/reference/sql/commands/functions#creating-a-function
>>>>>> Trino - https://trino.io/docs/current/sql/create-function.html
>>>>>> Snowflake - https://docs.snowflake.com/en/developer-guide/udf/sql/udf-sql-scalar-functions
>>>>>> Databricks - https://docs.databricks.com/en/sql/language-manual/sql-ref-syntax-ddl-create-sql-function.html
>>>>>>
>>>>>> - Ajantha
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Ryan Blue
>>>>> Tabular
>>>>>
>>>>> --
>>>>> Robert Stupp
>>>>> @snazy
>>>>>
>>>>>
