It's interesting to note that a tabular SQL UDF can be used to build a
*parameterized view*. So, there's definitely a lot in common between UDFs and
views.
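
A minimal sketch of that idea, under stated assumptions: SQLite has no table-valued SQL UDFs, so the "UDF" below is modeled as a stored SQL body with one declared parameter. The table `orders`, the view `big_orders`, the dictionary `ORDERS_ABOVE`, and the helper `call_tabular_udf` are all hypothetical names made up for illustration; they are not part of Iceberg or of any engine's API.

```python
import sqlite3

# Hypothetical stand-in for a tabular SQL UDF: a SQL body plus declared parameters.
ORDERS_ABOVE = {
    "params": ("min_amount",),
    "body": "SELECT id, amount FROM orders "
            "WHERE amount >= :min_amount ORDER BY id",
}


def call_tabular_udf(conn, udf, **args):
    """Bind the arguments to the stored SQL body and return the resulting rows."""
    missing = set(udf["params"]) - set(args)
    if missing:
        raise TypeError(f"missing arguments: {sorted(missing)}")
    return conn.execute(udf["body"], args).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, 5.0), (2, 50.0), (3, 500.0)],
)

# An ordinary view hard-codes the threshold in its definition.
conn.execute(
    "CREATE VIEW big_orders AS "
    "SELECT id, amount FROM orders WHERE amount >= 100 ORDER BY id"
)

fixed = conn.execute("SELECT * FROM big_orders").fetchall()
same = call_tabular_udf(conn, ORDERS_ABOVE, min_amount=100)
wider = call_tabular_udf(conn, ORDERS_ABOVE, min_amount=10)
```

With the argument fixed at 100, the function returns the same rows as the view; changing the argument yields a different "view" over the same table without a new definition.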

Thanks

On Tue, May 28, 2024 at 9:53 AM Walaa Eldin Moustafa <wa.moust...@gmail.com>
wrote:

> I think there is a disconnect about what is perceived as a "UDF". There
> are 2 flavors:
>
> (1) Functions that are defined by the user whose definition is a
> composition of other built-in functions/SQL expressions.
> (2) Custom code written as an imperative function against a
> Java/Scala/Python API, etc.
>
> All the examples in Ajantha's references are pretty much from (1) and I
> think those have more analogy to views due to their SQL nature. Agree (2)
> is not practical to maintain by Iceberg, but I think Ajantha's use cases
> are around (1), and may be worth evaluating.
>
> Thanks,
> Walaa.
>
>
> On Tue, May 28, 2024 at 9:45 AM Ajantha Bhat <ajanthab...@gmail.com>
> wrote:
>
>>> I guess we'll know more when you post the proposal, but I think this
>>> would be a very difficult area to tackle across engines, languages, and
>>> memory models without having a huge performance penalty.
>>
>> Assuming Iceberg initially supports SQL representations of UDFs (similar
>> to views as shared by the reference links above), the complexity involved
>> will be similar to managing views.
>>
>> Thanks, Ryan, Robert, and Jack, for your input.
>> We will work on publishing the draft spec (inspired by the view spec)
>> this week to facilitate further discussions.
>>
>> - Ajantha
>>
>> On Tue, May 28, 2024 at 7:33 PM Jack Ye <yezhao...@gmail.com> wrote:
>>
>>> > While it would be great to have a common set of functions across
>>> engines, I don't see how that is practical when those engines are
>>> implemented so differently. Plugging in code -- and especially custom
>>> user-supplied code -- seems inherently specialized to me and should be part
>>> of the engines' design.
>>>
>>> How is this different from the views? I feel we can say exactly the same
>>> thing for Iceberg views, but yet we have Iceberg multi-dialect views
>>> implemented. Maybe it sounds like we are trying to draw a line between SQL
>>> and other programming languages as "code", but I think SQL is just another
>>> type of code, and we are already talking about compiling all these
>>> different code dialects to an intermediate representation (using projects
>>> like Coral, Substrait), which will be stored as another type of
>>> representation of Iceberg view. I think the same functionality can be used
>>> for UDFs if developed.
>>>
>>> I actually think adding UDF support is a good idea, even just a
>>> multi-dialect one like views. That would allow engines to, for example,
>>> parse a view's SQL and, when a referenced function cannot be resolved,
>>> look for a multi-dialect UDF definition.
>>>
>>> I guess we can discuss more when we have the actual proposal published.
>>>
>>> Best,
>>> Jack Ye
>>>
>>>
>>>
>>>
>>> On Tue, May 28, 2024 at 1:32 AM Robert Stupp <sn...@snazy.de> wrote:
>>>
>>>> UDFs are as engine specific and portable and "non-centralized" as views
>>>> are. The same performance concerns apply to views as well.
>>>> Iceberg should define a common base upon which engines can build, so
>>>> the argument that UDFs aren't practical, because engines are different, is
>>>> probably only a temporary concern.
>>>>
>>>> In the long term, Iceberg should also try to tackle the idea to make
>>>> views portable, which is conceptually not that much different from portable
>>>> UDFs.
>>>>
>>>>
>>>> PS: I'm not a fan of adding a negative touch to the idea of having UDFs
>>>> in Iceberg, especially not in this early stage.
>>>>
>>>>
>>>> On 24.05.24 20:53, Ryan Blue wrote:
>>>>
>>>> Thanks, Ajantha.
>>>>
>>>> I'm skeptical about whether it's a good idea to add UDFs tracked by
>>>> Iceberg catalogs. I think that Iceberg primarily deals with things that are
>>>> centralized, like tables of data. While it would be great to have a common
>>>> set of functions across engines, I don't see how that is practical when
>>>> those engines are implemented so differently. Plugging in code -- and
>>>> especially custom user-supplied code -- seems inherently specialized to me
>>>> and should be part of the engines' design.
>>>>
>>>> I guess we'll know more when you post the proposal, but I think this
>>>> would be a very difficult area to tackle across engines, languages, and
>>>> memory models without having a huge performance penalty.
>>>>
>>>> Ryan
>>>>
>>>> On Fri, May 24, 2024 at 8:10 AM Ajantha Bhat <ajanthab...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi Everyone,
>>>>>
>>>>> This is a discussion to gauge the community's interest in storing
>>>>> versioned SQL UDFs in Iceberg.
>>>>> We want to propose the spec addition for storing the versioned UDFs in
>>>>> Iceberg (inspired by view spec).
>>>>>
>>>>> These UDFs can operate similarly to views in that they are associated
>>>>> with tables, but they can accept arguments and produce return values, or
>>>>> even function as inline expressions.
>>>>> Many query engines, such as Dremio, Trino, Snowflake, and Databricks Spark,
>>>>> support SQL UDFs at the catalog level [1].
>>>>> But storing them in Iceberg can enable
>>>>> - Versioning of these UDFs.
>>>>> - Interoperability between the engines. Potentially, engines can
>>>>> understand the UDFs written by other engines (with a translation layer).
>>>>>
>>>>> We believe that integrating this feature into Iceberg would be a
>>>>> valuable addition, and we're eager to collaborate with the community to
>>>>> develop a UDF specification.
>>>>> Stephen <stephen....@dremio.com> has already begun drafting a
>>>>> specification to propose to the community.
>>>>>
>>>>> Let us know your thoughts on this.
>>>>>
>>>>> [1]
>>>>> Dremio -
>>>>> https://docs.dremio.com/current/reference/sql/commands/functions#creating-a-function
>>>>> Trino - https://trino.io/docs/current/sql/create-function.html
>>>>> Snowflake -
>>>>> https://docs.snowflake.com/en/developer-guide/udf/sql/udf-sql-scalar-functions
>>>>> Databricks -
>>>>> https://docs.databricks.com/en/sql/language-manual/sql-ref-syntax-ddl-create-sql-function.html
>>>>>
>>>>> - Ajantha
>>>>>
>>>>
>>>>
>>>> --
>>>> Ryan Blue
>>>> Tabular
>>>>
>>>> --
>>>> Robert Stupp
>>>> @snazy
>>>>
>>>>
