It's interesting to note that a tabular SQL UDF can be used to build a *parameterized* view. So, there's definitely a lot in common between UDFs and views.
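To make the parameterized-view point concrete, here is a minimal sketch using Python's built-in sqlite3 module. SQLite has no tabular SQL UDFs, so the Python function `orders_for_region` only stands in for one; the `orders` table, the `eu_orders` view, and the function name are all made up for illustration and are not taken from any engine or from the proposal.

```python
import sqlite3

# In-memory database with a small, made-up table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER, region TEXT, amount REAL);
    INSERT INTO orders VALUES (1, 'EU', 10.0), (2, 'US', 25.0), (3, 'EU', 7.5);
    -- An ordinary view: the filter value is fixed when the view is defined.
    CREATE VIEW eu_orders AS
        SELECT id, amount FROM orders WHERE region = 'EU';
""")


def orders_for_region(region):
    """Stand-in for a tabular SQL UDF: the view's query with `region` as an argument."""
    cursor = conn.execute(
        "SELECT id, amount FROM orders WHERE region = ? ORDER BY id",
        (region,),
    )
    return cursor.fetchall()


# Rows from the fixed view, for comparison with orders_for_region("EU").
view_rows = conn.execute(
    "SELECT id, amount FROM eu_orders ORDER BY id"
).fetchall()
```

Calling `orders_for_region("EU")` returns the same rows as the view, and any other argument gives a result the fixed view cannot express without being redefined.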
Thanks

On Tue, May 28, 2024 at 9:53 AM Walaa Eldin Moustafa <wa.moust...@gmail.com> wrote:

> I think there is a disconnect about what is perceived as a "UDF". There
> are 2 flavors:
>
> (1) Functions that are defined by the user whose definition is a
> composition of other built-in functions/SQL expressions.
> (2) Custom code written in imperative function according to a
> Java/Scala/Python API, etc.
>
> All the examples in Ajantha's references are pretty much from (1) and I
> think those have more analogy to views due to their SQL nature. Agree (2)
> is not practical to maintain by Iceberg, but I think Ajantha's use cases
> are around (1), and may be worth evaluating.
>
> Thanks,
> Walaa.
>
>
> On Tue, May 28, 2024 at 9:45 AM Ajantha Bhat <ajanthab...@gmail.com>
> wrote:
>
>>> I guess we'll know more when you post the proposal, but I think this
>>> would be a very difficult area to tackle across engines, languages, and
>>> memory models without having a huge performance penalty.
>>
>> Assuming Iceberg initially supports SQL representations of UDFs (similar
>> to views as shared by the reference links above), the complexity involved
>> will be similar to managing views.
>>
>> Thanks, Ryan, Robert, and Jack, for your input.
>> We will work on publishing the draft spec (inspired by the view spec)
>> this week to facilitate further discussions.
>>
>> - Ajantha
>>
>> On Tue, May 28, 2024 at 7:33 PM Jack Ye <yezhao...@gmail.com> wrote:
>>
>>> > While it would be great to have a common set of functions across
>>> > engines, I don't see how that is practical when those engines are
>>> > implemented so differently. Plugging in code -- and especially custom
>>> > user-supplied code -- seems inherently specialized to me and should be part
>>> > of the engines' design.
>>>
>>> How is this different from the views? I feel we can say exactly the same
>>> thing for Iceberg views, but yet we have Iceberg multi-dialect views
>>> implemented.
>>> Maybe it sounds like we are trying to draw a line between SQL
>>> and other programming languages as "code"? But I think SQL is just another
>>> type of code, and we are already talking about compiling all these
>>> different code dialects to an intermediate representation (using projects
>>> like Coral, Substrait), which will be stored as another type of
>>> representation of Iceberg view. I think the same functionality can be used
>>> for UDFs if developed.
>>>
>>> I actually think adding UDF support is a good idea, even just a
>>> multi-dialect one like view, and that can allow engines to, for example,
>>> parse a view SQL, and when a referenced function cannot be resolved, try to
>>> look for a multi-dialect UDF definition.
>>>
>>> I guess we can discuss more when we have the actual proposal published.
>>>
>>> Best,
>>> Jack Ye
>>>
>>>
>>> On Tue, May 28, 2024 at 1:32 AM Robert Stupp <sn...@snazy.de> wrote:
>>>
>>>> UDFs are as engine specific and portable and "non-centralized" as views
>>>> are. The same performance concerns apply to views as well.
>>>> Iceberg should define a common base upon which engines can build, so
>>>> the argument that UDFs aren't practical, because engines are different, is
>>>> probably only a temporary concern.
>>>>
>>>> In the long term, Iceberg should also try to tackle the idea to make
>>>> views portable, which is conceptually not that much different from portable
>>>> UDFs.
>>>>
>>>>
>>>> PS: I'm not a fan of adding a negative touch to the idea of having UDFs
>>>> in Iceberg, especially not in this early stage.
>>>>
>>>>
>>>> On 24.05.24 20:53, Ryan Blue wrote:
>>>>
>>>> Thanks, Ajantha.
>>>>
>>>> I'm skeptical about whether it's a good idea to add UDFs tracked by
>>>> Iceberg catalogs. I think that Iceberg primarily deals with things that are
>>>> centralized, like tables of data.
>>>> While it would be great to have a common
>>>> set of functions across engines, I don't see how that is practical when
>>>> those engines are implemented so differently. Plugging in code -- and
>>>> especially custom user-supplied code -- seems inherently specialized to me
>>>> and should be part of the engines' design.
>>>>
>>>> I guess we'll know more when you post the proposal, but I think this
>>>> would be a very difficult area to tackle across engines, languages, and
>>>> memory models without having a huge performance penalty.
>>>>
>>>> Ryan
>>>>
>>>> On Fri, May 24, 2024 at 8:10 AM Ajantha Bhat <ajanthab...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi Everyone,
>>>>>
>>>>> This is a discussion to gauge the community interest in storing
>>>>> versioned SQL UDFs in Iceberg.
>>>>> We want to propose a spec addition for storing versioned UDFs in
>>>>> Iceberg (inspired by the view spec).
>>>>>
>>>>> These UDFs can operate similarly to views in that they are associated
>>>>> with tables, but they can accept arguments and produce return values, or
>>>>> even function as inline expressions.
>>>>> Many query engines, such as Dremio, Trino, Snowflake, and Databricks Spark,
>>>>> support SQL UDFs at the catalog level [1].
>>>>> But storing them in Iceberg can enable
>>>>> - Versioning of these UDFs.
>>>>> - Interoperability between the engines. Potentially engines can
>>>>> understand the UDFs written by other engines (with a translation layer).
>>>>>
>>>>> We believe that integrating this feature into Iceberg would be a
>>>>> valuable addition, and we're eager to collaborate with the community to
>>>>> develop a UDF specification.
>>>>> Stephen <stephen....@dremio.com> has already begun drafting a
>>>>> specification to propose to the community.
>>>>>
>>>>> Let us know your thoughts on this.
>>>>>
>>>>> [1]
>>>>> Dremio -
>>>>> https://docs.dremio.com/current/reference/sql/commands/functions#creating-a-function
>>>>> Trino - https://trino.io/docs/current/sql/create-function.html
>>>>> Snowflake -
>>>>> https://docs.snowflake.com/en/developer-guide/udf/sql/udf-sql-scalar-functions
>>>>> Databricks -
>>>>> https://docs.databricks.com/en/sql/language-manual/sql-ref-syntax-ddl-create-sql-function.html
>>>>>
>>>>> - Ajantha
>>>>>
>>>>
>>>>
>>>> --
>>>> Ryan Blue
>>>> Tabular
>>>>
>>>> --
>>>> Robert Stupp
>>>> @snazy
>>>>
>>>>
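For readers following the thread, the versioned, multi-dialect idea discussed above could look roughly like the sketch below. It is only an illustration under stated assumptions: no UDF spec had been published at the time of this thread, so every name here (`SqlRepresentation`, `UdfVersion`, `UdfMetadata`, `resolve_function`, the `add_one` function and its SQL bodies) is hypothetical and loosely modeled on how views keep one SQL text per dialect. It also sketches the fallback mentioned in the thread: try the engine's built-ins first, then look for a catalog UDF in the engine's dialect.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SqlRepresentation:
    """Hypothetical: one SQL body for one dialect."""
    dialect: str  # e.g. "trino" or "spark"
    sql: str      # function body written in that dialect


@dataclass
class UdfVersion:
    """Hypothetical: an immutable snapshot of a UDF definition."""
    version_id: int
    representations: list


@dataclass
class UdfMetadata:
    """Hypothetical: catalog-side metadata that tracks all versions of one UDF."""
    name: str
    versions: list = field(default_factory=list)
    current_version_id: int = 0

    def add_version(self, representations):
        # Each change appends a new version; older versions stay addressable.
        version_id = len(self.versions) + 1
        self.versions.append(UdfVersion(version_id, list(representations)))
        self.current_version_id = version_id
        return version_id

    def resolve(self, dialect, version_id=None):
        # Return the SQL body for `dialect` in the requested (or current) version.
        wanted = self.current_version_id if version_id is None else version_id
        for version in self.versions:
            if version.version_id == wanted:
                for rep in version.representations:
                    if rep.dialect == dialect:
                        return rep.sql
        return None


def resolve_function(name, dialect, builtins, catalog):
    """Engine-side lookup: built-ins first, then a catalog UDF in this dialect."""
    if name in builtins:
        return ("builtin", name)
    udf = catalog.get(name)
    sql = udf.resolve(dialect) if udf is not None else None
    return ("udf", sql) if sql is not None else None


# Example data: version 1 has two dialects, version 2 only a Trino body.
add_one = UdfMetadata("add_one")
add_one.add_version([
    SqlRepresentation("trino", "x + 1"),
    SqlRepresentation("spark", "x + 1"),
])
add_one.add_version([SqlRepresentation("trino", "x + CAST(1 AS BIGINT)")])
catalog = {"add_one": add_one}
```

In this sketch an engine whose dialect is missing from the current version simply gets no match; a translation layer such as the ones mentioned in the thread would be the place to fill that gap.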