I think there is a disconnect about what is perceived as a "UDF". There are 2 flavors:
(1) Functions that are defined by the user whose definition is a composition of other built-in functions/SQL expressions.
(2) Custom code written imperatively against a Java/Scala/Python API, etc.

All the examples in Ajantha's references are pretty much from (1), and I think those are more analogous to views due to their SQL nature. I agree (2) is not practical for Iceberg to maintain, but I think Ajantha's use cases are around (1), and those may be worth evaluating.

Thanks,
Walaa

On Tue, May 28, 2024 at 9:45 AM Ajantha Bhat <ajanthab...@gmail.com> wrote:

>> I guess we'll know more when you post the proposal, but I think this would be a very difficult area to tackle across engines, languages, and memory models without having a huge performance penalty.
>
> Assuming Iceberg initially supports SQL representations of UDFs (similar to views, as in the reference links shared above), the complexity involved will be similar to managing views.
>
> Thanks, Ryan, Robert, and Jack, for your input.
> We will work on publishing the draft spec (inspired by the view spec) this week to facilitate further discussion.
>
> - Ajantha
>
> On Tue, May 28, 2024 at 7:33 PM Jack Ye <yezhao...@gmail.com> wrote:
>
>> > While it would be great to have a common set of functions across engines, I don't see how that is practical when those engines are implemented so differently. Plugging in code -- and especially custom user-supplied code -- seems inherently specialized to me and should be part of the engines' design.
>>
>> How is this different from views? I feel we could say exactly the same thing about Iceberg views, yet we have implemented Iceberg multi-dialect views. Maybe it sounds like we are trying to draw a line between SQL and other programming languages as "code",
>> but I think SQL is just another type of code, and we are already talking about compiling all these different code dialects to an intermediate representation (using projects like Coral and Substrait), which will be stored as another type of representation of an Iceberg view. I think the same functionality can be used for UDFs if developed.
>>
>> I actually think adding UDF support is a good idea, even just a multi-dialect one like views. That could allow engines to, for example, parse a view's SQL and, when a referenced function cannot be resolved, try to look up a multi-dialect UDF definition.
>>
>> I guess we can discuss more when we have the actual proposal published.
>>
>> Best,
>> Jack Ye
>>
>> On Tue, May 28, 2024 at 1:32 AM Robert Stupp <sn...@snazy.de> wrote:
>>
>>> UDFs are as engine-specific, as portable, and as "non-centralized" as views are. The same performance concerns apply to views as well.
>>> Iceberg should define a common base upon which engines can build, so the argument that UDFs aren't practical because engines are different is probably only a temporary concern.
>>>
>>> In the long term, Iceberg should also try to tackle making views portable, which is conceptually not that different from portable UDFs.
>>>
>>> PS: I'm not a fan of putting a negative spin on the idea of having UDFs in Iceberg, especially at this early stage.
>>>
>>> On 24.05.24 20:53, Ryan Blue wrote:
>>>
>>> Thanks, Ajantha.
>>>
>>> I'm skeptical about whether it's a good idea to add UDFs tracked by Iceberg catalogs. I think that Iceberg primarily deals with things that are centralized, like tables of data. While it would be great to have a common set of functions across engines, I don't see how that is practical when those engines are implemented so differently.
>>> Plugging in code -- and especially custom user-supplied code -- seems inherently specialized to me and should be part of the engines' design.
>>>
>>> I guess we'll know more when you post the proposal, but I think this would be a very difficult area to tackle across engines, languages, and memory models without having a huge performance penalty.
>>>
>>> Ryan
>>>
>>> On Fri, May 24, 2024 at 8:10 AM Ajantha Bhat <ajanthab...@gmail.com> wrote:
>>>
>>>> Hi Everyone,
>>>>
>>>> This is a discussion to gauge the community's interest in storing versioned SQL UDFs in Iceberg.
>>>> We want to propose a spec addition for storing versioned UDFs in Iceberg (inspired by the view spec).
>>>>
>>>> These UDFs can operate similarly to views in that they are associated with tables, but they can accept arguments and produce return values, or even function as inline expressions.
>>>> Many query engines, like Dremio, Trino, Snowflake, and Databricks Spark, support SQL UDFs at the catalog level [1].
>>>> But storing them in Iceberg can enable:
>>>> - Versioning of these UDFs.
>>>> - Interoperability between engines. Potentially, engines can understand UDFs written by other engines (with a translation layer).
>>>>
>>>> We believe that integrating this feature into Iceberg would be a valuable addition, and we're eager to collaborate with the community to develop a UDF specification.
>>>> Stephen <stephen....@dremio.com> has already begun drafting a specification to propose to the community.
>>>>
>>>> Let us know your thoughts on this.
>>>>
>>>> [1]
>>>> Dremio - https://docs.dremio.com/current/reference/sql/commands/functions#creating-a-function
>>>> Trino - https://trino.io/docs/current/sql/create-function.html
>>>> Snowflake - https://docs.snowflake.com/en/developer-guide/udf/sql/udf-sql-scalar-functions
>>>> Databricks - https://docs.databricks.com/en/sql/language-manual/sql-ref-syntax-ddl-create-sql-function.html
>>>>
>>>> - Ajantha
>>>>
>>>
>>> --
>>> Ryan Blue
>>> Tabular
>>>
>>> --
>>> Robert Stupp
>>> @snazy
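To make flavor (1) concrete, along with the fallback lookup Jack describes, here is a minimal sketch. It assumes a view-spec-like layout in which each UDF version stores one SQL body per dialect, and older versions are retained. `UdfMetadata`, `UdfVersion`, `SqlRepresentation`, and `resolve_udf` are hypothetical names chosen for illustration. They are not taken from the Iceberg view spec, from any Iceberg API, or from the draft UDF spec mentioned in this thread.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class SqlRepresentation:
    """Hypothetical: one dialect-specific SQL body for a UDF."""
    dialect: str  # e.g. "spark" or "trino"
    body: str     # SQL expression composed only of built-in functions


@dataclass
class UdfVersion:
    version_id: int
    params: List[str]
    representations: List[SqlRepresentation]


@dataclass
class UdfMetadata:
    """Hypothetical versioned UDF record, loosely modeled on view metadata."""
    name: str
    versions: List[UdfVersion] = field(default_factory=list)
    current_version_id: Optional[int] = None

    def add_version(self, params: List[str], reps: List[SqlRepresentation]) -> int:
        # Old versions are kept, so earlier definitions stay readable.
        version_id = len(self.versions) + 1
        self.versions.append(UdfVersion(version_id, params, reps))
        self.current_version_id = version_id
        return version_id

    def get_version(self, version_id: int) -> Optional[UdfVersion]:
        for v in self.versions:
            if v.version_id == version_id:
                return v
        return None


def resolve_udf(catalog: Dict[str, UdfMetadata], name: str, dialect: str,
                version_id: Optional[int] = None) -> Optional[str]:
    """Fallback an engine might try when a function referenced in view SQL is
    not a built-in: find the UDF and return the body for the engine's dialect."""
    udf = catalog.get(name)
    if udf is None:
        return None
    target = version_id if version_id is not None else udf.current_version_id
    if target is None:
        return None
    version = udf.get_version(target)
    if version is None:
        return None
    for rep in version.representations:
        if rep.dialect == dialect:
            return rep.body
    return None  # no representation this engine can read


# Usage: two versions of a hypothetical "add_tax" UDF.
catalog = {"add_tax": UdfMetadata("add_tax")}
catalog["add_tax"].add_version(["price"], [SqlRepresentation("spark", "price * 1.2")])
catalog["add_tax"].add_version(
    ["price"],
    [SqlRepresentation("spark", "price * 1.25"), SqlRepresentation("trino", "price * 1.25")],
)
print(resolve_udf(catalog, "add_tax", "trino"))                # price * 1.25
print(resolve_udf(catalog, "add_tax", "spark", version_id=1))  # price * 1.2
print(resolve_udf(catalog, "add_tax", "trino", version_id=1))  # None
```

Flavor (2) UDFs, which are imperative Java/Scala/Python code, have no counterpart in this sketch: there is no SQL body another engine could read or translate, which is why they are harder for Iceberg to maintain.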