> (2) Custom code written in imperative function according to a Java/Scala/Python API, etc.

I think we could still explore some long-term opportunities in this case. Suppose you register a Spark temp view as some sort of DataFrame read; it could still be resolved to a Spark plan that is representable by an intermediate representation. But I agree this gets very complicated very quickly, and just having case (1) covered would already be a huge step forward.

-Jack

On Tue, May 28, 2024 at 1:40 PM Benny Chow <btc...@gmail.com> wrote:

> It's interesting to note that a tabular SQL UDF can be used to build a
> *parameterized* view. So, there's definitely a lot in common between UDFs
> and views.
>
> Thanks
>
> On Tue, May 28, 2024 at 9:53 AM Walaa Eldin Moustafa <wa.moust...@gmail.com> wrote:
>
>> I think there is a disconnect about what is perceived as a "UDF". There
>> are 2 flavors:
>>
>> (1) Functions that are defined by the user whose definition is a
>> composition of other built-in functions/SQL expressions.
>> (2) Custom code written in imperative function according to a
>> Java/Scala/Python API, etc.
>>
>> All the examples in Ajantha's references are pretty much from (1) and I
>> think those have more analogy to views due to their SQL nature. Agree (2)
>> is not practical to maintain by Iceberg, but I think Ajantha's use cases
>> are around (1), and may be worth evaluating.
>>
>> Thanks,
>> Walaa.
>>
>> On Tue, May 28, 2024 at 9:45 AM Ajantha Bhat <ajanthab...@gmail.com> wrote:
>>
>>>> I guess we'll know more when you post the proposal, but I think this
>>>> would be a very difficult area to tackle across engines, languages, and
>>>> memory models without having a huge performance penalty.
>>>
>>> Assuming Iceberg initially supports SQL representations of UDFs (similar
>>> to views as shared by the reference links above), the complexity involved
>>> will be similar to managing views.
>>>
>>> Thanks, Ryan, Robert, and Jack, for your input.
>>> We will work on publishing the draft spec (inspired by the view spec)
>>> this week to facilitate further discussions.
>>>
>>> - Ajantha
>>>
>>> On Tue, May 28, 2024 at 7:33 PM Jack Ye <yezhao...@gmail.com> wrote:
>>>
>>>> > While it would be great to have a common set of functions across
>>>> > engines, I don't see how that is practical when those engines are
>>>> > implemented so differently. Plugging in code -- and especially custom
>>>> > user-supplied code -- seems inherently specialized to me and should be part
>>>> > of the engines' design.
>>>>
>>>> How is this different from views? I feel we can say exactly the
>>>> same thing for Iceberg views, yet we have Iceberg multi-dialect views
>>>> implemented. Maybe it sounds like we are trying to draw a line between SQL
>>>> and other programming languages as "code"? But I think SQL is just another
>>>> type of code, and we are already talking about compiling all these
>>>> different code dialects to an intermediate representation (using projects
>>>> like Coral, Substrait), which will be stored as another type of
>>>> representation of an Iceberg view. I think the same functionality can be used
>>>> for UDFs if developed.
>>>>
>>>> I actually think adding UDF support is a good idea, even just a
>>>> multi-dialect one like views. That would allow engines to, for example,
>>>> parse a view SQL and, when a referenced function cannot be resolved,
>>>> look for a multi-dialect UDF definition.
>>>>
>>>> I guess we can discuss more when we have the actual proposal published.
>>>>
>>>> Best,
>>>> Jack Ye
>>>>
>>>> On Tue, May 28, 2024 at 1:32 AM Robert Stupp <sn...@snazy.de> wrote:
>>>>
>>>>> UDFs are as engine specific and portable and "non-centralized" as
>>>>> views are. The same performance concerns apply to views as well.
>>>>> Iceberg should define a common base upon which engines can build, so
>>>>> the argument that UDFs aren't practical, because engines are different, is
>>>>> probably only a temporary concern.
>>>>>
>>>>> In the long term, Iceberg should also try to tackle the idea to make
>>>>> views portable, which is conceptually not that much different from
>>>>> portable UDFs.
>>>>>
>>>>> PS: I'm not a fan of adding a negative touch to the idea of having
>>>>> UDFs in Iceberg, especially not in this early stage.
>>>>>
>>>>> On 24.05.24 20:53, Ryan Blue wrote:
>>>>>
>>>>> Thanks, Ajantha.
>>>>>
>>>>> I'm skeptical about whether it's a good idea to add UDFs tracked by
>>>>> Iceberg catalogs. I think that Iceberg primarily deals with things that
>>>>> are centralized, like tables of data. While it would be great to have a common
>>>>> set of functions across engines, I don't see how that is practical when
>>>>> those engines are implemented so differently. Plugging in code -- and
>>>>> especially custom user-supplied code -- seems inherently specialized to me
>>>>> and should be part of the engines' design.
>>>>>
>>>>> I guess we'll know more when you post the proposal, but I think this
>>>>> would be a very difficult area to tackle across engines, languages, and
>>>>> memory models without having a huge performance penalty.
>>>>>
>>>>> Ryan
>>>>>
>>>>> On Fri, May 24, 2024 at 8:10 AM Ajantha Bhat <ajanthab...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hi Everyone,
>>>>>>
>>>>>> This is a discussion to gauge the community interest in storing the
>>>>>> Versioned SQL UDFs in Iceberg.
>>>>>> We want to propose the spec addition for storing the versioned UDFs
>>>>>> in Iceberg (inspired by view spec).
>>>>>>
>>>>>> These UDFs can operate similarly to views in that they are associated
>>>>>> with tables, but they can accept arguments and produce return values, or
>>>>>> even function as inline expressions.
>>>>>> Many query engines, such as Dremio, Trino, Snowflake, and Databricks Spark,
>>>>>> support SQL UDFs at the catalog level [1].
>>>>>> But storing them in Iceberg can enable
>>>>>> - Versioning of these UDFs.
>>>>>> - Interoperability between the engines. Potentially engines can
>>>>>> understand the UDFs written by other engines (with a translation layer).
>>>>>>
>>>>>> We believe that integrating this feature into Iceberg would be a
>>>>>> valuable addition, and we're eager to collaborate with the community to
>>>>>> develop a UDF specification.
>>>>>> Stephen <stephen....@dremio.com> has already begun drafting a
>>>>>> specification to propose to the community.
>>>>>>
>>>>>> Let us know your thoughts on this.
>>>>>>
>>>>>> [1]
>>>>>> Dremio - https://docs.dremio.com/current/reference/sql/commands/functions#creating-a-function
>>>>>> Trino - https://trino.io/docs/current/sql/create-function.html
>>>>>> Snowflake - https://docs.snowflake.com/en/developer-guide/udf/sql/udf-sql-scalar-functions
>>>>>> Databricks - https://docs.databricks.com/en/sql/language-manual/sql-ref-syntax-ddl-create-sql-function.html
>>>>>>
>>>>>> - Ajantha
>>>>>
>>>>> --
>>>>> Ryan Blue
>>>>> Tabular
>>>>>
>>>>> --
>>>>> Robert Stupp
>>>>> @snazy
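
To make the idea discussed in this thread a bit more concrete: the combination of versioned SQL UDFs, one SQL representation per dialect, and an engine that falls back to a catalog UDF when it cannot resolve a function itself could look roughly like the sketch below. This is only an illustration in Python, not the proposed spec and not any Iceberg API. The names `UdfVersion`, `UdfEntry`, `resolve_function`, the dialect strings, and the dict-based catalog are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class UdfVersion:
    """One immutable version of a SQL UDF (hypothetical layout, not the Iceberg spec)."""
    version_id: int
    # Dialect name (e.g. "spark", "trino") -> SQL body written in that dialect.
    representations: Dict[str, str]


@dataclass
class UdfEntry:
    """Hypothetical catalog entry that keeps every version of one UDF."""
    name: str
    versions: List[UdfVersion] = field(default_factory=list)

    def add_version(self, representations: Dict[str, str]) -> UdfVersion:
        # Versions are append-only; ids start at 1 and increase by 1.
        version = UdfVersion(len(self.versions) + 1, dict(representations))
        self.versions.append(version)
        return version

    def current(self) -> Optional[UdfVersion]:
        return self.versions[-1] if self.versions else None


def resolve_function(
    name: str,
    dialect: str,
    engine_builtins: Dict[str, str],
    catalog: Dict[str, UdfEntry],
) -> Tuple[str, str]:
    """Try the engine's own functions first, then fall back to a catalog UDF."""
    if name in engine_builtins:
        return ("builtin", engine_builtins[name])
    entry = catalog.get(name)
    current = entry.current() if entry is not None else None
    if current is None:
        raise LookupError(f"function {name!r} not found")
    sql = current.representations.get(dialect)
    if sql is None:
        # A translation layer (e.g. via an intermediate representation) would
        # plug in here; this sketch simply reports the missing dialect.
        raise LookupError(f"function {name!r} has no {dialect!r} representation")
    return ("catalog-udf", sql)
```

The append-only version list mirrors how view versions are tracked, which is why managing SQL-only UDFs is argued above to be of similar complexity to managing views. Imperative UDFs (flavor 2) are deliberately left out of the sketch.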