> > I guess we'll know more when you post the proposal, but I think this would be a very difficult area to tackle across engines, languages, and memory models without having a huge performance penalty.

Assuming Iceberg initially supports SQL representations of UDFs (similar to views, as in the reference links above), the complexity involved will be similar to managing views.

Thanks, Ryan, Robert, and Jack, for your input. We will work on publishing the draft spec (inspired by the view spec) this week to facilitate further discussion.

- Ajantha

On Tue, May 28, 2024 at 7:33 PM Jack Ye <yezhao...@gmail.com> wrote:

> > While it would be great to have a common set of functions across engines, I don't see how that is practical when those engines are implemented so differently. Plugging in code -- and especially custom user-supplied code -- seems inherently specialized to me and should be part of the engines' design.
>
> How is this different from views? I feel we can say exactly the same thing about Iceberg views, and yet we have implemented Iceberg multi-dialect views. It sounds like we may be trying to draw a line between SQL and other programming languages as "code", but I think SQL is just another type of code. We are already talking about compiling these different code dialects to an intermediate representation (using projects like Coral and Substrait), which will be stored as another type of representation of an Iceberg view. I think the same functionality can be used for UDFs if developed.
>
> I actually think adding UDF support is a good idea, even just a multi-dialect one like views. That would allow engines to, for example, parse a view's SQL and, when a referenced function cannot be resolved, look up a multi-dialect UDF definition.
>
> I guess we can discuss more when we have the actual proposal published.
>
> Best,
> Jack Ye
>
> On Tue, May 28, 2024 at 1:32 AM Robert Stupp <sn...@snazy.de> wrote:
>
>> UDFs are as engine-specific, as portable, and as "non-centralized" as views are. The same performance concerns apply to views as well.
>> Iceberg should define a common base upon which engines can build, so the argument that UDFs aren't practical because engines are different is probably only a temporary concern.
>>
>> In the long term, Iceberg should also try to tackle making views portable, which is conceptually not much different from portable UDFs.
>>
>> PS: I'm not a fan of adding a negative touch to the idea of having UDFs in Iceberg, especially not at this early stage.
>>
>> On 24.05.24 20:53, Ryan Blue wrote:
>>
>> Thanks, Ajantha.
>>
>> I'm skeptical about whether it's a good idea to add UDFs tracked by Iceberg catalogs. I think that Iceberg primarily deals with things that are centralized, like tables of data. While it would be great to have a common set of functions across engines, I don't see how that is practical when those engines are implemented so differently. Plugging in code -- and especially custom user-supplied code -- seems inherently specialized to me and should be part of the engines' design.
>>
>> I guess we'll know more when you post the proposal, but I think this would be a very difficult area to tackle across engines, languages, and memory models without having a huge performance penalty.
>>
>> Ryan
>>
>> On Fri, May 24, 2024 at 8:10 AM Ajantha Bhat <ajanthab...@gmail.com> wrote:
>>
>>> Hi Everyone,
>>>
>>> This is a discussion to gauge community interest in storing versioned SQL UDFs in Iceberg. We want to propose a spec addition for storing versioned UDFs in Iceberg (inspired by the view spec).
>>>
>>> These UDFs can operate similarly to views in that they are associated with tables, but they can accept arguments and produce return values, or even function as inline expressions. Many query engines, such as Dremio, Trino, Snowflake, and Databricks Spark, support SQL UDFs at the catalog level [1]. Storing them in Iceberg, however, can enable:
>>> - Versioning of these UDFs.
>>> - Interoperability between engines. Engines could potentially understand UDFs written by other engines (with a translation layer).
>>>
>>> We believe that integrating this feature into Iceberg would be a valuable addition, and we're eager to collaborate with the community to develop a UDF specification. Stephen <stephen....@dremio.com> has already begun drafting a specification to propose to the community.
>>>
>>> Let us know your thoughts on this.
>>>
>>> [1]
>>> Dremio - https://docs.dremio.com/current/reference/sql/commands/functions#creating-a-function
>>> Trino - https://trino.io/docs/current/sql/create-function.html
>>> Snowflake - https://docs.snowflake.com/en/developer-guide/udf/sql/udf-sql-scalar-functions
>>> Databricks - https://docs.databricks.com/en/sql/language-manual/sql-ref-syntax-ddl-create-sql-function.html
>>>
>>> - Ajantha
>>
>> --
>> Ryan Blue
>> Tabular
>>
>> --
>> Robert Stupp
>> @snazy