Thanks to everyone who joined the sync. Here is the meeting recording: https://drive.google.com/file/d/1WItItsNs3m3-no7_qWPHftGqVNOdpw5C/view?usp=sharing
Summary:
- We discussed including Python support; the majority agreed *not to* (see recording for details).
- No strong opposition to versioning; it will be included to support change tracking and similar use cases.
- Suggestions were made to document how each catalog resolves UDFs, similar to views and tables.
- We agreed not to deviate from the existing table/view spec; e.g., location will remain *required* for cross-catalog compatibility.
- We also briefly discussed view interoperability, since the same considerations apply here.

Feel free to review the proposal document here:
https://docs.google.com/document/d/1BDvOfhrH0ZQiQv9eLBqeAu8k8Vjfmeql9VzIiW1F0vc/edit?pli=1&tab=t.0
With the current scope, it is now similar to the view/table spec. The final spec will be put up for review and a vote once it is ready.

Details for the next Iceberg UDF sync:
*Monday, June 16 · 9:00 – 10:00am*
Time zone: America/Los_Angeles
Google Meet joining info
Video call link: https://meet.google.com/aui-czix-nbh

- Ajantha

On Wed, May 21, 2025 at 3:33 AM Yufei Gu <flyrain...@gmail.com> wrote:

> Hi folks,
>
> We’ve set up a dedicated bi-weekly community sync for the UDF project.
> Everyone’s welcome to drop in and share ideas! Here is the meeting link:
>
> Iceberg UDF sync
> Monday, June 2 · 9:00 – 10:00am
> Time zone: America/Los_Angeles
> Google Meet joining info
> Video call link: https://meet.google.com/aui-czix-nbh
>
> Yufei
>
>
> On Fri, May 16, 2025 at 10:45 AM Ajantha Bhat <ajanthab...@gmail.com>
> wrote:
>
>> Update on the progress.
>>
>> I had a meeting today with Yufei and Yun.zou to discuss the UDF proposal.
>> We covered several key points, though some are still open for further
>> discussion:
>>
>> a) *UDF Versioning*: Do we truly need versioning for UDFs at this stage?
>> We explored the possibility of simplifying the specification by avoiding
>> view replication, and potentially introducing versioning support later.
>> UDTFs, being a superset of views in some ways, may not require versioning
>> initially.
>>
>> b) *VarArgs Support*: While some query engines may not support vararg
>> syntax in CREATE FUNCTION, Iceberg UDFs could represent such arguments
>> as lists when supported by the engine.
>>
>> c) *Generics in UDFs*: Since Iceberg currently doesn’t support generic
>> types (e.g., object), we can only map engine-specific types to Iceberg
>> types. As a result, generic data types will not be supported in the initial
>> version.
>>
>> d) *Python Support*: Incorporating Python as a language for SQL UDFs
>> seems promising, especially given its potential to resolve interoperability
>> challenges. Some engines, however, require platform version and package
>> dependency details to execute Python code; this should be captured in the
>> specification.
>>
>> *Next Steps*
>> I will update the proposal document with two primary UDF use cases:
>>
>> - Policy exchange between engines
>> - UDTF as a superset of view functionality
>>
>> The update will include corresponding syntax examples in both SQL and
>> Python, and detail how each use case is represented in Iceberg metadata.
>>
>> We also plan to set up regular syncs (open to more interested
>> participants) to continue refining and finalizing the UDF specification.
>> - Ajantha
>>
>>
>> On Wed, Mar 12, 2025 at 9:16 PM Ajantha Bhat <ajanthab...@gmail.com>
>> wrote:
>>
>>> Hi everyone,
>>>
>>> I've updated the design document[1] based on the previous comments.
>>> Additionally, I've included the SQL UDF syntax supported by various
>>> vendors, including Dremio, Snowflake, Databricks, and Trino.
>>>
>>> I'm happy to schedule a separate sync if a deeper discussion is needed.
>>> Let's keep moving forward, especially with the renewed interest from the
>>> community.
>>>
>>> [1]
>>> https://docs.google.com/document/d/1BDvOfhrH0ZQiQv9eLBqeAu8k8Vjfmeql9VzIiW1F0vc/edit?usp=sharing
>>>
>>> On Thu, Feb 13, 2025 at 11:17 PM Ajantha Bhat <ajanthab...@gmail.com>
>>> wrote:
>>>
>>>> Hey everyone,
>>>>
>>>> During the last catalog community sync, there was significant interest
>>>> in storing UDFs in Iceberg and adding endpoints for UDF handling in the
>>>> REST catalog spec.
>>>>
>>>> I recently discussed this with Yufei to better understand the new
>>>> requirement of using UDFs for fine-grained access control policies. This
>>>> expands the use cases beyond just versioned and interoperable UDFs.
>>>> Additionally, I learnt that many vendors are interested in this feature.
>>>>
>>>> Given the strong community interest and support, I’d like to take
>>>> ownership of this effort and revive the work. I'll be revisiting the
>>>> document I proposed a long time ago and will share an updated proposal
>>>> by next week.
>>>>
>>>> Looking forward to storing UDFs in Iceberg!
>>>> - Ajantha
>>>>
>>>> On Thu, Aug 8, 2024 at 2:55 PM Dmitri Bourlatchkov
>>>> <dmitri.bourlatch...@dremio.com.invalid> wrote:
>>>>
>>>>> The UDF spec does not require representations to be SQL. It merely
>>>>> does not specify (in this revision) how other representations are to be
>>>>> written.
>>>>>
>>>>> This seems like an easy extension (adding a new type in the
>>>>> "Representations" section).
>>>>>
>>>>> Cheers,
>>>>> Dmitri.
>>>>>
>>>>> On Thu, Aug 8, 2024 at 3:47 PM Ryan Blue <b...@databricks.com.invalid>
>>>>> wrote:
>>>>>
>>>>>> Right now, SQL is an explicit requirement of the spec. It leaves a
>>>>>> way for future versions to add different representations later, but only
>>>>>> SQL is supported. That was also the feedback to my initial skepticism
>>>>>> about how it would work to add functions.
>>>>>>
>>>>>> On Thu, Aug 8, 2024 at 12:44 PM Dmitri Bourlatchkov
>>>>>> <dmitri.bourlatch...@dremio.com.invalid> wrote:
>>>>>>
>>>>>>> I do not think the spec is meant to allow only SQL representations,
>>>>>>> although it is certainly favouring SQL in examples... It would be nice
>>>>>>> to add a non-SQL example, indeed.
>>>>>>>
>>>>>>> Cheers,
>>>>>>> Dmitri.
>>>>>>>
>>>>>>> On Thu, Aug 8, 2024 at 9:00 AM Fokko Driesprong <fo...@apache.org>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Coming from PyIceberg, I have concerns as this proposal focuses on
>>>>>>>> SQL-based engines, while Python-based systems often work with data
>>>>>>>> frames.
>>>>>>>> Adding imperative languages like Python would make this proposal more
>>>>>>>> inclusive.
>>>>>>>>
>>>>>>>> Kind regards,
>>>>>>>> Fokko
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Thu, Aug 8, 2024 at 10:27, Piotr Findeisen <
>>>>>>>> piotr.findei...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> Walaa, thanks for asking!
>>>>>>>>> In the design doc linked before in this thread [1] I read
>>>>>>>>> "Without a common standard, the UDFs are hard to share among
>>>>>>>>> different engines."
>>>>>>>>> ("Background and Motivation" section).
>>>>>>>>> I agree with this statement. I don't fully understand yet how the
>>>>>>>>> proposed design addresses shareability between the engines though.
>>>>>>>>> I could use some help to understand this better.
>>>>>>>>>
>>>>>>>>> Best
>>>>>>>>> Piotr
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> [1] SQL User-Defined Function Spec
>>>>>>>>> https://docs.google.com/document/d/1BDvOfhrH0ZQiQv9eLBqeAu8k8Vjfmeql9VzIiW1F0vc
>>>>>>>>>
>>>>>>>>> On Wed, 7 Aug 2024 at 21:14, Walaa Eldin Moustafa <
>>>>>>>>> wa.moust...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Piotr, what do you mean by making user-created functions shareable
>>>>>>>>>> between engines? Do you mean UDFs written in imperative code?
>>>>>>>>>>
>>>>>>>>>> On Wed, Aug 7, 2024 at 12:00 PM Piotr Findeisen
>>>>>>>>>> <piotr.findei...@gmail.com> wrote:
>>>>>>>>>> >
>>>>>>>>>> > Hi,
>>>>>>>>>> >
>>>>>>>>>> > Thank you Ajantha for creating this thread. The Iceberg UDFs are an interesting idea!
>>>>>>>>>> > Is there a plan to make the user-created functions sharable between the engines?
>>>>>>>>>> > If so, what would a CREATE FUNCTION statement look like in e.g. Spark or Trino?
>>>>>>>>>> >
>>>>>>>>>> > Meanwhile, added a few comments in the doc.
>>>>>>>>>> >
>>>>>>>>>> > Best
>>>>>>>>>> > Piotr
>>>>>>>>>> >
>>>>>>>>>> >
>>>>>>>>>> > On Thu, 1 Aug 2024 at 20:50, Ryan Blue <b...@databricks.com.invalid> wrote:
>>>>>>>>>> >>
>>>>>>>>>> >> I just looked through the proposal and added comments. I think it would be helpful to also have a design doc that covers the choices from the draft spec. For instance, the choice to enumerate all possible function input structs rather than allowing generics and varargs.
>>>>>>>>>> >>
>>>>>>>>>> >> Here’s a quick summary of my feedback:
>>>>>>>>>> >>
>>>>>>>>>> >> - I think that the choice to enumerate function signatures is limiting. It would be nice to see a discussion of the trade-offs and a rationale for the choice. I think it would also be very helpful to have a few representative use cases for this included in the doc. That way the proposal can demonstrate that it solves those use cases with reasonable trade-offs.
>>>>>>>>>> >> - There are a few instances where this is inconsistent with conventions in other specs. For example, using string IDs rather than an integer.
>>>>>>>>>> >> - This uses a very different model for spec versioning than the Iceberg view and table specs. It requires readers to fail if there are any unknown fields, which prevents the spec from adding things that are fully backward-compatible. Other Iceberg specs only require a version change to introduce forward-incompatible changes and I think that this should do the same to avoid confusion.
>>>>>>>>>> >> - It looks like the intent is to allow multiple function signatures per version, but it is unclear how to encode them because a version is associated with a single function signature.
>>>>>>>>>> >> - There is no review of SQL syntax for creating functions across engines, so this doesn’t show that the metadata proposed is sufficient for cross-engine use cases.
>>>>>>>>>> >> - The example for a table-valued function shows a SELECT statement and it isn’t clear how this is distinct from a view.
>>>>>>>>>> >>
>>>>>>>>>> >>
>>>>>>>>>> >> On Thu, Aug 1, 2024 at 3:15 AM Ajantha Bhat <ajanthab...@gmail.com> wrote:
>>>>>>>>>> >>>
>>>>>>>>>> >>> Thanks Walaa and Robert for the review on this.
>>>>>>>>>> >>>
>>>>>>>>>> >>> We didn't find any blocker for the spec.
>>>>>>>>>> >>> I will wait for a week and, if there are no more review comments, I will raise a PR for the spec addition next week.
>>>>>>>>>> >>>
>>>>>>>>>> >>> If anyone else is interested, please have a look at the proposal https://docs.google.com/document/d/1BDvOfhrH0ZQiQv9eLBqeAu8k8Vjfmeql9VzIiW1F0vc/edit
>>>>>>>>>> >>>
>>>>>>>>>> >>> - Ajantha
>>>>>>>>>> >>>
>>>>>>>>>> >>> On Tue, Jul 16, 2024 at 1:27 PM Walaa Eldin Moustafa <wa.moust...@gmail.com> wrote:
>>>>>>>>>> >>>>
>>>>>>>>>> >>>> Hi Ajantha,
>>>>>>>>>> >>>>
>>>>>>>>>> >>>> I have left some comments. It is an interesting direction, but there might be some details that need to be fine-tuned.
>>>>>>>>>> >>>>
>>>>>>>>>> >>>> The doc is here [1] for others who might be interested. Resharing since I do not think it was directly linked in the thread.
>>>>>>>>>> >>>>
>>>>>>>>>> >>>> [1] https://docs.google.com/document/d/1BDvOfhrH0ZQiQv9eLBqeAu8k8Vjfmeql9VzIiW1F0vc/edit
>>>>>>>>>> >>>>
>>>>>>>>>> >>>> Thanks,
>>>>>>>>>> >>>> Walaa.
>>>>>>>>>> >>>>
>>>>>>>>>> >>>> On Mon, Jul 15, 2024 at 11:09 PM Ajantha Bhat <ajanthab...@gmail.com> wrote:
>>>>>>>>>> >>>>>
>>>>>>>>>> >>>>> Hi, just another reminder since we didn't get any review on the proposal.
>>>>>>>>>> >>>>> Initially proposed on June 4.
>>>>>>>>>> >>>>>
>>>>>>>>>> >>>>> - Ajantha
>>>>>>>>>> >>>>>
>>>>>>>>>> >>>>> On Mon, Jun 24, 2024 at 4:21 PM Ajantha Bhat <ajanthab...@gmail.com> wrote:
>>>>>>>>>> >>>>>>
>>>>>>>>>> >>>>>> Hi everyone,
>>>>>>>>>> >>>>>>
>>>>>>>>>> >>>>>> We've only received one review so far (from Benny).
>>>>>>>>>> >>>>>>
>>>>>>>>>> >>>>>> We would appreciate more eyes on this.
>>>>>>>>>> >>>>>>
>>>>>>>>>> >>>>>> - Ajantha
>>>>>>>>>> >>>>>>
>>>>>>>>>> >>>>>> On Tue, Jun 4, 2024 at 7:25 AM Ajantha Bhat <ajanthab...@gmail.com> wrote:
>>>>>>>>>> >>>>>>>
>>>>>>>>>> >>>>>>> Hi All,
>>>>>>>>>> >>>>>>> Please find the proposal link
>>>>>>>>>> >>>>>>> https://github.com/apache/iceberg/issues/10432
>>>>>>>>>> >>>>>>>
>>>>>>>>>> >>>>>>> Google doc link is attached in the proposal.
>>>>>>>>>> >>>>>>> And Thanks Stephen Lin for working on it.
>>>>>>>>>> >>>>>>>
>>>>>>>>>> >>>>>>> Hope it gives more clarity to take the decisions and how we want to implement it.
>>>>>>>>>> >>>>>>>
>>>>>>>>>> >>>>>>> - Ajantha
>>>>>>>>>> >>>>>>>
>>>>>>>>>> >>>>>>> On Wed, May 29, 2024 at 4:01 AM Walaa Eldin Moustafa <wa.moust...@gmail.com> wrote:
>>>>>>>>>> >>>>>>>>
>>>>>>>>>> >>>>>>>> Thanks Jack. I actually meant scalar/aggregate/table user defined functions. Here are some examples of what I meant in (2):
>>>>>>>>>> >>>>>>>>
>>>>>>>>>> >>>>>>>> Hive GenericUDF: https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDF.java
>>>>>>>>>> >>>>>>>> Trino user defined functions: https://trino.io/docs/current/develop/functions.html
>>>>>>>>>> >>>>>>>> Flink user defined functions: https://nightlies.apache.org/flink/flink-docs-release-1.19/docs/dev/table/functions/udfs/
>>>>>>>>>> >>>>>>>>
>>>>>>>>>> >>>>>>>> Probably what you referred to is a variation of (1) where the API is data flow/data pipeline API instead of SQL (e.g., Spark Scala). Yes, that is also possible in the very long run :)
>>>>>>>>>> >>>>>>>>
>>>>>>>>>> >>>>>>>> Thanks,
>>>>>>>>>> >>>>>>>> Walaa.
>>>>>>>>>> >>>>>>>>
>>>>>>>>>> >>>>>>>>
>>>>>>>>>> >>>>>>>> On Tue, May 28, 2024 at 2:57 PM Jack Ye <yezhao...@gmail.com> wrote:
>>>>>>>>>> >>>>>>>>>
>>>>>>>>>> >>>>>>>>> > (2) Custom code written in imperative function according to a Java/Scala/Python API, etc.
>>>>>>>>>> >>>>>>>>>
>>>>>>>>>> >>>>>>>>> I think we could still explore some long term opportunities in this case. Consider you register a Spark temp view as some sort of data frame read, then it could still be resolved to a Spark plan that is representable by an intermediate representation. But I agree this gets very complicated very soon, and just having the case (1) covered would already be a huge step forward.
>>>>>>>>>> >>>>>>>>>
>>>>>>>>>> >>>>>>>>> -Jack
>>>>>>>>>> >>>>>>>>>
>>>>>>>>>> >>>>>>>>>
>>>>>>>>>> >>>>>>>>> On Tue, May 28, 2024 at 1:40 PM Benny Chow <btc...@gmail.com> wrote:
>>>>>>>>>> >>>>>>>>>>
>>>>>>>>>> >>>>>>>>>> It's interesting to note that a tabular SQL UDF can be used to build a parameterized view. So, there's definitely a lot in common between UDFs and views.
>>>>>>>>>> >>>>>>>>>>
>>>>>>>>>> >>>>>>>>>> Thanks
>>>>>>>>>> >>>>>>>>>>
>>>>>>>>>> >>>>>>>>>> On Tue, May 28, 2024 at 9:53 AM Walaa Eldin Moustafa <wa.moust...@gmail.com> wrote:
>>>>>>>>>> >>>>>>>>>>>
>>>>>>>>>> >>>>>>>>>>> I think there is a disconnect about what is perceived as a "UDF". There are 2 flavors:
>>>>>>>>>> >>>>>>>>>>>
>>>>>>>>>> >>>>>>>>>>> (1) Functions that are defined by the user whose definition is a composition of other built-in functions/SQL expressions.
>>>>>>>>>> >>>>>>>>>>> (2) Custom code written in imperative function according to a Java/Scala/Python API, etc.
>>>>>>>>>> >>>>>>>>>>>
>>>>>>>>>> >>>>>>>>>>> All the examples in Ajantha's references are pretty much from (1) and I think those have more analogy to views due to their SQL nature. Agree (2) is not practical to maintain by Iceberg, but I think Ajantha's use cases are around (1), and may be worth evaluating.
>>>>>>>>>> >>>>>>>>>>>
>>>>>>>>>> >>>>>>>>>>> Thanks,
>>>>>>>>>> >>>>>>>>>>> Walaa.
>>>>>>>>>> >>>>>>>>>>>
>>>>>>>>>> >>>>>>>>>>>
>>>>>>>>>> >>>>>>>>>>> On Tue, May 28, 2024 at 9:45 AM Ajantha Bhat <ajanthab...@gmail.com> wrote:
>>>>>>>>>> >>>>>>>>>>>>>
>>>>>>>>>> >>>>>>>>>>>>> I guess we'll know more when you post the proposal, but I think this would be a very difficult area to tackle across engines, languages, and memory models without having a huge performance penalty.
>>>>>>>>>> >>>>>>>>>>>>
>>>>>>>>>> >>>>>>>>>>>> Assuming Iceberg initially supports SQL representations of UDFs (similar to views as shared by the reference links above), the complexity involved will be similar to managing views.
>>>>>>>>>> >>>>>>>>>>>>
>>>>>>>>>> >>>>>>>>>>>> Thanks, Ryan, Robert, and Jack, for your input.
>>>>>>>>>> >>>>>>>>>>>> We will work on publishing the draft spec (inspired by the view spec) this week to facilitate further discussions.
>>>>>>>>>> >>>>>>>>>>>>
>>>>>>>>>> >>>>>>>>>>>> - Ajantha
>>>>>>>>>> >>>>>>>>>>>>
>>>>>>>>>> >>>>>>>>>>>> On Tue, May 28, 2024 at 7:33 PM Jack Ye <yezhao...@gmail.com> wrote:
>>>>>>>>>> >>>>>>>>>>>>>
>>>>>>>>>> >>>>>>>>>>>>> > While it would be great to have a common set of functions across engines, I don't see how that is practical when those engines are implemented so differently. Plugging in code -- and especially custom user-supplied code -- seems inherently specialized to me and should be part of the engines' design.
>>>>>>>>>> >>>>>>>>>>>>>
>>>>>>>>>> >>>>>>>>>>>>> How is this different from the views? I feel we can say exactly the same thing for Iceberg views, but yet we have Iceberg multi-dialect views implemented. Maybe it sounds like we are trying to draw a line between SQL vs other programming language as "code"? But I think SQL is just another type of code, and we are already talking about compiling all these different code dialects to an intermediate representation (using projects like Coral, Substrait), which will be stored as another type of representation of Iceberg view. I think the same functionality can be used for UDFs if developed.
>>>>>>>>>> >>>>>>>>>>>>>
>>>>>>>>>> >>>>>>>>>>>>> I actually think adding UDF support is a good idea, even just a multi-dialect one like view, and that can allow engines to, for example, parse a view SQL, and when a function referenced cannot be resolved, try to look for a multi-dialect UDF definition.
>>>>>>>>>> >>>>>>>>>>>>>
>>>>>>>>>> >>>>>>>>>>>>> I guess we can discuss more when we have the actual proposal published.
>>>>>>>>>> >>>>>>>>>>>>>
>>>>>>>>>> >>>>>>>>>>>>> Best,
>>>>>>>>>> >>>>>>>>>>>>> Jack Ye
>>>>>>>>>> >>>>>>>>>>>>>
>>>>>>>>>> >>>>>>>>>>>>>
>>>>>>>>>> >>>>>>>>>>>>> On Tue, May 28, 2024 at 1:32 AM Robert Stupp <sn...@snazy.de> wrote:
>>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>>> >>>>>>>>>>>>>> UDFs are as engine specific and portable and "non-centralized" as views are. The same performance concerns apply to views as well.
>>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>>> >>>>>>>>>>>>>> Iceberg should define a common base upon which engines can build, so the argument that UDFs aren't practical, because engines are different, is probably only a temporary concern.
>>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>>> >>>>>>>>>>>>>> In the long term, Iceberg should also try to tackle the idea to make views portable, which is conceptually not that much different from portable UDFs.
>>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>>> >>>>>>>>>>>>>> PS: I'm not a fan of adding a negative touch to the idea of having UDFs in Iceberg, especially not in this early stage.
>>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>>> >>>>>>>>>>>>>> On 24.05.24 20:53, Ryan Blue wrote:
>>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>>> >>>>>>>>>>>>>> Thanks, Ajantha.
>>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>>> >>>>>>>>>>>>>> I'm skeptical about whether it's a good idea to add UDFs tracked by Iceberg catalogs. I think that Iceberg primarily deals with things that are centralized, like tables of data. While it would be great to have a common set of functions across engines, I don't see how that is practical when those engines are implemented so differently. Plugging in code -- and especially custom user-supplied code -- seems inherently specialized to me and should be part of the engines' design.
>>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>>> >>>>>>>>>>>>>> I guess we'll know more when you post the proposal, but I think this would be a very difficult area to tackle across engines, languages, and memory models without having a huge performance penalty.
>>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>>> >>>>>>>>>>>>>> Ryan
>>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>>> >>>>>>>>>>>>>> On Fri, May 24, 2024 at 8:10 AM Ajantha Bhat <ajanthab...@gmail.com> wrote:
>>>>>>>>>> >>>>>>>>>>>>>>>
>>>>>>>>>> >>>>>>>>>>>>>>> Hi Everyone,
>>>>>>>>>> >>>>>>>>>>>>>>>
>>>>>>>>>> >>>>>>>>>>>>>>> This is a discussion to gauge the community interest in storing the Versioned SQL UDFs in Iceberg.
>>>>>>>>>> >>>>>>>>>>>>>>> We want to propose the spec addition for storing the versioned UDFs in Iceberg (inspired by view spec).
>>>>>>>>>> >>>>>>>>>>>>>>>
>>>>>>>>>> >>>>>>>>>>>>>>> These UDFs can operate similarly to views in that they are associated with tables, but they can accept arguments and produce return values, or even function as inline expressions.
>>>>>>>>>> >>>>>>>>>>>>>>> Many query engines like Dremio, Trino, Snowflake, and Databricks Spark support SQL UDFs at the catalog level [1].
>>>>>>>>>> >>>>>>>>>>>>>>> But storing them in Iceberg can enable
>>>>>>>>>> >>>>>>>>>>>>>>> - Versioning of these UDFs.
>>>>>>>>>> >>>>>>>>>>>>>>> - Interoperability between the engines. Potentially engines can understand the UDFs written by other engines (with the translate layer).
>>>>>>>>>> >>>>>>>>>>>>>>>
>>>>>>>>>> >>>>>>>>>>>>>>> We believe that integrating this feature into Iceberg would be a valuable addition, and we're eager to collaborate with the community to develop a UDF specification.
>>>>>>>>>> >>>>>>>>>>>>>>> Stephen has already begun drafting a specification to propose to the community.
>>>>>>>>>> >>>>>>>>>>>>>>>
>>>>>>>>>> >>>>>>>>>>>>>>> Let us know your thoughts on this.
>>>>>>>>>> >>>>>>>>>>>>>>>
>>>>>>>>>> >>>>>>>>>>>>>>> [1]
>>>>>>>>>> >>>>>>>>>>>>>>> Dremio - https://docs.dremio.com/current/reference/sql/commands/functions#creating-a-function
>>>>>>>>>> >>>>>>>>>>>>>>> Trino - https://trino.io/docs/current/sql/create-function.html
>>>>>>>>>> >>>>>>>>>>>>>>> Snowflake - https://docs.snowflake.com/en/developer-guide/udf/sql/udf-sql-scalar-functions
>>>>>>>>>> >>>>>>>>>>>>>>> Databricks - https://docs.databricks.com/en/sql/language-manual/sql-ref-syntax-ddl-create-sql-function.html
>>>>>>>>>> >>>>>>>>>>>>>>>
>>>>>>>>>> >>>>>>>>>>>>>>> - Ajantha
>>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>>> >>>>>>>>>>>>>> --
>>>>>>>>>> >>>>>>>>>>>>>> Ryan Blue
>>>>>>>>>> >>>>>>>>>>>>>> Tabular
>>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>>> >>>>>>>>>>>>>> --
>>>>>>>>>> >>>>>>>>>>>>>> Robert Stupp
>>>>>>>>>> >>>>>>>>>>>>>> @snazy
>>>>>>>>>> >>
>>>>>>>>>> >>
>>>>>>>>>> >> --
>>>>>>>>>> >> Ryan Blue
>>>>>>>>>> >> Databricks
>>>>>>>>>>
>>>>>>
>>>>>> --
>>>>>> Ryan Blue
>>>>>> Databricks
>>>>>>