Thanks to everyone who joined the sync.
Here is the meeting recording:
https://drive.google.com/file/d/1WItItsNs3m3-no7_qWPHftGqVNOdpw5C/view?usp=sharing

Summary:

   - We discussed including Python support; the majority agreed *not to* (see
     recording for details).
   - No strong opposition to versioning; it will be included to support
     change tracking and similar use cases.
   - Suggestions were made to document how each catalog resolves UDFs,
     similar to views and tables.
   - We agreed not to deviate from the existing table/view spec; e.g.,
     location will remain *required* for cross-catalog compatibility.
   - We also briefly discussed view interoperability, since the same
     considerations apply here.

Feel free to review the proposal document here:
https://docs.google.com/document/d/1BDvOfhrH0ZQiQv9eLBqeAu8k8Vjfmeql9VzIiW1F0vc/edit?pli=1&tab=t.0

With the current scope, it is now similar to the view/table spec.
The final spec will be put up for review and a vote once it is ready.

Details for next Iceberg UDF sync:

*Monday, June 16 · 9:00 – 10:00am*
Time zone: America/Los_Angeles
Google Meet joining info
Video call link: https://meet.google.com/aui-czix-nbh

- Ajantha

On Wed, May 21, 2025 at 3:33 AM Yufei Gu <flyrain...@gmail.com> wrote:

> Hi folks,
>
> We’ve set up a dedicated bi-weekly community sync for the UDF project.
> Everyone’s welcome to drop in and share ideas! Here is the meeting link:
>
> Iceberg UDF sync
> Monday, June 2 · 9:00 – 10:00am
> Time zone: America/Los_Angeles
> Google Meet joining info
> Video call link: https://meet.google.com/aui-czix-nbh
>
> Yufei
>
>
> On Fri, May 16, 2025 at 10:45 AM Ajantha Bhat <ajanthab...@gmail.com>
> wrote:
>
>> Update on the progress.
>>
>> I had a meeting today with Yufei and Yun.zou to discuss the UDF proposal.
>> We covered several key points, though some are still open for further
>> discussion:
>>
>> a) *UDF Versioning*: Do we truly need versioning for UDFs at this stage?
>> We explored the possibility of simplifying the specification by avoiding
>> view replication, and potentially introducing versioning support later.
>> UDTFs, being a superset of views in some ways, may not require versioning
>> initially.
>>
>> b) *VarArgs Support*: While some query engines may not support vararg
>> syntax in CREATE FUNCTION, Iceberg UDFs could represent such arguments
>> as lists when supported by the engine.
>>
>> c) *Generics in UDFs*: Since Iceberg currently doesn’t support generic
>> types (e.g., object), we can only map engine-specific types to Iceberg
>> types. As a result, generic data types will not be supported in the initial
>> version.
>>
>> d) *Python Support*: Incorporating Python as a language for SQL UDFs
>> seems promising, especially given its potential to resolve interoperability
>> challenges. Some engines, however, require platform version and package
>> dependency details to execute Python code—this should be captured in the
>> specification.
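
To illustrate point (b), here is a minimal Python sketch of how a variadic
parameter might be lowered to a single list-typed parameter. Everything in it
is hypothetical: the `Param` class, the `lower_varargs` and `pack_call_args`
helpers, the `concat_all` function, and the `list<string>` type notation are
for illustration only and are not taken from the proposal or any Iceberg API.

```python
from dataclasses import dataclass
from typing import Any, List, Sequence


@dataclass(frozen=True)
class Param:
    """Hypothetical parameter entry; not an Iceberg metadata structure."""
    name: str
    type: str  # illustrative type notation, e.g. "string" or "list<string>"


def lower_varargs(name: str, element_type: str) -> Param:
    """Describe a variadic parameter as one list-typed parameter."""
    return Param(name=name, type=f"list<{element_type}>")


def pack_call_args(fixed_count: int, args: Sequence[Any]) -> List[Any]:
    """Keep the first `fixed_count` arguments and pack the rest into a list."""
    if len(args) < fixed_count:
        raise ValueError(
            f"expected at least {fixed_count} arguments, got {len(args)}"
        )
    return list(args[:fixed_count]) + [list(args[fixed_count:])]


# A hypothetical variadic function concat_all(sep, parts...) would be
# stored with two parameters, the second one list-typed.
signature = [Param("sep", "string"), lower_varargs("parts", "string")]
call_args = pack_call_args(1, ["-", "a", "b", "c"])
```

An engine without vararg syntax could then expose the same function with an
explicit list argument, while an engine with vararg syntax would pack the
trailing arguments at the call site.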
>>
>> *Next Steps*
>> I will update the proposal document with two primary UDF use cases:
>>
>>    -
>>
>>    Policy exchange between engines
>>    -
>>
>>    UDTF as a superset of view functionality
>>
>> The update will include corresponding syntax examples in both SQL and
>> Python, and detail how each use case is represented in Iceberg metadata.
>>
>> We also plan to set up regular syncs (open to more interested
>> participants) to continue refining and finalizing the UDF specification.
>> - Ajantha
>>
>>
>> On Wed, Mar 12, 2025 at 9:16 PM Ajantha Bhat <ajanthab...@gmail.com>
>> wrote:
>>
>>> Hi everyone,
>>>
>>> I've updated the design document[1] based on the previous comments.
>>> Additionally, I've included the SQL UDF syntax supported by various
>>> vendors, including Dremio, Snowflake, Databricks, and Trino.
>>>
>>> I'm happy to schedule a separate sync if a deeper discussion is needed.
>>> Let's keep moving forward, especially with the renewed interest from the
>>> community.
>>>
>>> [1]
>>> https://docs.google.com/document/d/1BDvOfhrH0ZQiQv9eLBqeAu8k8Vjfmeql9VzIiW1F0vc/edit?usp=sharing
>>>
>>> On Thu, Feb 13, 2025 at 11:17 PM Ajantha Bhat <ajanthab...@gmail.com>
>>> wrote:
>>>
>>>> Hey everyone,
>>>>
>>>> During the last catalog community sync, there was significant interest
>>>> in storing UDFs in Iceberg and adding endpoints for UDF handling in the
>>>> REST catalog spec.
>>>>
>>>> I recently discussed this with Yufei to better understand the new
>>>> requirement of using UDFs for fine-grained access control policies. This
>>>> expands the use cases beyond just versioned and interoperable UDFs.
>>>> Additionally, I learnt that many vendors are interested in this feature.
>>>>
>>>> Given the strong community interest and support, I’d like to take
>>>> ownership of this effort and revive the work. I'll be revisiting the
>>>> document I proposed a while back and will share an updated proposal by next
>>>> week.
>>>>
>>>> Looking forward to storing UDFs in Iceberg!
>>>> - Ajantha
>>>>
>>>> On Thu, Aug 8, 2024 at 2:55 PM Dmitri Bourlatchkov
>>>> <dmitri.bourlatch...@dremio.com.invalid> wrote:
>>>>
>>>>> The UDF spec does not require representations to be SQL. It merely
>>>>> does not specify (in this revision) how other representations are to be
>>>>> written.
>>>>>
>>>>> This seems like an easy extension (adding a new type in the
>>>>> "Representations" section).
>>>>>
>>>>> Cheers,
>>>>> Dmitri.
>>>>>
>>>>> On Thu, Aug 8, 2024 at 3:47 PM Ryan Blue <b...@databricks.com.invalid>
>>>>> wrote:
>>>>>
>>>>>> Right now, SQL is an explicit requirement of the spec. It leaves a
>>>>>> way for future versions to add different representations later, but only
>>>>>> SQL is supported. That was also the feedback to my initial skepticism
>>>>>> about how it would work to add functions.
>>>>>>
>>>>>> On Thu, Aug 8, 2024 at 12:44 PM Dmitri Bourlatchkov
>>>>>> <dmitri.bourlatch...@dremio.com.invalid> wrote:
>>>>>>
>>>>>>> I do not think the spec is meant to allow only SQL representations,
>>>>>>> although it is certainly favouring SQL in examples... It would be nice
>>>>>>> to add a non-SQL example, indeed.
>>>>>>>
>>>>>>> Cheers,
>>>>>>> Dmitri.
>>>>>>>
>>>>>>> On Thu, Aug 8, 2024 at 9:00 AM Fokko Driesprong <fo...@apache.org>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Coming from PyIceberg, I have concerns as this proposal focuses on
>>>>>>>> SQL-based engines, while Python-based systems often work with data
>>>>>>>> frames. Adding imperative languages like Python would make this
>>>>>>>> proposal more inclusive.
>>>>>>>>
>>>>>>>> Kind regards,
>>>>>>>> Fokko
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Thu, Aug 8, 2024 at 10:27 AM Piotr Findeisen <
>>>>>>>> piotr.findei...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> Walaa, thanks for asking!
>>>>>>>>> In the design doc linked earlier in this thread [1], I read
>>>>>>>>> "Without a common standard, the UDFs are hard to share among
>>>>>>>>> different engines."
>>>>>>>>> ("Background and Motivation" section).
>>>>>>>>> I agree with this statement. I don't fully understand yet how the
>>>>>>>>> proposed design addresses shareability between the engines though.
>>>>>>>>> I could use some help to understand this better.
>>>>>>>>>
>>>>>>>>> Best
>>>>>>>>> Piotr
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> [1] SQL User-Defined Function Spec
>>>>>>>>> https://docs.google.com/document/d/1BDvOfhrH0ZQiQv9eLBqeAu8k8Vjfmeql9VzIiW1F0vc
>>>>>>>>>
>>>>>>>>> On Wed, 7 Aug 2024 at 21:14, Walaa Eldin Moustafa <
>>>>>>>>> wa.moust...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Piotr, what do you mean by making user-created functions shareable
>>>>>>>>>> between engines? Do you mean UDFs written in imperative code?
>>>>>>>>>>
>>>>>>>>>> On Wed, Aug 7, 2024 at 12:00 PM Piotr Findeisen
>>>>>>>>>> <piotr.findei...@gmail.com> wrote:
>>>>>>>>>> >
>>>>>>>>>> > Hi,
>>>>>>>>>> >
>>>>>>>>>> > Thank you Ajantha for creating this thread. The Iceberg UDFs
>>>>>>>>>> are an interesting idea!
>>>>>>>>>> > Is there a plan to make the user-created functions sharable
>>>>>>>>>> between the engines?
>>>>>>>>>> > If so, what would a CREATE FUNCTION statement look like in, e.g.,
>>>>>>>>>> Spark or Trino?
>>>>>>>>>> >
>>>>>>>>>> > Meanwhile, added a few comments in the doc.
>>>>>>>>>> >
>>>>>>>>>> > Best
>>>>>>>>>> > Piotr
>>>>>>>>>> >
>>>>>>>>>> >
>>>>>>>>>> > On Thu, 1 Aug 2024 at 20:50, Ryan Blue
>>>>>>>>>> <b...@databricks.com.invalid> wrote:
>>>>>>>>>> >>
>>>>>>>>>> >> I just looked through the proposal and added comments. I think
>>>>>>>>>> it would be helpful to also have a design doc that covers the
>>>>>>>>>> choices from the draft spec. For instance, the choice to enumerate
>>>>>>>>>> all possible function input structs rather than allowing generics
>>>>>>>>>> and varargs.
>>>>>>>>>> >>
>>>>>>>>>> >> Here’s a quick summary of my feedback:
>>>>>>>>>> >>
>>>>>>>>>> >> - I think that the choice to enumerate function signatures is
>>>>>>>>>> limiting. It would be nice to see a discussion of the trade-offs and
>>>>>>>>>> a rationale for the choice. I think it would also be very helpful to
>>>>>>>>>> have a few representative use cases for this included in the doc.
>>>>>>>>>> That way the proposal can demonstrate that it solves those use cases
>>>>>>>>>> with reasonable trade-offs.
>>>>>>>>>> >> - There are a few instances where this is inconsistent with
>>>>>>>>>> conventions in other specs. For example, using string IDs rather than
>>>>>>>>>> an integer.
>>>>>>>>>> >> - This uses a very different model for spec versioning than the
>>>>>>>>>> Iceberg view and table specs. It requires readers to fail if there
>>>>>>>>>> are any unknown fields, which prevents the spec from adding things
>>>>>>>>>> that are fully backward-compatible. Other Iceberg specs only require
>>>>>>>>>> a version change to introduce forward-incompatible changes and I
>>>>>>>>>> think that this should do the same to avoid confusion.
>>>>>>>>>> >> - It looks like the intent is to allow multiple function
>>>>>>>>>> signatures per version, but it is unclear how to encode them because
>>>>>>>>>> a version is associated with a single function signature.
>>>>>>>>>> >> - There is no review of SQL syntax for creating functions across
>>>>>>>>>> engines, so this doesn’t show that the metadata proposed is
>>>>>>>>>> sufficient for cross-engine use cases.
>>>>>>>>>> >> - The example for a table-valued function shows a SELECT
>>>>>>>>>> statement and it isn’t clear how this is distinct from a view.
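
The versioning feedback above can be sketched as a reader that rejects only an
unsupported format version and skips fields it does not recognise. This is a
rough Python illustration under assumptions: `read_udf_metadata` and every
field name other than `format-version` (which the table and view specs use)
are hypothetical and do not come from the draft UDF spec.

```python
from typing import Any, Dict

SUPPORTED_FORMAT_VERSIONS = {1}
# Hypothetical field names, for illustration only.
KNOWN_FIELDS = {"format-version", "function-uuid", "location", "versions"}


def read_udf_metadata(raw: Dict[str, Any]) -> Dict[str, Any]:
    """Fail only on an unsupported format version; ignore unknown fields."""
    version = raw.get("format-version")
    if version not in SUPPORTED_FORMAT_VERSIONS:
        raise ValueError(f"unsupported format-version: {version!r}")
    # Unknown fields are dropped rather than treated as an error, so a
    # backward-compatible addition does not break older readers.
    return {key: value for key, value in raw.items() if key in KNOWN_FIELDS}


metadata = read_udf_metadata(
    {"format-version": 1, "location": "s3://bucket/udf", "added-later": True}
)
```

With this behaviour, only a forward-incompatible change would need a new
format version.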
>>>>>>>>>> >>
>>>>>>>>>> >>
>>>>>>>>>> >> On Thu, Aug 1, 2024 at 3:15 AM Ajantha Bhat <
>>>>>>>>>> ajanthab...@gmail.com> wrote:
>>>>>>>>>> >>>
>>>>>>>>>> >>> Thanks Walaa and Robert for the review on this.
>>>>>>>>>> >>>
>>>>>>>>>> >>> We didn't find any blocker for the spec.
>>>>>>>>>> >>> I will wait for a week, and if there are no more review comments,
>>>>>>>>>> I will raise a PR for the spec addition next week.
>>>>>>>>>> >>>
>>>>>>>>>> >>> If anyone else is interested, please have a look at the
>>>>>>>>>> proposal
>>>>>>>>>> https://docs.google.com/document/d/1BDvOfhrH0ZQiQv9eLBqeAu8k8Vjfmeql9VzIiW1F0vc/edit
>>>>>>>>>> >>>
>>>>>>>>>> >>> - Ajantha
>>>>>>>>>> >>>
>>>>>>>>>> >>> On Tue, Jul 16, 2024 at 1:27 PM Walaa Eldin Moustafa <
>>>>>>>>>> wa.moust...@gmail.com> wrote:
>>>>>>>>>> >>>>
>>>>>>>>>> >>>> Hi Ajantha,
>>>>>>>>>> >>>>
>>>>>>>>>> >>>> I have left some comments. It is an interesting direction,
>>>>>>>>>> but there might be some details that need to be fine tuned.
>>>>>>>>>> >>>>
>>>>>>>>>> >>>> The doc is here [1] for others who might be interested.
>>>>>>>>>> Resharing since I do not think it was directly linked in the thread.
>>>>>>>>>> >>>>
>>>>>>>>>> >>>> [1]
>>>>>>>>>> https://docs.google.com/document/d/1BDvOfhrH0ZQiQv9eLBqeAu8k8Vjfmeql9VzIiW1F0vc/edit
>>>>>>>>>> >>>>
>>>>>>>>>> >>>> Thanks,
>>>>>>>>>> >>>> Walaa.
>>>>>>>>>> >>>>
>>>>>>>>>> >>>> On Mon, Jul 15, 2024 at 11:09 PM Ajantha Bhat <
>>>>>>>>>> ajanthab...@gmail.com> wrote:
>>>>>>>>>> >>>>>
>>>>>>>>>> >>>>> Hi, just another reminder since we didn't get any review on
>>>>>>>>>> the proposal.
>>>>>>>>>> >>>>> Initially proposed on June 4.
>>>>>>>>>> >>>>>
>>>>>>>>>> >>>>> - Ajantha
>>>>>>>>>> >>>>>
>>>>>>>>>> >>>>> On Mon, Jun 24, 2024 at 4:21 PM Ajantha Bhat <
>>>>>>>>>> ajanthab...@gmail.com> wrote:
>>>>>>>>>> >>>>>>
>>>>>>>>>> >>>>>> Hi everyone,
>>>>>>>>>> >>>>>>
>>>>>>>>>> >>>>>> We've only received one review so far (from Benny).
>>>>>>>>>> >>>>>>
>>>>>>>>>> >>>>>> We would appreciate more eyes on this.
>>>>>>>>>> >>>>>>
>>>>>>>>>> >>>>>> - Ajantha
>>>>>>>>>> >>>>>>
>>>>>>>>>> >>>>>> On Tue, Jun 4, 2024 at 7:25 AM Ajantha Bhat <
>>>>>>>>>> ajanthab...@gmail.com> wrote:
>>>>>>>>>> >>>>>>>
>>>>>>>>>> >>>>>>> Hi All,
>>>>>>>>>> >>>>>>> Please find the proposal link
>>>>>>>>>> >>>>>>> https://github.com/apache/iceberg/issues/10432
>>>>>>>>>> >>>>>>>
>>>>>>>>>> >>>>>>> Google doc link is attached in the proposal.
>>>>>>>>>> >>>>>>> And thanks to Stephen Lin for working on it.
>>>>>>>>>> >>>>>>>
>>>>>>>>>> >>>>>>> I hope it gives more clarity for making decisions on how
>>>>>>>>>> we want to implement it.
>>>>>>>>>> >>>>>>>
>>>>>>>>>> >>>>>>> - Ajantha
>>>>>>>>>> >>>>>>>
>>>>>>>>>> >>>>>>> On Wed, May 29, 2024 at 4:01 AM Walaa Eldin Moustafa <
>>>>>>>>>> wa.moust...@gmail.com> wrote:
>>>>>>>>>> >>>>>>>>
>>>>>>>>>> >>>>>>>> Thanks Jack. I actually meant scalar/aggregate/table
>>>>>>>>>> user defined functions. Here are some examples of what I meant in 
>>>>>>>>>> (2):
>>>>>>>>>> >>>>>>>>
>>>>>>>>>> >>>>>>>> Hive GenericUDF:
>>>>>>>>>> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDF.java
>>>>>>>>>> >>>>>>>> Trino user defined functions:
>>>>>>>>>> https://trino.io/docs/current/develop/functions.html
>>>>>>>>>> >>>>>>>> Flink user defined functions:
>>>>>>>>>> https://nightlies.apache.org/flink/flink-docs-release-1.19/docs/dev/table/functions/udfs/
>>>>>>>>>> >>>>>>>>
>>>>>>>>>> >>>>>>>> Probably what you referred to is a variation of (1)
>>>>>>>>>> where the API is a data flow/data pipeline API instead of SQL (e.g.,
>>>>>>>>>> Spark Scala). Yes, that is also possible in the very long run :)
>>>>>>>>>> >>>>>>>>
>>>>>>>>>> >>>>>>>> Thanks,
>>>>>>>>>> >>>>>>>> Walaa.
>>>>>>>>>> >>>>>>>>
>>>>>>>>>> >>>>>>>>
>>>>>>>>>> >>>>>>>>
>>>>>>>>>> >>>>>>>>
>>>>>>>>>> >>>>>>>> On Tue, May 28, 2024 at 2:57 PM Jack Ye <
>>>>>>>>>> yezhao...@gmail.com> wrote:
>>>>>>>>>> >>>>>>>>>
>>>>>>>>>> >>>>>>>>> > (2) Custom code written in imperative function
>>>>>>>>>> according to a Java/Scala/Python API, etc.
>>>>>>>>>> >>>>>>>>>
>>>>>>>>>> >>>>>>>>> I think we could still explore some long term
>>>>>>>>>> opportunities in this case. Consider you register a Spark temp view
>>>>>>>>>> as some sort of data frame read, then it could still be resolved to
>>>>>>>>>> a Spark plan that is representable by an intermediate
>>>>>>>>>> representation. But I agree this gets very complicated very soon,
>>>>>>>>>> and just having the case (1) covered would already be a huge step
>>>>>>>>>> forward.
>>>>>>>>>> >>>>>>>>>
>>>>>>>>>> >>>>>>>>> -Jack
>>>>>>>>>> >>>>>>>>>
>>>>>>>>>> >>>>>>>>>
>>>>>>>>>> >>>>>>>>> On Tue, May 28, 2024 at 1:40 PM Benny Chow <
>>>>>>>>>> btc...@gmail.com> wrote:
>>>>>>>>>> >>>>>>>>>>
>>>>>>>>>> >>>>>>>>>> It's interesting to note that a tabular SQL UDF can be
>>>>>>>>>> used to build a parameterized view. So, there's definitely a lot in
>>>>>>>>>> common between UDFs and views.
>>>>>>>>>> >>>>>>>>>>
>>>>>>>>>> >>>>>>>>>> Thanks
>>>>>>>>>> >>>>>>>>>>
>>>>>>>>>> >>>>>>>>>> On Tue, May 28, 2024 at 9:53 AM Walaa Eldin Moustafa <
>>>>>>>>>> wa.moust...@gmail.com> wrote:
>>>>>>>>>> >>>>>>>>>>>
>>>>>>>>>> >>>>>>>>>>> I think there is a disconnect about what is perceived
>>>>>>>>>> as a "UDF". There are 2 flavors:
>>>>>>>>>> >>>>>>>>>>>
>>>>>>>>>> >>>>>>>>>>> (1) Functions that are defined by the user whose
>>>>>>>>>> definition is a composition of other built-in functions/SQL
>>>>>>>>>> expressions.
>>>>>>>>>> >>>>>>>>>>> (2) Custom code written in imperative function
>>>>>>>>>> according to a Java/Scala/Python API, etc.
>>>>>>>>>> >>>>>>>>>>>
>>>>>>>>>> >>>>>>>>>>> All the examples in Ajantha's references are pretty
>>>>>>>>>> much from (1) and I think those have more analogy to views due to
>>>>>>>>>> their SQL nature. Agree (2) is not practical to maintain by Iceberg,
>>>>>>>>>> but I think Ajantha's use cases are around (1), and may be worth
>>>>>>>>>> evaluating.
>>>>>>>>>> >>>>>>>>>>>
>>>>>>>>>> >>>>>>>>>>> Thanks,
>>>>>>>>>> >>>>>>>>>>> Walaa.
>>>>>>>>>> >>>>>>>>>>>
>>>>>>>>>> >>>>>>>>>>>
>>>>>>>>>> >>>>>>>>>>> On Tue, May 28, 2024 at 9:45 AM Ajantha Bhat <
>>>>>>>>>> ajanthab...@gmail.com> wrote:
>>>>>>>>>> >>>>>>>>>>>>>
>>>>>>>>>> >>>>>>>>>>>>> I guess we'll know more when you post the proposal,
>>>>>>>>>> but I think this would be a very difficult area to tackle across
>>>>>>>>>> engines, languages, and memory models without having a huge
>>>>>>>>>> performance penalty.
>>>>>>>>>> >>>>>>>>>>>>
>>>>>>>>>> >>>>>>>>>>>> Assuming Iceberg initially supports SQL
>>>>>>>>>> representations of UDFs (similar to views as shared by the reference
>>>>>>>>>> links above), the complexity involved will be similar to managing
>>>>>>>>>> views.
>>>>>>>>>> >>>>>>>>>>>>
>>>>>>>>>> >>>>>>>>>>>> Thanks, Ryan, Robert, and Jack, for your input.
>>>>>>>>>> >>>>>>>>>>>> We will work on publishing the draft spec (inspired
>>>>>>>>>> by the view spec) this week to facilitate further discussions.
>>>>>>>>>> >>>>>>>>>>>>
>>>>>>>>>> >>>>>>>>>>>> - Ajantha
>>>>>>>>>> >>>>>>>>>>>>
>>>>>>>>>> >>>>>>>>>>>> On Tue, May 28, 2024 at 7:33 PM Jack Ye <
>>>>>>>>>> yezhao...@gmail.com> wrote:
>>>>>>>>>> >>>>>>>>>>>>>
>>>>>>>>>> >>>>>>>>>>>>> > While it would be great to have a common set of
>>>>>>>>>> functions across engines, I don't see how that is practical when
>>>>>>>>>> those engines are implemented so differently. Plugging in code -- and
>>>>>>>>>> especially custom user-supplied code -- seems inherently specialized
>>>>>>>>>> to me and should be part of the engines' design.
>>>>>>>>>> >>>>>>>>>>>>>
>>>>>>>>>> >>>>>>>>>>>>> How is this different from the views? I feel we can
>>>>>>>>>> say exactly the same thing for Iceberg views, but yet we have Iceberg
>>>>>>>>>> multi-dialect views implemented. Maybe it sounds like we are trying
>>>>>>>>>> to draw a line between SQL vs other programming languages as "code"?
>>>>>>>>>> But I think SQL is just another type of code, and we are already
>>>>>>>>>> talking about compiling all these different code dialects to an
>>>>>>>>>> intermediate representation (using projects like Coral, Substrait),
>>>>>>>>>> which will be stored as another type of representation of Iceberg
>>>>>>>>>> view. I think the same functionality can be used for UDFs if
>>>>>>>>>> developed.
>>>>>>>>>> >>>>>>>>>>>>>
>>>>>>>>>> >>>>>>>>>>>>> I actually think adding UDF support is a good idea,
>>>>>>>>>> even just a multi-dialect one like view, and that can allow engines
>>>>>>>>>> to, for example, parse a view SQL, and when a function referenced
>>>>>>>>>> cannot be resolved, try to look for a multi-dialect UDF definition.
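
The fallback described above could look roughly like the following Python
sketch. It is only an illustration under assumptions: `BUILTINS`,
`CATALOG_UDFS`, `resolve_function`, and the `add_one` function are made-up
stand-ins, and the `dialect`/`sql` keys merely mirror the way the view spec
describes SQL representations.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical stand-ins for an engine's built-in function registry and a
# catalog holding multi-dialect UDF definitions.
BUILTINS: Dict[str, Callable] = {"upper": str.upper}
CATALOG_UDFS: Dict[str, List[Dict[str, str]]] = {
    "add_one": [
        {"dialect": "trino", "sql": "x + 1"},
        {"dialect": "spark", "sql": "x + 1"},
    ]
}


def resolve_function(name: str, dialect: str) -> Optional[str]:
    """Return a description of how `name` resolves, or None if unknown."""
    # Engine built-ins take precedence.
    if name in BUILTINS:
        return f"builtin:{name}"
    # Otherwise look for a stored UDF representation in the engine's dialect.
    for representation in CATALOG_UDFS.get(name, []):
        if representation["dialect"] == dialect:
            return f"udf:{representation['sql']}"
    return None
```

A function with no representation in the engine's dialect stays unresolved
here; whether an engine should instead attempt translation from another
dialect is left open.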
>>>>>>>>>> >>>>>>>>>>>>>
>>>>>>>>>> >>>>>>>>>>>>> I guess we can discuss more when we have the actual
>>>>>>>>>> proposal published.
>>>>>>>>>> >>>>>>>>>>>>>
>>>>>>>>>> >>>>>>>>>>>>> Best,
>>>>>>>>>> >>>>>>>>>>>>> Jack Ye
>>>>>>>>>> >>>>>>>>>>>>>
>>>>>>>>>> >>>>>>>>>>>>>
>>>>>>>>>> >>>>>>>>>>>>>
>>>>>>>>>> >>>>>>>>>>>>>
>>>>>>>>>> >>>>>>>>>>>>> On Tue, May 28, 2024 at 1:32 AM Robert Stupp <
>>>>>>>>>> sn...@snazy.de> wrote:
>>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>>> >>>>>>>>>>>>>> UDFs are as engine specific and portable and
>>>>>>>>>> "non-centralized" as views are. The same performance concerns apply
>>>>>>>>>> to views as well.
>>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>>> >>>>>>>>>>>>>> Iceberg should define a common base upon which
>>>>>>>>>> engines can build, so the argument that UDFs aren't practical,
>>>>>>>>>> because engines are different, is probably only a temporary concern.
>>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>>> >>>>>>>>>>>>>> In the long term, Iceberg should also try to
>>>>>>>>>> tackle the idea to make views portable, which is conceptually not
>>>>>>>>>> that much different from portable UDFs.
>>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>>> >>>>>>>>>>>>>> PS: I'm not a fan of adding a negative touch to
>>>>>>>>>> the idea of having UDFs in Iceberg, especially not in this early
>>>>>>>>>> stage.
>>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>>> >>>>>>>>>>>>>> On 24.05.24 20:53, Ryan Blue wrote:
>>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>>> >>>>>>>>>>>>>> Thanks, Ajantha.
>>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>>> >>>>>>>>>>>>>> I'm skeptical about whether it's a good idea to
>>>>>>>>>> add UDFs tracked by Iceberg catalogs. I think that Iceberg primarily
>>>>>>>>>> deals with things that are centralized, like tables of data. While it
>>>>>>>>>> would be great to have a common set of functions across engines, I
>>>>>>>>>> don't see how that is practical when those engines are implemented so
>>>>>>>>>> differently. Plugging in code -- and especially custom user-supplied
>>>>>>>>>> code -- seems inherently specialized to me and should be part of the
>>>>>>>>>> engines' design.
>>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>>> >>>>>>>>>>>>>> I guess we'll know more when you post the
>>>>>>>>>> proposal, but I think this would be a very difficult area to tackle
>>>>>>>>>> across engines, languages, and memory models without having a huge
>>>>>>>>>> performance penalty.
>>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>>> >>>>>>>>>>>>>> Ryan
>>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>>> >>>>>>>>>>>>>> On Fri, May 24, 2024 at 8:10 AM Ajantha Bhat <
>>>>>>>>>> ajanthab...@gmail.com> wrote:
>>>>>>>>>> >>>>>>>>>>>>>>>
>>>>>>>>>> >>>>>>>>>>>>>>> Hi Everyone,
>>>>>>>>>> >>>>>>>>>>>>>>>
>>>>>>>>>> >>>>>>>>>>>>>>> This is a discussion to gauge the community
>>>>>>>>>> interest in storing the Versioned SQL UDFs in Iceberg.
>>>>>>>>>> >>>>>>>>>>>>>>> We want to propose the spec addition for storing
>>>>>>>>>> the versioned UDFs in Iceberg (inspired by view spec).
>>>>>>>>>> >>>>>>>>>>>>>>>
>>>>>>>>>> >>>>>>>>>>>>>>> These UDFs can operate similarly to views in that
>>>>>>>>>> they are associated with tables, but they can accept arguments and
>>>>>>>>>> produce return values, or even function as inline expressions.
>>>>>>>>>> >>>>>>>>>>>>>>> Many query engines, such as Dremio, Trino, Snowflake,
>>>>>>>>>> and Databricks Spark, support SQL UDFs at the catalog level [1].
>>>>>>>>>> >>>>>>>>>>>>>>> But storing them in Iceberg can enable:
>>>>>>>>>> >>>>>>>>>>>>>>> - Versioning of these UDFs.
>>>>>>>>>> >>>>>>>>>>>>>>> - Interoperability between the engines.
>>>>>>>>>> Potentially engines can understand the UDFs written by other engines
>>>>>>>>>> (with a translation layer).
>>>>>>>>>> >>>>>>>>>>>>>>>
>>>>>>>>>> >>>>>>>>>>>>>>> We believe that integrating this feature into
>>>>>>>>>> Iceberg would be a valuable addition, and we're eager to collaborate
>>>>>>>>>> with the community to develop a UDF specification.
>>>>>>>>>> >>>>>>>>>>>>>>> Stephen has already begun drafting a
>>>>>>>>>> specification to propose to the community.
>>>>>>>>>> >>>>>>>>>>>>>>>
>>>>>>>>>> >>>>>>>>>>>>>>> Let us know your thoughts on this.
>>>>>>>>>> >>>>>>>>>>>>>>>
>>>>>>>>>> >>>>>>>>>>>>>>> [1]
>>>>>>>>>> >>>>>>>>>>>>>>> Dremio -
>>>>>>>>>> https://docs.dremio.com/current/reference/sql/commands/functions#creating-a-function
>>>>>>>>>> >>>>>>>>>>>>>>> Trino -
>>>>>>>>>> https://trino.io/docs/current/sql/create-function.html
>>>>>>>>>> >>>>>>>>>>>>>>> Snowflake -
>>>>>>>>>> https://docs.snowflake.com/en/developer-guide/udf/sql/udf-sql-scalar-functions
>>>>>>>>>> >>>>>>>>>>>>>>> Databricks -
>>>>>>>>>> https://docs.databricks.com/en/sql/language-manual/sql-ref-syntax-ddl-create-sql-function.html
>>>>>>>>>> >>>>>>>>>>>>>>>
>>>>>>>>>> >>>>>>>>>>>>>>> - Ajantha
>>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>>> >>>>>>>>>>>>>> --
>>>>>>>>>> >>>>>>>>>>>>>> Ryan Blue
>>>>>>>>>> >>>>>>>>>>>>>> Tabular
>>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>>> >>>>>>>>>>>>>> --
>>>>>>>>>> >>>>>>>>>>>>>> Robert Stupp
>>>>>>>>>> >>>>>>>>>>>>>> @snazy
>>>>>>>>>> >>
>>>>>>>>>> >>
>>>>>>>>>> >>
>>>>>>>>>> >> --
>>>>>>>>>> >> Ryan Blue
>>>>>>>>>> >> Databricks
>>>>>>>>>>
>>>>>>>>>
>>>>>>
>>>>>> --
>>>>>> Ryan Blue
>>>>>> Databricks
>>>>>>
>>>>>
