Re: [Discussion] Versioned SQL UDFs (Catalog routines) in Iceberg

2025-07-29 Thread Ajantha Bhat
Thanks to everyone who joined the sync. Here is the meeting recording: https://drive.google.com/file/d/1L5S6nb-C_pzBwFlClwO_sG1AVBA_ROKo/view Summary: We have discussed how to define function identifiers (should also handle function overloading). Ryan suggested that we should check how Spark does
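For readers following the identifier discussion: one way to make overloading work is to key each function on its namespace, its name, and its ordered parameter types. This is a minimal sketch under that assumption. `FunctionId`, `FunctionRegistry`, and the exact-match lookup are hypothetical illustrations, not what the sync decided and not how Spark resolves functions.

```python
from dataclasses import dataclass
from typing import Dict, Sequence, Tuple


@dataclass(frozen=True)
class FunctionId:
    """Hypothetical identifier: namespace + name + ordered parameter types."""
    namespace: Tuple[str, ...]
    name: str
    param_types: Tuple[str, ...]


class FunctionRegistry:
    """Toy catalog that stores several overloads under one (namespace, name)."""

    def __init__(self) -> None:
        self._overloads: Dict[Tuple[Tuple[str, ...], str], Dict[Tuple[str, ...], str]] = {}

    def register(self, fid: FunctionId, body: str) -> None:
        overloads = self._overloads.setdefault((fid.namespace, fid.name), {})
        if fid.param_types in overloads:
            raise ValueError(f"duplicate signature: {fid}")
        overloads[fid.param_types] = body

    def resolve(self, namespace: Sequence[str], name: str, arg_types: Sequence[str]) -> str:
        # Exact-match lookup only; a real engine would also consider implicit casts.
        overloads = self._overloads.get((tuple(namespace), name), {})
        try:
            return overloads[tuple(arg_types)]
        except KeyError:
            raise LookupError(f"no overload of {name} for {list(arg_types)}") from None


registry = FunctionRegistry()
registry.register(FunctionId(("db",), "add_one", ("int",)), "x + 1")
registry.register(FunctionId(("db",), "add_one", ("double",)), "x + 1.0")
print(registry.resolve(["db"], "add_one", ["double"]))  # x + 1.0
```

Because the parameter types are part of the key, two `add_one` definitions can coexist, and registering the same signature twice is rejected.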

Re: [Discussion] Versioned SQL UDFs (Catalog routines) in Iceberg

2025-07-27 Thread Kevin Liu
Hi Ajantha, I see that the UDF Sync is scheduled in the "Iceberg Dev Events" calendar for tomorrow 7/28 at 9AM PT. I missed the last one, but I'll be at this one. Best, Kevin Liu On Mon, Jul 14, 2025 at 9:22 AM Ajantha Bhat wrote: > Hey everyone, > > No one joined the sync today. I came to kno

Re: [Discussion] Versioned SQL UDFs (Catalog routines) in Iceberg

2025-07-14 Thread Ajantha Bhat
Hey everyone, No one joined the sync today. I came to know that Yufei is on holiday, and Ryan and others couldn't make it, similar to the last sync. It seems Yufei might have forgotten to transfer meeting ownership as well, as new members needed admin approval and couldn't join automatically this

Re: [Discussion] Versioned SQL UDFs (Catalog routines) in Iceberg

2025-07-02 Thread Yufei Gu
I’d propose moving the `properties` field from the top level into each “version”, alongside the representation, so that properties are versioned. A property like “deterministic” could change along with the representation over time. For example, we need to change “deterministic” from true to
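Here is a rough sketch of what versioned properties would mean in practice, assuming a hypothetical metadata layout in which each version carries both its SQL body and its own property map. The class and field names are illustrative only and do not come from the proposal doc.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class UdfVersion:
    version_id: int
    sql: str
    # Properties are stored per version, so they change together with the body.
    properties: Dict[str, str] = field(default_factory=dict)


@dataclass
class UdfMetadata:
    versions: List[UdfVersion] = field(default_factory=list)
    current_version_id: Optional[int] = None

    def add_version(self, sql: str, properties: Dict[str, str]) -> int:
        version_id = len(self.versions) + 1
        self.versions.append(UdfVersion(version_id, sql, dict(properties)))
        self.current_version_id = version_id
        return version_id

    def rollback(self, version_id: int) -> None:
        if not any(v.version_id == version_id for v in self.versions):
            raise KeyError(version_id)
        self.current_version_id = version_id

    def current(self) -> UdfVersion:
        return next(v for v in self.versions if v.version_id == self.current_version_id)


meta = UdfMetadata()
meta.add_version("x + 1", {"deterministic": "true"})
meta.add_version("x + rand()", {"deterministic": "false"})
print(meta.current().properties["deterministic"])  # false
meta.rollback(1)
print(meta.current().properties["deterministic"])  # true
```

With a single top-level map, rolling back to version 1 would leave "deterministic" set to false even though the restored body is deterministic. Keeping properties inside each version avoids that mismatch.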

Re: [Discussion] Versioned SQL UDFs (Catalog routines) in Iceberg

2025-07-02 Thread Yufei Gu
Thanks for the summary, Ajantha! I’d prefer to keep the signature list separate from the representation history. Here are my reasons: 1. Each version still enforces a single signature. Although the signatures array is global to the UDF, each version references just one signature ID. Rollbac
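The layout Yufei describes can be sketched as a global signature table plus a version log in which each entry points at exactly one signature ID. This is a hypothetical model for illustration. The names `Signature`, `Version`, `Udf`, and the ID-assignment scheme are assumptions, not the spec.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

Params = Tuple[Tuple[str, str], ...]  # (parameter name, type) pairs


@dataclass(frozen=True)
class Signature:
    signature_id: int
    params: Params
    return_type: str


@dataclass(frozen=True)
class Version:
    version_id: int
    signature_id: int  # each version references exactly one signature
    sql: str


class Udf:
    def __init__(self) -> None:
        self.signatures: Dict[int, Signature] = {}  # global to the UDF
        self.versions: Dict[int, Version] = {}
        self.current_version_id: Optional[int] = None

    def add_signature(self, params: Params, return_type: str) -> int:
        # Reuse the existing ID if this exact signature is already recorded.
        for sig in self.signatures.values():
            if sig.params == params and sig.return_type == return_type:
                return sig.signature_id
        signature_id = len(self.signatures) + 1
        self.signatures[signature_id] = Signature(signature_id, params, return_type)
        return signature_id

    def commit(self, signature_id: int, sql: str) -> int:
        if signature_id not in self.signatures:
            raise KeyError(signature_id)
        version_id = len(self.versions) + 1
        self.versions[version_id] = Version(version_id, signature_id, sql)
        self.current_version_id = version_id
        return version_id

    def rollback(self, version_id: int) -> None:
        if version_id not in self.versions:
            raise KeyError(version_id)
        self.current_version_id = version_id

    def current_signature(self) -> Signature:
        version = self.versions[self.current_version_id]
        return self.signatures[version.signature_id]


udf = Udf()
one_arg = udf.add_signature((("x", "int"),), "int")
udf.commit(one_arg, "x + 1")
two_args = udf.add_signature((("x", "int"), ("y", "int")), "int")
udf.commit(two_args, "x + y")
udf.rollback(1)  # rolling back also restores the one-argument signature
print(udf.current_signature().params)  # (('x', 'int'),)
```

Because versions hold only a signature ID, a rollback switches the effective signature without copying or rewriting the signature list.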

Re: [Discussion] Versioned SQL UDFs (Catalog routines) in Iceberg

2025-06-30 Thread Ajantha Bhat
Thanks to everyone who joined the sync. Here is the meeting recording: https://drive.google.com/file/d/1FcOSbHo9ZIVeZXdUlmoG42o-chB7Q15P/view?usp=sharing Summary: We have discussed the action items from the last sync (*see Appendix C* in the proposal doc) - Function overloading: Supported by f

Re: [Discussion] Versioned SQL UDFs (Catalog routines) in Iceberg

2025-06-30 Thread Ajantha Bhat
Can it be handled by Iceberg encryption? If the whole metadata is encrypted, we don't have to worry about just hiding the UDF body? Let us discuss more on the sync today. On Mon, Jun 30, 2025 at 9:22 PM Yufei Gu wrote: > Yes, hiding the definition and disabling pushdown are required. We will > ne

Re: [Discussion] Versioned SQL UDFs (Catalog routines) in Iceberg

2025-06-30 Thread Yufei Gu
Yes, hiding the definition and disabling pushdown are required. We will need a named key (e.g., secure) somewhere, whether it is a top-level property or a key in the UDF properties, so that both the UDF creator and the consumer can recognize it. Yufei On Thu, Jun 26, 2025 at 4:27 PM Ryan Bl
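To show what a shared "secure" key might look like to a consuming engine, here is a minimal sketch. It assumes the flag is a string property named `secure`, and that an engine redacts the body in something like DESCRIBE FUNCTION and skips pushdown when the flag is set. `UdfEntry`, `describe`, and `may_push_down` are hypothetical helpers, not a proposed API.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class UdfEntry:
    name: str
    sql: str
    properties: Dict[str, str] = field(default_factory=dict)

    @property
    def is_secure(self) -> bool:
        # Hypothetical convention: every engine reads the "secure" key the same way.
        return self.properties.get("secure", "false").lower() == "true"


def describe(udf: UdfEntry) -> str:
    """Text an engine might show to users; the body is redacted for secure UDFs."""
    body = "<hidden>" if udf.is_secure else udf.sql
    return f"{udf.name}: {body}"


def may_push_down(udf: UdfEntry) -> bool:
    """A secure UDF opts out of pushdown that could expose its logic."""
    return not udf.is_secure


masked = UdfEntry("mask_ssn", "concat('***-**-', substr(ssn, 8, 4))", {"secure": "true"})
plain = UdfEntry("add_one", "x + 1")
print(describe(masked))  # mask_ssn: <hidden>
print(describe(plain))  # add_one: x + 1
```

This only works if both the creator and every consumer agree on the key name and its values, which is the point Yufei raises above.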

Re: [Discussion] Versioned SQL UDFs (Catalog routines) in Iceberg

2025-06-26 Thread Ryan Blue
Thanks for the extra detail. What do you think the spec would require? Would it require hiding the UDF definition from users and require specific pushdown cases be disabled? The use cases seem valid, but I'm trying to understand the requirements this places on engines and why it needs to be part of

Re: [Discussion] Versioned SQL UDFs (Catalog routines) in Iceberg

2025-06-20 Thread Yufei Gu
Hi Ryan, Here are the main use cases for secure UDFs: 1. Hiding UDF Definitions: This includes concealing the UDF body and details like the list of imports, some of them aren’t applicable to SQL UDFs. 2. Sandboxed Execution: Ensuring the UDF runs in an isolated environment.

Re: [Discussion] Versioned SQL UDFs (Catalog routines) in Iceberg

2025-06-18 Thread Ryan Blue
Yufei, could you make the argument for supporting a "secure" UDF? What use case are you addressing and what specifically changes about how the UDF is handled? If the idea is to hide the UDF definition, do we need to include it? I think this would be a signal to a "trusted engine". When the engine

Re: [Discussion] Versioned SQL UDFs (Catalog routines) in Iceberg

2025-06-16 Thread Yufei Gu
Thanks for the summary, Ajantha! Multi-statement UDFs are definitely useful, but whether those statements run within a single transaction should be treated as an engine-level concern. The Iceberg UDF spec can spell out the expectation, yet the actual guarantee still depends on the runtime. Even if

Re: [Discussion] Versioned SQL UDFs (Catalog routines) in Iceberg

2025-06-16 Thread Ajantha Bhat
Thanks to everyone who joined the sync. Here is the meeting recording: https://drive.google.com/file/d/10_Getaasv6tDMGzeZQUgcUVwCUAaFxiz/view?usp=sharing Summary: - We have gone through the SQL UDF syntax supported by different engines (Snowflake, Databricks, Dremio, Trino, OSS Spark 4.0).

Re: [Discussion] Versioned SQL UDFs (Catalog routines) in Iceberg

2025-06-04 Thread Ajantha Bhat
Thanks to everyone who joined the sync. Here is the meeting recording: https://drive.google.com/file/d/1WItItsNs3m3-no7_qWPHftGqVNOdpw5C/view?usp=sharing Summary: - We discussed including Python support; the majority agreed *not to* (see recording for details). - No strong opposi

Re: [Discussion] Versioned SQL UDFs (Catalog routines) in Iceberg

2025-05-20 Thread Yufei Gu
Hi folks, We’ve set up a dedicated bi-weekly community sync for the UDF project. Everyone’s welcome to drop in and share ideas! Here is the meeting link: Iceberg UDF sync Monday, June 2 · 9:00 – 10:00am Time zone: America/Los_Angeles Google Meet joining info Video call link: https://meet.google.c

Re: [Discussion] Versioned SQL UDFs (Catalog routines) in Iceberg

2025-05-16 Thread Ajantha Bhat
Update on the progress. I had a meeting today with Yufei and Yun.zou to discuss the UDF proposal. We covered several key points, though some are still open for further discussion: a) *UDF Versioning*: Do we truly need versioning for UDFs at this stage? We explored the possibility of simplifying t

Re: [Discussion] Versioned SQL UDFs (Catalog routines) in Iceberg

2025-03-12 Thread Ajantha Bhat
Hi everyone, I've updated the design document[1] based on the previous comments. Additionally, I've included the SQL UDF syntax supported by various vendors, including Dremio, Snowflake, Databricks, and Trino. I'm happy to schedule a separate sync if a deeper discussion is needed. Let's keep movi

Re: [Discussion] Versioned SQL UDFs (Catalog routines) in Iceberg

2025-02-13 Thread Ajantha Bhat
Hey everyone, During the last catalog community sync, there was significant interest in storing UDFs in Iceberg and adding endpoints for UDF handling in the REST catalog spec. I recently discussed this with Yufei to better understand the new requirement of using UDFs for fine-grained access contr

Re: [Discussion] Versioned SQL UDFs (Catalog routines) in Iceberg

2024-08-08 Thread Dmitri Bourlatchkov
The UDF spec does not require representations to be SQL. It merely does not specify (in this revision) how other representations are to be written. This seems like an easy extension (adding a new type in the "Representations" section). Cheers, Dmitri. On Thu, Aug 8, 2024 at 3:47 PM Ryan Blue wr
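To illustrate Dmitri's point that new representation types could be added later: the Iceberg view spec stores SQL representations as typed entries with `type`, `sql`, and `dialect` fields. This sketch assumes UDF representations follow the same shape, with a reader that skips types it does not recognize. `pick_representation` and the `"python"` entry are hypothetical.

```python
from typing import Any, Dict, List, Optional

SUPPORTED_TYPES = {"sql"}  # representation types this hypothetical reader understands


def pick_representation(
    representations: List[Dict[str, Any]], dialect: str
) -> Optional[Dict[str, Any]]:
    """Return the first supported representation written in the requested dialect.

    Entries with an unknown "type" are skipped instead of rejected, so a later
    spec revision could add non-SQL types without breaking older readers.
    """
    for rep in representations:
        if rep.get("type") not in SUPPORTED_TYPES:
            continue
        if rep.get("dialect") == dialect:
            return rep
    return None


reps = [
    {"type": "python", "entry-point": "udfs.add_one"},  # hypothetical future type
    {"type": "sql", "dialect": "spark", "sql": "x + 1"},
    {"type": "sql", "dialect": "trino", "sql": "x + 1"},
]
print(pick_representation(reps, "trino"))
print(pick_representation(reps, "dremio"))  # None
```

Whether an unknown type should be skipped or rejected is a spec decision. Skipping is what makes this kind of extension backward compatible for readers.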

Re: [Discussion] Versioned SQL UDFs (Catalog routines) in Iceberg

2024-08-08 Thread Ryan Blue
Right now, SQL is an explicit requirement of the spec. It leaves a way for future versions to add different representations later, but only SQL is supported. That was also the feedback to my initial skepticism about how it would work to add functions. On Thu, Aug 8, 2024 at 12:44 PM Dmitri Bourlat

Re: [Discussion] Versioned SQL UDFs (Catalog routines) in Iceberg

2024-08-08 Thread Dmitri Bourlatchkov
I do not think the spec is meant to allow only SQL representations, although it certainly favours SQL in its examples... It would be nice to add a non-SQL example, indeed. Cheers, Dmitri. On Thu, Aug 8, 2024 at 9:00 AM Fokko Driesprong wrote: > Coming from PyIceberg, I have concerns as this p

Re: [Discussion] Versioned SQL UDFs (Catalog routines) in Iceberg

2024-08-08 Thread Fokko Driesprong
Coming from PyIceberg, I have concerns as this proposal focuses on SQL-based engines, while Python-based systems often work with data frames. Adding imperative languages like Python would make this proposal more inclusive. Kind regards, Fokko On Thu, Aug 8, 2024 at 10:27 Piotr Findeisen wrote:

Re: [Discussion] Versioned SQL UDFs (Catalog routines) in Iceberg

2024-08-08 Thread Piotr Findeisen
Hi, Walaa, thanks for asking! In the design doc linked before in this thread [1] I read "Without a common standard, the UDFs are hard to share among different engines." ("Background and Motivation" section). I agree with this statement. I don't fully understand yet how the proposed design address

Re: [Discussion] Versioned SQL UDFs (Catalog routines) in Iceberg

2024-08-07 Thread Walaa Eldin Moustafa
Piotr, what do you mean by making user-created functions shareable between engines? Do you mean UDFs written in imperative code? On Wed, Aug 7, 2024 at 12:00 PM Piotr Findeisen wrote: > > Hi, > > Thank you Ajantha for creating this thread. The Iceberg UDFs are an > interesting idea! > Is there a

Re: [Discussion] Versioned SQL UDFs (Catalog routines) in Iceberg

2024-08-07 Thread Piotr Findeisen
Hi, Thank you Ajantha for creating this thread. The Iceberg UDFs are an interesting idea! Is there a plan to make the user-created functions shareable between engines? If so, what would a CREATE FUNCTION statement look like in e.g. Spark or Trino? Meanwhile, I added a few comments in the doc. Be

Re: [Discussion] Versioned SQL UDFs (Catalog routines) in Iceberg

2024-08-01 Thread Ryan Blue
I just looked through the proposal and added comments. I think it would be helpful to also have a design doc that covers the choices from the draft spec. For instance, the choice to enumerate all possible function input structs rather than allowing generics and varargs. Here’s a quick summary of my
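To make the trade-off Ryan mentions concrete (enumerating every input signature versus allowing generics and varargs), here is a rough count for a hypothetical variadic numeric function taking 2 to 3 arguments over four numeric types. The type list and both helpers are illustrative assumptions, not taken from the draft spec.

```python
from itertools import product
from typing import List, Sequence, Tuple

NUMERIC_TYPES = ("int", "long", "float", "double")


def enumerate_signatures(types: Sequence[str], min_args: int, max_args: int) -> List[Tuple[str, ...]]:
    """Expand a would-be varargs function into every explicit parameter-type tuple."""
    signatures: List[Tuple[str, ...]] = []
    for n in range(min_args, max_args + 1):
        signatures.extend(product(types, repeat=n))
    return signatures


def enumerate_homogeneous(types: Sequence[str], min_args: int, max_args: int) -> List[Tuple[str, ...]]:
    """Same expansion if a type variable T forces all arguments to share one type."""
    return [(t,) * n for n in range(min_args, max_args + 1) for t in types]


print(len(enumerate_signatures(NUMERIC_TYPES, 2, 3)))  # 80 (4**2 + 4**3)
print(len(enumerate_homogeneous(NUMERIC_TYPES, 2, 3)))  # 8
```

Explicit enumeration keeps the metadata simple and engine-neutral, but the number of stored signatures grows exponentially with arity. A generic or varargs form is compact, but every engine then has to agree on how to resolve it.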

Re: [Discussion] Versioned SQL UDFs (Catalog routines) in Iceberg

2024-08-01 Thread Ajantha Bhat
Thanks Walaa and Robert for the review on this. We didn't find any blocker for the spec. I will wait for a week and, if there are no more review comments, I will raise a PR for the spec addition next week. If anyone else is interested, please have a look at the proposal https://docs.google.com/document/d/1BDvOf

Re: [Discussion] Versioned SQL UDFs (Catalog routines) in Iceberg

2024-07-16 Thread Walaa Eldin Moustafa
Hi Ajantha, I have left some comments. It is an interesting direction, but there might be some details that need to be fine-tuned. The doc is here [1] for others who might be interested. Resharing since I do not think it was directly linked in the thread. [1] https://docs.google.com/document/d/1

Re: [Discussion] Versioned SQL UDFs (Catalog routines) in Iceberg

2024-07-15 Thread Ajantha Bhat
Hi, just another reminder since we didn't get any review on the proposal. Initially proposed on June 4. - Ajantha On Mon, Jun 24, 2024 at 4:21 PM Ajantha Bhat wrote: > Hi everyone, > > We've only received one review so far (from Benny). > > We would appreciate more eyes on this. > > - Ajantha >

Re: [Discussion] Versioned SQL UDFs (Catalog routines) in Iceberg

2024-06-24 Thread Ajantha Bhat
Hi everyone, We've only received one review so far (from Benny). We would appreciate more eyes on this. - Ajantha On Tue, Jun 4, 2024 at 7:25 AM Ajantha Bhat wrote: > Hi All, > Please find the proposal link > https://github.com/apache/iceberg/issues/10432 > > Google doc link is attached in th

Re: [Discussion] Versioned SQL UDFs (Catalog routines) in Iceberg

2024-06-03 Thread Ajantha Bhat
Hi All, Please find the proposal link https://github.com/apache/iceberg/issues/10432 Google doc link is attached in the proposal. And thanks to Stephen Lin for working on it. I hope it gives more clarity on the decisions to take and how we want to implement it. - Ajantha On W

Re: [Discussion] Versioned SQL UDFs (Catalog routines) in Iceberg

2024-05-28 Thread Walaa Eldin Moustafa
Thanks Jack. I actually meant scalar/aggregate/table user defined functions. Here are some examples of what I meant in (2): Hive GenericUDF: https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDF.java Trino user defined functions: https://trino.io/d

Re: [Discussion] Versioned SQL UDFs (Catalog routines) in Iceberg

2024-05-28 Thread Jack Ye
> (2) Custom code written in imperative function according to a Java/Scala/Python API, etc. I think we could still explore some long term opportunities in this case. Consider you register a Spark temp view as some sort of data frame read, then it could still be resolved to a Spark plan that is rep

Re: [Discussion] Versioned SQL UDFs (Catalog routines) in Iceberg

2024-05-28 Thread Benny Chow
It's interesting to note that a tabular SQL UDF can be used to build a *parameterized *view. So, there's definitely a lot in common between UDFs and views. Thanks On Tue, May 28, 2024 at 9:53 AM Walaa Eldin Moustafa wrote: > I think there is a disconnect about what is perceived as a "UDF". The

Re: [Discussion] Versioned SQL UDFs (Catalog routines) in Iceberg

2024-05-28 Thread Walaa Eldin Moustafa
I think there is a disconnect about what is perceived as a "UDF". There are 2 flavors: (1) Functions that are defined by the user whose definition is a composition of other built-in functions/SQL expressions. (2) Custom code written in imperative function according to a Java/Scala/Python API, etc.

Re: [Discussion] Versioned SQL UDFs (Catalog routines) in Iceberg

2024-05-28 Thread Ajantha Bhat
> > I guess we'll know more when you post the proposal, but I think this would > be a very difficult area to tackle across engines, languages, and memory > models without having a huge performance penalty. Assuming Iceberg initially supports SQL representations of UDFs (similar to views as shared

Re: [Discussion] Versioned SQL UDFs (Catalog routines) in Iceberg

2024-05-28 Thread Jack Ye
> While it would be great to have a common set of functions across engines, I don't see how that is practical when those engines are implemented so differently. Plugging in code -- and especially custom user-supplied code -- seems inherently specialized to me and should be part of the engines' desi

Re: [Discussion] Versioned SQL UDFs (Catalog routines) in Iceberg

2024-05-28 Thread Robert Stupp
UDFs are as engine specific and portable and "non-centralized" as views are. The same performance concerns apply to views as well. Iceberg should define a common base upon which engines can build, so the argument that UDFs aren't practical, because engines are different, is probably only a tem

Re: [Discussion] Versioned SQL UDFs (Catalog routines) in Iceberg

2024-05-24 Thread Ryan Blue
Thanks, Ajantha. I'm skeptical about whether it's a good idea to add UDFs tracked by Iceberg catalogs. I think that Iceberg primarily deals with things that are centralized, like tables of data. While it would be great to have a common set of functions across engines, I don't see how that is pract

[Discussion] Versioned SQL UDFs (Catalog routines) in Iceberg

2024-05-24 Thread Ajantha Bhat
Hi Everyone, This is a discussion to gauge the community interest in storing the Versioned SQL UDFs in Iceberg. We want to propose the spec addition for storing the versioned UDFs in Iceberg (inspired by view spec). These UDFs can operate similarly to views in that they are associated with tables