Hi Ajantha,

I see that the UDF Sync is scheduled in the "Iceberg Dev Events" calendar for
tomorrow, 7/28, at 9AM PT. I missed the last one, but I'll be at this one.

Best,
Kevin Liu

On Mon, Jul 14, 2025 at 9:22 AM Ajantha Bhat <ajanthab...@gmail.com> wrote:

> Hey everyone,
>
> No one joined the sync today. I learned that Yufei is on holiday, and
> Ryan and others couldn't make it, as with the last sync. It seems Yufei
> may also have forgotten to transfer meeting ownership, since new members
> needed admin approval and couldn't join automatically this week. I also
> understand that it is summer holiday season for many.
>
> I've updated the function signature schema and the other open points. I
> believe we're very close to the final version of the spec. A meeting is
> still needed to finalize it, but we don't have to wait for one to finish
> the review process; we have already had many meetings on this. So, please
> review the document at your earliest convenience. If we agree on the spec
> by next week, I can raise a PR.
>
> - Ajantha
>
> On Thu, Jul 3, 2025 at 4:03 AM Yufei Gu <flyrain...@gmail.com> wrote:
>
>> I’d propose moving the `properties` field from the top level into each
>> “version”, alongside its representation, so that properties are
>> versioned. A property like “deterministic” can change along with the
>> representation over time. For example, we need to change “deterministic”
>> from true to false when a non-deterministic SQL expression or function
>> (e.g., now()) is added to a UDF. Otherwise, rollback won't be safe.
>>
>> That said, it's still an open question whether we need any non-versioned
>> properties. We can introduce them later if a use case arises.
>>
>> Yufei
>>
>>
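A minimal Python sketch of the rollback argument above. It assumes a simplified metadata layout in which each version carries its own `properties` map. The names `UdfVersion`, `UdfMetadata`, `version_id`, and `rollback` are hypothetical illustrations, not fields from the spec.

```python
from dataclasses import dataclass, field

# Hypothetical, simplified UDF metadata in which `properties` live inside
# each version, as proposed above. Class and field names are illustrative,
# not taken from the spec.
@dataclass(frozen=True)
class UdfVersion:
    version_id: int
    representation: str
    properties: dict = field(default_factory=dict)

@dataclass
class UdfMetadata:
    versions: list
    current_version_id: int

    def current(self) -> UdfVersion:
        return next(v for v in self.versions if v.version_id == self.current_version_id)

    def rollback(self, version_id: int) -> None:
        # Rollback only moves the current-version pointer; the properties
        # stored with that version (e.g. "deterministic") come back with it.
        if not any(v.version_id == version_id for v in self.versions):
            raise ValueError(f"unknown version {version_id}")
        self.current_version_id = version_id

v1 = UdfVersion(1, "x + 1", {"deterministic": "true"})
v2 = UdfVersion(2, "x + rand()", {"deterministic": "false"})
meta = UdfMetadata([v1, v2], current_version_id=2)
meta.rollback(1)
print(meta.current().properties["deterministic"])  # true
```

Because the flag travels with the version, rolling back restores the value that matches that version's body. With a single top-level map, a rollback from a deterministic version to an earlier non-deterministic one would leave a stale `true` in place.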
>> On Wed, Jul 2, 2025 at 3:06 PM Yufei Gu <flyrain...@gmail.com> wrote:
>>
>>> Thanks for the summary, Ajantha!
>>>
>>> I’d prefer to keep the signature list separate from the representation
>>> history, for the following reasons:
>>>
>>>    1. Each version still enforces a single signature. Although the
>>>    signatures array is global to the UDF, each version references just one
>>>    signature ID. Rollbacks to historical versions remain safe.
>>>    2. We’ve separated the less frequently changing component
>>>    (signatures) from the more dynamic one (representations) to reduce
>>>    metadata file size.
>>>    3. Since signatures use Iceberg data types, they should remain
>>>    unaffected by multi-dialect representation differences.
>>>
>>> Yufei
>>>
>>>
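A minimal Python sketch of the layout Yufei describes, using assumed names. Signatures are stored once in a UDF-level list, and each version references exactly one of them by ID. `Signature`, `Version`, `validate`, and `signature_for` are hypothetical. The type strings follow Iceberg's primitive type names.

```python
from dataclasses import dataclass

# Hypothetical layout: signatures are stored once at the UDF level, and each
# version points at exactly one of them by ID. Names are illustrative only.
@dataclass(frozen=True)
class Signature:
    signature_id: int
    arg_types: tuple  # Iceberg type names, e.g. ("int", "string")
    return_type: str

@dataclass(frozen=True)
class Version:
    version_id: int
    signature_id: int
    representation: str

def validate(signatures: list, versions: list) -> None:
    """Fail if any version references a signature that does not exist."""
    known = {s.signature_id for s in signatures}
    for v in versions:
        if v.signature_id not in known:
            raise ValueError(
                f"version {v.version_id} references unknown signature {v.signature_id}"
            )

def signature_for(version_id: int, signatures: list, versions: list) -> Signature:
    """Resolve the single signature used by a given version."""
    version = next(v for v in versions if v.version_id == version_id)
    return next(s for s in signatures if s.signature_id == version.signature_id)

sigs = [Signature(1, ("int",), "int"), Signature(2, ("int", "int"), "int")]
vers = [Version(1, 1, "x + 1"), Version(2, 1, "x + 2"), Version(3, 2, "x + y")]
validate(sigs, vers)
print(signature_for(2, sigs, vers).arg_types)  # ('int',)
```

Versions 1 and 2 share signature 1, so the signature is written once rather than repeated with each representation.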
>>> On Mon, Jun 30, 2025 at 11:28 AM Ajantha Bhat <ajanthab...@gmail.com>
>>> wrote:
>>>
>>>> Thanks to everyone who joined the sync.
>>>> Here is the meeting recording:
>>>> https://drive.google.com/file/d/1FcOSbHo9ZIVeZXdUlmoG42o-chB7Q15P/view?usp=sharing
>>>>
>>>> Summary:
>>>> We discussed the action items from the last sync (*see Appendix C* in
>>>> the proposal doc):
>>>>
>>>>    - Function overloading: Supported by a few engines and on the roadmaps
>>>>    of many others. Iceberg will support it. We will maintain a
>>>>    `FunctionIdentifier` (extends `TableIdentifier` and also has a member
>>>>    containing the function's argument type list). All operations, such as
>>>>    load, rename, list, create, and drop, are based on `FunctionIdentifier`.
>>>>    - Secure UDF: If we store it as a property in the property bag, we
>>>>    need to standardize the property name. Iceberg encryption may be
>>>>    orthogonal to this discussion.
>>>>    - UDFs with multi-statement and procedural bodies are supported by
>>>>    some engines. Iceberg will support them by storing the body as-is when
>>>>    the engine creates the function.
>>>>
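A hypothetical Python sketch of identifier-based overloading. The thread describes a Java `FunctionIdentifier` that extends `TableIdentifier`. The stand-alone dataclass, the in-memory `catalog` dict, and the helper functions here are illustrative assumptions, and rename is omitted.

```python
from dataclasses import dataclass

# Hypothetical Python analogue of the `FunctionIdentifier` described above:
# namespace and name (as in a table identifier) plus the ordered argument
# types. Because argument types are part of the identity, overloads such as
# add_one(int) and add_one(double) are distinct catalog entries.
@dataclass(frozen=True)
class FunctionIdentifier:
    namespace: tuple
    name: str
    arg_types: tuple

catalog = {}  # illustrative in-memory stand-in for a catalog

def create_function(ident: FunctionIdentifier, body: str) -> None:
    if ident in catalog:
        raise ValueError(f"function already exists: {ident}")
    catalog[ident] = body

def drop_function(ident: FunctionIdentifier) -> None:
    del catalog[ident]

def list_overloads(namespace: tuple, name: str) -> list:
    return sorted(
        (i for i in catalog if i.namespace == namespace and i.name == name),
        key=lambda i: i.arg_types,
    )

ns = ("db",)
create_function(FunctionIdentifier(ns, "add_one", ("int",)), "x + 1")
create_function(FunctionIdentifier(ns, "add_one", ("double",)), "x + 1.0")
print([i.arg_types for i in list_overloads(ns, "add_one")])
# [('double',), ('int',)]
```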
>>>> New discussion topics:
>>>>
>>>>    - Standardizing the property names (deterministic, secure).
>>>>    - Renaming functions.
>>>>    - Replacing functions: check up to what level replace is supported
>>>>    (considering function overloading).
>>>>    - Should the signature be associated with the representation?
>>>>
>>>> I think we are close on the spec. Please review the proposal
>>>> <https://docs.google.com/document/d/1BDvOfhrH0ZQiQv9eLBqeAu8k8Vjfmeql9VzIiW1F0vc/edit?usp=sharing>.
>>>>
>>>> Details for next Iceberg UDF sync:
>>>>
>>>> *Monday, July 14 · 9:00 – 10:00am*
>>>> Time zone: America/Los_Angeles
>>>> Google Meet joining info
>>>> Video call link: https://meet.google.com/aui-czix-nbh
>>>>
>>>> - Ajantha
>>>>
>>>> On Mon, Jun 30, 2025 at 9:27 PM Ajantha Bhat <ajanthab...@gmail.com>
>>>> wrote:
>>>>
>>>>> Can this be handled by Iceberg encryption? If the whole metadata is
>>>>> encrypted, wouldn't that remove the need to hide just the UDF body? Let's
>>>>> discuss it further in today's sync.
>>>>>
>>>>> On Mon, Jun 30, 2025 at 9:22 PM Yufei Gu <flyrain...@gmail.com> wrote:
>>>>>
>>>>>> Yes, hiding the definition and disabling pushdown are required. We
>>>>>> will need a named key (e.g., secure) somewhere, whether it is a
>>>>>> top-level property or a key within the UDF properties, so that both
>>>>>> the UDF creator and its consumers can recognize it.
>>>>>>
>>>>>> Yufei
>>>>>>
>>>>>>
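A minimal sketch of the two engine behaviors mentioned above: hiding the definition and skipping pushdown. It assumes the standardized key is literally named `secure` and takes string values. The helper names are hypothetical.

```python
# Hypothetical engine-side handling of a standardized "secure" key. The key
# name, its string values, and both helpers are illustrative assumptions.
SECURE_KEY = "secure"

def is_secure(properties: dict) -> bool:
    return properties.get(SECURE_KEY, "false").lower() == "true"

def describe_udf(name: str, body: str, properties: dict) -> str:
    """Text a DESCRIBE-style command might return to the caller."""
    shown = "<definition hidden>" if is_secure(properties) else body
    return f"{name}: {shown}"

def allow_predicate_pushdown(properties: dict) -> bool:
    # Secure UDFs give up this optimization to avoid indirect data exposure.
    return not is_secure(properties)

props = {SECURE_KEY: "true"}
print(describe_udf("mask_ssn", "concat('***-**-', right(ssn, 4))", props))
# mask_ssn: <definition hidden>
print(allow_predicate_pushdown(props))  # False
```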
>>>>>> On Thu, Jun 26, 2025 at 4:27 PM Ryan Blue <rdb...@gmail.com> wrote:
>>>>>>
>>>>>>> Thanks for the extra detail. What do you think the spec would
>>>>>>> require? Would it require hiding the UDF definition from users and
>>>>>>> require that specific pushdown cases be disabled? The use cases seem
>>>>>>> valid, but I'm trying to understand the requirements this places on
>>>>>>> engines and why it needs to be part of the spec, rather than part of
>>>>>>> the properties of the UDF.
>>>>>>>
>>>>>>> On Fri, Jun 20, 2025 at 3:56 PM Yufei Gu <flyrain...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi Ryan,
>>>>>>>>
>>>>>>>> Here are the main use cases for secure UDFs:
>>>>>>>>
>>>>>>>>    1. Hiding UDF Definitions: This includes concealing the UDF body
>>>>>>>>    and details like the list of imports, some of which aren’t
>>>>>>>>    applicable to SQL UDFs.
>>>>>>>>    2. Sandboxed Execution: Ensuring the UDF runs in an isolated
>>>>>>>>    environment. Again, this typically doesn’t apply to SQL UDFs.
>>>>>>>>    3. Preventing Data Leakage at Execution Time: For example, secure
>>>>>>>>    UDFs may disable certain optimizations, such as predicate pushdown,
>>>>>>>>    to avoid exposing sensitive data indirectly. [1]
>>>>>>>>
>>>>>>>> Given these scenarios, I agree with your point that the secure flag is
>>>>>>>> primarily an instruction to the engine to behave differently. While it's
>>>>>>>> largely an engine-side behavior, we still need to include this flag in
>>>>>>>> the UDF definition to indicate whether a UDF is secure, especially
>>>>>>>> considering the perf penalty introduced by scenario #3. We should
>>>>>>>> clearly recommend that users avoid marking UDFs as secure unless it's
>>>>>>>> truly necessary.
>>>>>>>>
>>>>>>>> [1]
>>>>>>>> https://docs.snowflake.com/en/developer-guide/pushdown-optimization#example-of-indirect-data-exposure-through-pushdown
>>>>>>>> Yufei
>>>>>>>>
>>>>>>>>
>>>>>>>> On Wed, Jun 18, 2025 at 12:32 PM Ryan Blue <rdb...@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Yufei, could you make the argument for supporting a "secure" UDF?
>>>>>>>>> What use case are you addressing, and what specifically changes about
>>>>>>>>> how the UDF is handled? If the idea is to hide the UDF definition, do
>>>>>>>>> we need to include it?
>>>>>>>>>
>>>>>>>>> I think this would be a signal to a "trusted engine". When the engine
>>>>>>>>> interacts with the catalog, it sends authorization information about
>>>>>>>>> itself in addition to the user that it is acting on behalf of. That
>>>>>>>>> way, the catalog knows that the secure UDF can be sent to the engine
>>>>>>>>> and won't be shown to the user. The majority of this logic is on the
>>>>>>>>> REST server side, and the only part that is communicated to the client
>>>>>>>>> is the request not to show the UDF to the user, right? In that case,
>>>>>>>>> should this be a property rather than part of the definition? Even if
>>>>>>>>> we state that the client "must" suppress the UDF definition, it's
>>>>>>>>> really just a request. Only trusted engines can be passed the UDF
>>>>>>>>> definition, so a spec requirement to suppress the definition isn't
>>>>>>>>> very meaningful.
>>>>>>>>>
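The trusted-engine flow Ryan describes can be sketched from the catalog's side. This is a hypothetical illustration, not a REST catalog API. `Caller`, `TRUSTED_ENGINES`, and `load_udf_body` are invented names, and the allow-list stands in for whatever engine authentication a real catalog would use.

```python
from dataclasses import dataclass

# Hypothetical catalog-side check: the catalog knows both the user and the
# engine making the request, and only hands a secure UDF's body to an engine
# it trusts. Not a real REST catalog API; all names are illustrative.
@dataclass(frozen=True)
class Caller:
    user: str
    engine: str

TRUSTED_ENGINES = {"engine-a"}  # assumed stand-in for engine authentication

def load_udf_body(udf: dict, caller: Caller) -> str:
    if udf.get("properties", {}).get("secure") != "true":
        return udf["body"]  # non-secure UDFs are returned as usual
    if caller.engine in TRUSTED_ENGINES:
        # The engine receives the body but is asked not to show it to
        # caller.user; the catalog cannot enforce that request.
        return udf["body"]
    raise PermissionError(f"{caller.engine} may not load a secure UDF definition")

udf = {"body": "concat('***-**-', right(ssn, 4))", "properties": {"secure": "true"}}
print(load_udf_body(udf, Caller("alice", "engine-a")))
```

The trusted branch still returns the body. Hiding it from `caller.user` is left to the engine, which is why the suppression reads as a request rather than something the spec can enforce.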
>>>>>>>>> On Mon, Jun 16, 2025 at 5:42 PM Yufei Gu <flyrain...@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Thanks for the summary, Ajantha!
>>>>>>>>>>
>>>>>>>>>> Multi-statement UDFs are definitely useful, but whether those
>>>>>>>>>> statements run within a single transaction should be treated as an
>>>>>>>>>> engine-level concern. The Iceberg UDF spec can spell out the
>>>>>>>>>> expectation, yet the actual guarantee still depends on the runtime.
>>>>>>>>>> Even if a UDF declares itself transactional, the engine may or may
>>>>>>>>>> not enforce it.
>>>>>>>>>>
>>>>>>>>>> One more thing: should we also introduce a “secure UDF” option,
>>>>>>>>>> supported by some engines [1], so the body and any sensitive details
>>>>>>>>>> stay hidden from callers?
>>>>>>>>>>
>>>>>>>>>> [1]
>>>>>>>>>> https://docs.snowflake.com/en/developer-guide/secure-udf-procedure
>>>>>>>>>>
>>>>>>>>>> Yufei
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Mon, Jun 16, 2025 at 12:02 PM Ajantha Bhat <
>>>>>>>>>> ajanthab...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Thanks to everyone who joined the sync.
>>>>>>>>>>> Here is the meeting recording:
>>>>>>>>>>> https://drive.google.com/file/d/10_Getaasv6tDMGzeZQUgcUVwCUAaFxiz/view?usp=sharing
>>>>>>>>>>> Summary:
>>>>>>>>>>>
>>>>>>>>>>>    - We went through the SQL UDF syntax supported by different
>>>>>>>>>>>    engines (Snowflake, Databricks, Dremio, Trino, OSS Spark 4.0).
>>>>>>>>>>>    - Each engine uses its own block separator, like $$, '', or none.
>>>>>>>>>>>    The action item was to check whether engines support
>>>>>>>>>>>    multi-statement (transactional) UDF bodies.
>>>>>>>>>>>    - Discussed function overloading. We need to check whether these
>>>>>>>>>>>    engines support function overloading for SQL UDFs (Postgres
>>>>>>>>>>>    supports it). If they do, we need to adapt the spec to handle it.
>>>>>>>>>>>    - Started the online spec review and discussed the deterministic
>>>>>>>>>>>    flag. We concluded that we keep independent fields (like
>>>>>>>>>>>    deterministic) in the spec only if the majority of engines
>>>>>>>>>>>    support them. Otherwise, they will be passed in a property bag
>>>>>>>>>>>    (engine-specific), and it is the engine's responsibility to honor
>>>>>>>>>>>    those optional properties.
>>>>>>>>>>>
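A small sketch of the property-bag rule. It assumes string-valued properties and a hypothetical `deterministic` key. An engine keeps only the optional properties it understands and treats a missing flag conservatively. Both helper names are illustrative.

```python
# Hypothetical sketch of the property-bag rule: each engine honors only the
# optional properties it understands, and a missing flag falls back to a
# conservative default. Property names are illustrative assumptions.
def honored_properties(bag: dict, understood: set) -> dict:
    """Keep only the optional properties this engine knows how to honor."""
    return {k: v for k, v in bag.items() if k in understood}

def is_deterministic(bag: dict) -> bool:
    # Without an explicit flag, assume the function is not deterministic, so
    # the engine never caches or constant-folds its result by mistake.
    return bag.get("deterministic", "false").lower() == "true"

bag = {"deterministic": "true", "vendor.cache-ttl": "60"}
print(honored_properties(bag, {"deterministic"}))  # {'deterministic': 'true'}
print(is_deterministic({}))  # False
```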
>>>>>>>>>>> Feel free to review the current proposal document here
>>>>>>>>>>> <https://docs.google.com/document/d/1BDvOfhrH0ZQiQv9eLBqeAu8k8Vjfmeql9VzIiW1F0vc/edit?usp=sharing>.
>>>>>>>>>>>
>>>>>>>>>>> The final spec will be put up for review and a vote once it is ready.
>>>>>>>>>>>
>>>>>>>>>>> Details for next Iceberg UDF sync:
>>>>>>>>>>>
>>>>>>>>>>> *Monday, June 30 · 9:00 – 10:00am*
>>>>>>>>>>> Time zone: America/Los_Angeles
>>>>>>>>>>> Google Meet joining info
>>>>>>>>>>> Video call link: https://meet.google.com/aui-czix-nbh
>>>>>>>>>>>
>>>>>>>>>>> - Ajantha
>>>>>>>>>>>
>>>>>>>>>>> On Wed, Jun 4, 2025 at 9:00 PM Ajantha Bhat <
>>>>>>>>>>> ajanthab...@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Thanks to everyone who joined the sync.
>>>>>>>>>>>> Here is the meeting recording:
>>>>>>>>>>>> https://drive.google.com/file/d/1WItItsNs3m3-no7_qWPHftGqVNOdpw5C/view?usp=sharing
>>>>>>>>>>>>
>>>>>>>>>>>> Summary:
>>>>>>>>>>>>
>>>>>>>>>>>>    - We discussed including Python support; the majority agreed *not
>>>>>>>>>>>>    to* (see the recording for details).
>>>>>>>>>>>>    - There was no strong opposition to versioning; it will be
>>>>>>>>>>>>    included to support change tracking and similar use cases.
>>>>>>>>>>>>    - Suggestions were made to document how each catalog resolves
>>>>>>>>>>>>    UDFs, similar to views and tables.
>>>>>>>>>>>>    - We agreed not to deviate from the existing table/view spec;
>>>>>>>>>>>>    e.g., location will remain *required* for cross-catalog
>>>>>>>>>>>>    compatibility.
>>>>>>>>>>>>    - We also briefly discussed view interoperability, as the same
>>>>>>>>>>>>    considerations apply here.
>>>>>>>>>>>>
>>>>>>>>>>>> Feel free to review the proposal document here:
>>>>>>>>>>>> <https://docs.google.com/document/d/1BDvOfhrH0ZQiQv9eLBqeAu8k8Vjfmeql9VzIiW1F0vc/edit?pli=1&tab=t.0>
>>>>>>>>>>>> With the current scope, it is now similar to the view/table spec.
>>>>>>>>>>>> The final spec will be put up for review and a vote once it is ready.
>>>>>>>>>>>>
>>>>>>>>>>>> Details for next Iceberg UDF sync:
>>>>>>>>>>>>
>>>>>>>>>>>> *Monday, June 16 · 9:00 – 10:00am*
>>>>>>>>>>>> Time zone: America/Los_Angeles
>>>>>>>>>>>> Google Meet joining info
>>>>>>>>>>>> Video call link: https://meet.google.com/aui-czix-nbh
>>>>>>>>>>>>
>>>>>>>>>>>> - Ajantha
>>>>>>>>>>>>
>>>>>>>>>>>> On Wed, May 21, 2025 at 3:33 AM Yufei Gu <flyrain...@gmail.com>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi folks,
>>>>>>>>>>>>>
>>>>>>>>>>>>> We’ve set up a dedicated bi-weekly community sync for the UDF
>>>>>>>>>>>>> project. Everyone’s welcome to drop in and share ideas! Here is the
>>>>>>>>>>>>> meeting link:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Iceberg UDF sync
>>>>>>>>>>>>> Monday, June 2 · 9:00 – 10:00am
>>>>>>>>>>>>> Time zone: America/Los_Angeles
>>>>>>>>>>>>> Google Meet joining info
>>>>>>>>>>>>> Video call link: https://meet.google.com/aui-czix-nbh
>>>>>>>>>>>>>
>>>>>>>>>>>>> Yufei
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Fri, May 16, 2025 at 10:45 AM Ajantha Bhat <
>>>>>>>>>>>>> ajanthab...@gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Update on the progress.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I had a meeting today with Yufei and Yun.zou to discuss the UDF
>>>>>>>>>>>>>> proposal. We covered several key points, though some are still
>>>>>>>>>>>>>> open for further discussion:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> a) *UDF Versioning*: Do we truly need versioning for UDFs at this
>>>>>>>>>>>>>> stage? We explored the possibility of simplifying the specification
>>>>>>>>>>>>>> by avoiding view replication, and potentially introducing
>>>>>>>>>>>>>> versioning support later. UDTFs, being a superset of views in some
>>>>>>>>>>>>>> ways, may not require versioning initially.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> b) *VarArgs Support*: While some query engines may not
>>>>>>>>>>>>>> support vararg syntax in CREATE FUNCTION, Iceberg UDFs could
>>>>>>>>>>>>>> represent such arguments as lists when supported by the engine.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> c) *Generics in UDFs*: Since Iceberg currently doesn’t support
>>>>>>>>>>>>>> generic types (e.g., object), we can only map engine-specific types
>>>>>>>>>>>>>> to Iceberg types. As a result, generic data types will not be
>>>>>>>>>>>>>> supported in the initial version.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> d) *Python Support*: Incorporating Python as a UDF language
>>>>>>>>>>>>>> (alongside SQL) seems promising, especially given its potential to
>>>>>>>>>>>>>> resolve interoperability challenges. Some engines, however, require
>>>>>>>>>>>>>> platform version and package dependency details to execute Python
>>>>>>>>>>>>>> code; this should be captured in the specification.
>>>>>>>>>>>>>>
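Points b) and c) can be sketched as a type-mapping step. The engine type names and the mapping table are assumptions for illustration. The Iceberg side uses the spec's primitive type names, and a vararg parameter is represented as `list<...>`.

```python
# Hypothetical mapping step for points b) and c): engine types are mapped to
# Iceberg type names, a vararg parameter becomes an Iceberg list, and generic
# types with no Iceberg equivalent are rejected. The engine-side names and
# the table itself are illustrative assumptions.
ENGINE_TO_ICEBERG = {
    "INT": "int",
    "BIGINT": "long",
    "DOUBLE": "double",
    "VARCHAR": "string",
    "BOOLEAN": "boolean",
}

def to_iceberg_type(engine_type: str, vararg: bool = False) -> str:
    key = engine_type.upper()
    if key not in ENGINE_TO_ICEBERG:
        # e.g. OBJECT or ANY: no Iceberg equivalent, so unsupported initially
        raise TypeError(f"unsupported (possibly generic) type: {engine_type}")
    iceberg_type = ENGINE_TO_ICEBERG[key]
    return f"list<{iceberg_type}>" if vararg else iceberg_type

print(to_iceberg_type("varchar"))           # string
print(to_iceberg_type("INT", vararg=True))  # list<int>
```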
>>>>>>>>>>>>>> *Next Steps*
>>>>>>>>>>>>>> I will update the proposal document with two primary UDF use cases:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>    - Policy exchange between engines
>>>>>>>>>>>>>>    - UDTF as a superset of view functionality
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> The update will include corresponding syntax examples in both SQL
>>>>>>>>>>>>>> and Python, and detail how each use case is represented in Iceberg
>>>>>>>>>>>>>> metadata.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> We also plan to set up regular syncs (open to more interested
>>>>>>>>>>>>>> participants) to continue refining and finalizing the UDF
>>>>>>>>>>>>>> specification.
>>>>>>>>>>>>>> - Ajantha
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Wed, Mar 12, 2025 at 9:16 PM Ajantha Bhat <
>>>>>>>>>>>>>> ajanthab...@gmail.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi everyone,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I've updated the design document [1] based on the previous
>>>>>>>>>>>>>>> comments. Additionally, I've included the SQL UDF syntax supported
>>>>>>>>>>>>>>> by various vendors, including Dremio, Snowflake, Databricks, and
>>>>>>>>>>>>>>> Trino.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I'm happy to schedule a separate sync if a deeper discussion is
>>>>>>>>>>>>>>> needed. Let's keep moving forward, especially with the renewed
>>>>>>>>>>>>>>> interest from the community.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> [1]
>>>>>>>>>>>>>>> https://docs.google.com/document/d/1BDvOfhrH0ZQiQv9eLBqeAu8k8Vjfmeql9VzIiW1F0vc/edit?usp=sharing
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Thu, Feb 13, 2025 at 11:17 PM Ajantha Bhat <
>>>>>>>>>>>>>>> ajanthab...@gmail.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hey everyone,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> During the last catalog community sync, there was significant
>>>>>>>>>>>>>>>> interest in storing UDFs in Iceberg and adding endpoints for UDF
>>>>>>>>>>>>>>>> handling to the REST catalog spec.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I recently discussed this with Yufei to better understand the new
>>>>>>>>>>>>>>>> requirement of using UDFs for fine-grained access control
>>>>>>>>>>>>>>>> policies. This expands the use cases beyond just versioned and
>>>>>>>>>>>>>>>> interoperable UDFs. Additionally, I learnt that many vendors are
>>>>>>>>>>>>>>>> interested in this feature.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Given the strong community interest and support, I’d like to take
>>>>>>>>>>>>>>>> ownership of this effort and revive the work. I'll be revisiting
>>>>>>>>>>>>>>>> the document I proposed a while back and will share an updated
>>>>>>>>>>>>>>>> proposal by next week.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Looking forward to storing UDFs in Iceberg!
>>>>>>>>>>>>>>>> - Ajantha
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Thu, Aug 8, 2024 at 2:55 PM Dmitri Bourlatchkov
>>>>>>>>>>>>>>>> <dmitri.bourlatch...@dremio.com.invalid> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> The UDF spec does not require representations to be SQL. It
>>>>>>>>>>>>>>>>> merely does not specify (in this revision) how other
>>>>>>>>>>>>>>>>> representations are to be written.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> This seems like an easy extension (adding a new type in
>>>>>>>>>>>>>>>>> the "Representations" section).
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>>>>> Dmitri.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Thu, Aug 8, 2024 at 3:47 PM Ryan Blue
>>>>>>>>>>>>>>>>> <b...@databricks.com.invalid> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Right now, SQL is an explicit requirement of the spec. It leaves a
>>>>>>>>>>>>>>>>>> way for future versions to add different representations later,
>>>>>>>>>>>>>>>>>> but only SQL is supported. That was also the feedback to my
>>>>>>>>>>>>>>>>>> initial skepticism about how it would work to add functions.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Thu, Aug 8, 2024 at 12:44 PM Dmitri Bourlatchkov
>>>>>>>>>>>>>>>>>> <dmitri.bourlatch...@dremio.com.invalid> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> I do not think the spec is meant to allow only SQL
>>>>>>>>>>>>>>>>>>> representations, although it certainly favours SQL in its
>>>>>>>>>>>>>>>>>>> examples. It would be nice to add a non-SQL example, indeed.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>>>>>>> Dmitri.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Thu, Aug 8, 2024 at 9:00 AM Fokko Driesprong <
>>>>>>>>>>>>>>>>>>> fo...@apache.org> wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Coming from PyIceberg, I have concerns, as this proposal focuses
>>>>>>>>>>>>>>>>>>>> on SQL-based engines, while Python-based systems often work with
>>>>>>>>>>>>>>>>>>>> data frames. Adding imperative languages like Python would make
>>>>>>>>>>>>>>>>>>>> this proposal more inclusive.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Kind regards,
>>>>>>>>>>>>>>>>>>>> Fokko
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> On Thu, Aug 8, 2024 at 10:27 Piotr Findeisen <
>>>>>>>>>>>>>>>>>>>> piotr.findei...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Walaa, thanks for asking!
>>>>>>>>>>>>>>>>>>>>> In the design doc linked earlier in this thread [1], I read:
>>>>>>>>>>>>>>>>>>>>> "Without a common standard, the UDFs are hard to share among
>>>>>>>>>>>>>>>>>>>>> different engines." ("Background and Motivation" section)
>>>>>>>>>>>>>>>>>>>>> I agree with this statement. However, I don't yet fully understand
>>>>>>>>>>>>>>>>>>>>> how the proposed design addresses shareability between engines.
>>>>>>>>>>>>>>>>>>>>> I could use some help to understand this better.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Best
>>>>>>>>>>>>>>>>>>>>> Piotr
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> [1] SQL User-Defined Function Spec
>>>>>>>>>>>>>>>>>>>>> https://docs.google.com/document/d/1BDvOfhrH0ZQiQv9eLBqeAu8k8Vjfmeql9VzIiW1F0vc
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> On Wed, 7 Aug 2024 at 21:14, Walaa Eldin Moustafa <
>>>>>>>>>>>>>>>>>>>>> wa.moust...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Piotr, what do you mean by making user-created functions shareable
>>>>>>>>>>>>>>>>>>>>>> between engines? Do you mean UDFs written in imperative code?
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> On Wed, Aug 7, 2024 at 12:00 PM Piotr Findeisen
>>>>>>>>>>>>>>>>>>>>>> <piotr.findei...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>>>>> > Hi,
>>>>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>>>>> > Thank you Ajantha for creating this thread. Iceberg UDFs are an
>>>>>>>>>>>>>>>>>>>>>> > interesting idea!
>>>>>>>>>>>>>>>>>>>>>> > Is there a plan to make user-created functions shareable between
>>>>>>>>>>>>>>>>>>>>>> > engines?
>>>>>>>>>>>>>>>>>>>>>> > If so, what would a CREATE FUNCTION statement look like in, e.g.,
>>>>>>>>>>>>>>>>>>>>>> > Spark or Trino?
>>>>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>>>>> > Meanwhile, I added a few comments in the doc.
>>>>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>>>>> > Best
>>>>>>>>>>>>>>>>>>>>>> > Piotr
>>>>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>>>>> > On Thu, 1 Aug 2024 at 20:50, Ryan Blue <b...@databricks.com.invalid> wrote:
>>>>>>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>>>>>>> >> I just looked through the proposal and added comments. I think it would
>>>>>>>>>>>>>>>>>>>>>> >> be helpful to also have a design doc that covers the choices from the
>>>>>>>>>>>>>>>>>>>>>> >> draft spec. For instance, the choice to enumerate all possible function
>>>>>>>>>>>>>>>>>>>>>> >> input structs rather than allowing generics and varargs.
>>>>>>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>>>>>>> >> Here’s a quick summary of my feedback:
>>>>>>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>>>>>>> >> I think that the choice to enumerate function signatures is limiting. It
>>>>>>>>>>>>>>>>>>>>>> >> would be nice to see a discussion of the trade-offs and a rationale for
>>>>>>>>>>>>>>>>>>>>>> >> the choice. I think it would also be very helpful to have a few
>>>>>>>>>>>>>>>>>>>>>> >> representative use cases for this included in the doc. That way, the
>>>>>>>>>>>>>>>>>>>>>> >> proposal can demonstrate that it solves those use cases with reasonable
>>>>>>>>>>>>>>>>>>>>>> >> trade-offs.
>>>>>>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>>>>>>> >> There are a few instances where this is inconsistent with conventions in
>>>>>>>>>>>>>>>>>>>>>> >> other specs. For example, using string IDs rather than integers.
>>>>>>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>>>>>>> >> This uses a very different model for spec versioning than the Iceberg
>>>>>>>>>>>>>>>>>>>>>> >> view and table specs. It requires readers to fail if there are any
>>>>>>>>>>>>>>>>>>>>>> >> unknown fields, which prevents the spec from adding things that are fully
>>>>>>>>>>>>>>>>>>>>>> >> backward-compatible. Other Iceberg specs only require a version change to
>>>>>>>>>>>>>>>>>>>>>> >> introduce forward-incompatible changes, and I think this should do the
>>>>>>>>>>>>>>>>>>>>>> >> same to avoid confusion.
>>>>>>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>>>>>>> >> It looks like the intent is to allow multiple function signatures per
>>>>>>>>>>>>>>>>>>>>>> >> version, but it is unclear how to encode them because a version is
>>>>>>>>>>>>>>>>>>>>>> >> associated with a single function signature.
>>>>>>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>>>>>>> >> There is no review of SQL syntax for creating functions across engines,
>>>>>>>>>>>>>>>>>>>>>> >> so this doesn’t show that the proposed metadata is sufficient for
>>>>>>>>>>>>>>>>>>>>>> >> cross-engine use cases.
>>>>>>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>>>>>>> >> The example for a table-valued function shows a SELECT statement, and it
>>>>>>>>>>>>>>>>>>>>>> >> isn’t clear how this is distinct from a view.
>>>>>>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>>>>>>> >>
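Ryan's point about spec versioning can be sketched with two hypothetical readers. `format-version` mirrors the field used in Iceberg table and view metadata. `name`, `doc`, and `SUPPORTED_FORMAT_VERSION` are illustrative assumptions.

```python
import json

# Two hypothetical readers contrasting the versioning models discussed above.
# `format-version` mirrors Iceberg table/view metadata; the other names are
# illustrative assumptions.
SUPPORTED_FORMAT_VERSION = 1
KNOWN_FIELDS = {"format-version", "name"}

def read_strict(text: str) -> dict:
    """Reject any unknown field (the model in the reviewed draft)."""
    doc = json.loads(text)
    unknown = set(doc) - KNOWN_FIELDS
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")
    return doc

def read_tolerant(text: str) -> dict:
    """Reject only a newer format version; ignore unknown optional fields."""
    doc = json.loads(text)
    if doc["format-version"] > SUPPORTED_FORMAT_VERSION:
        raise ValueError(f"unsupported format-version {doc['format-version']}")
    return {k: v for k, v in doc.items() if k in KNOWN_FIELDS}

newer = '{"format-version": 1, "name": "add_one", "doc": "added later"}'
print(read_tolerant(newer))  # {'format-version': 1, 'name': 'add_one'}
```

The strict reader rejects the same document because of the unknown `doc` field, even though ignoring it would have been harmless.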
>>>>>>>>>>>>>>>>>>>>>> >> On Thu, Aug 1, 2024 at 3:15 AM Ajantha Bhat <ajanthab...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>>>>>>>>> >>> Thanks Walaa and Robert for the review on this.
>>>>>>>>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>>>>>>>>> >>> We didn't find any blockers for the spec.
>>>>>>>>>>>>>>>>>>>>>> >>> I will wait for a week, and if there are no more review comments, I will
>>>>>>>>>>>>>>>>>>>>>> >>> raise a PR for the spec addition next week.
>>>>>>>>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>>>>>>>>> >>> If anyone else is interested, please have a look at the proposal:
>>>>>>>>>>>>>>>>>>>>>> >>> https://docs.google.com/document/d/1BDvOfhrH0ZQiQv9eLBqeAu8k8Vjfmeql9VzIiW1F0vc/edit
>>>>>>>>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>>>>>>>>> >>> - Ajantha
>>>>>>>>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>>>>>>>>> >>> On Tue, Jul 16, 2024 at 1:27 PM Walaa Eldin Moustafa <wa.moust...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>>>>>>>>>> >>>> Hi Ajantha,
>>>>>>>>>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>>>>>>>>>> >>>> I have left some comments. It is an interesting direction, but there
>>>>>>>>>>>>>>>>>>>>>> >>>> might be some details that need to be fine-tuned.
>>>>>>>>>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>>>>>>>>>> >>>> The doc is here [1] for others who might be interested. Resharing since
>>>>>>>>>>>>>>>>>>>>>> >>>> I do not think it was directly linked in the thread.
>>>>>>>>>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>>>>>>>>>> >>>> [1] https://docs.google.com/document/d/1BDvOfhrH0ZQiQv9eLBqeAu8k8Vjfmeql9VzIiW1F0vc/edit
>>>>>>>>>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>>>>>>>>>> >>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>> >>>> Walaa.
>>>>>>>>>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>>>>>>>>>> >>>> On Mon, Jul 15, 2024 at 11:09 PM Ajantha Bhat <ajanthab...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>>>>>>>>> >>>>> Hi, just another reminder, since we haven't received any reviews on the
>>>>>>>>>>>>>>>>>>>>>> >>>>> proposal yet. It was initially proposed on June 4.
>>>>>>>>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>>>>>>>>> >>>>> - Ajantha
>>>>>>>>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>>>>>>>>> >>>>> On Mon, Jun 24, 2024 at 4:21 PM Ajantha Bhat <ajanthab...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>> >>>>>>
>>>>>>>>>>>>>>>>>>>>>> >>>>>> Hi everyone,
>>>>>>>>>>>>>>>>>>>>>> >>>>>>
>>>>>>>>>>>>>>>>>>>>>> >>>>>> We've only received one review so far (from Benny).
>>>>>>>>>>>>>>>>>>>>>> >>>>>>
>>>>>>>>>>>>>>>>>>>>>> >>>>>> We would appreciate more eyes on this.
>>>>>>>>>>>>>>>>>>>>>> >>>>>>
>>>>>>>>>>>>>>>>>>>>>> >>>>>> - Ajantha
>>>>>>>>>>>>>>>>>>>>>> >>>>>>
>>>>>>>>>>>>>>>>>>>>>> >>>>>> On Tue, Jun 4, 2024 at 7:25 AM Ajantha Bhat <ajanthab...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>> >>>>>>>
>>>>>>>>>>>>>>>>>>>>>> >>>>>>> Hi All,
>>>>>>>>>>>>>>>>>>>>>> >>>>>>> Please find the proposal link:
>>>>>>>>>>>>>>>>>>>>>> >>>>>>>
>>>>>>>>>>>>>>>>>>>>>> >>>>>>> https://github.com/apache/iceberg/issues/10432
>>>>>>>>>>>>>>>>>>>>>> >>>>>>>
>>>>>>>>>>>>>>>>>>>>>> >>>>>>> The Google Doc link is attached to the proposal.
>>>>>>>>>>>>>>>>>>>>>> >>>>>>> And thanks to Stephen Lin for working on it.
>>>>>>>>>>>>>>>>>>>>>> >>>>>>>
>>>>>>>>>>>>>>>>>>>>>> >>>>>>> I hope it gives more clarity for making decisions and for deciding how we
>>>>>>>>>>>>>>>>>>>>>> >>>>>>> want to implement it.
>>>>>>>>>>>>>>>>>>>>>> >>>>>>>
>>>>>>>>>>>>>>>>>>>>>> >>>>>>> - Ajantha
>>>>>>>>>>>>>>>>>>>>>> >>>>>>>
>>>>>>>>>>>>>>>>>>>>>> >>>>>>> On Wed, May 29, 2024 at 4:01 AM Walaa Eldin Moustafa <wa.moust...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> >>>>>>>> Thanks Jack. I actually meant scalar/aggregate/table user-defined
>>>>>>>>>>>>>>>>>>>>>> >>>>>>>> functions. Here are some examples of what I meant in (2):
>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> >>>>>>>> Hive GenericUDF: https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDF.java
>>>>>>>>>>>>>>>>>>>>>> >>>>>>>> Trino user-defined functions: https://trino.io/docs/current/develop/functions.html
>>>>>>>>>>>>>>>>>>>>>> >>>>>>>> Flink user-defined functions: https://nightlies.apache.org/flink/flink-docs-release-1.19/docs/dev/table/functions/udfs/
>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> >>>>>>>> What you referred to is probably a variation of (1) where the API is a
>>>>>>>>>>>>>>>>>>>>>> >>>>>>>> data flow/data pipeline API instead of SQL (e.g., Spark Scala). Yes, that
>>>>>>>>>>>>>>>>>>>>>> >>>>>>>> is also possible in the very long run :)
>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> >>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>> >>>>>>>> Walaa.
>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> >>>>>>>> On Tue, May 28, 2024 at 2:57 PM Jack Ye <
>>>>>>>>>>>>>>>>>>>>>> yezhao...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>> > (2) Custom code written in imperative
>>>>>>>>>>>>>>>>>>>>>> function according to a Java/Scala/Python API, etc.
>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>> I think we could still explore some long
>>>>>>>>>>>>>>>>>>>>>> term opportunities in this case. Consider you register a 
>>>>>>>>>>>>>>>>>>>>>> Spark temp view as
>>>>>>>>>>>>>>>>>>>>>> some sort of data frame read, then it could still be 
>>>>>>>>>>>>>>>>>>>>>> resolved to a Spark
>>>>>>>>>>>>>>>>>>>>>> plan that is representable by an intermediate 
>>>>>>>>>>>>>>>>>>>>>> representation. But I agree
>>>>>>>>>>>>>>>>>>>>>> this gets very complicated very quickly, and just having 
>>>>>>>>>>>>>>>>>>>>>> the case (1) covered
>>>>>>>>>>>>>>>>>>>>>> would already be a huge step forward.
>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>> -Jack
>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>> On Tue, May 28, 2024 at 1:40 PM Benny Chow <
>>>>>>>>>>>>>>>>>>>>>> btc...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>> It's interesting to note that a tabular
>>>>>>>>>>>>>>>>>>>>>> SQL UDF can be used to build a parameterized view.  So, 
>>>>>>>>>>>>>>>>>>>>>> there's definitely
>>>>>>>>>>>>>>>>>>>>>> a lot in common between UDFs and views.
>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>> Thanks
>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>> On Tue, May 28, 2024 at 9:53 AM Walaa
>>>>>>>>>>>>>>>>>>>>>> Eldin Moustafa <wa.moust...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>> I think there is a disconnect about what
>>>>>>>>>>>>>>>>>>>>>> is perceived as a "UDF". There are 2 flavors:
>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>> (1) Functions that are defined by the
>>>>>>>>>>>>>>>>>>>>>> user whose definition is a composition of other built-in 
>>>>>>>>>>>>>>>>>>>>>> functions/SQL
>>>>>>>>>>>>>>>>>>>>>> expressions.
>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>> (2) Custom code written in imperative
>>>>>>>>>>>>>>>>>>>>>> function according to a Java/Scala/Python API, etc.
>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>> All the examples in Ajantha's references
>>>>>>>>>>>>>>>>>>>>>> are pretty much from (1) and I think those have more 
>>>>>>>>>>>>>>>>>>>>>> analogy to views due
>>>>>>>>>>>>>>>>>>>>>> to their SQL nature. Agree (2) is not practical to 
>>>>>>>>>>>>>>>>>>>>>> maintain by Iceberg, but
>>>>>>>>>>>>>>>>>>>>>> I think Ajantha's use cases are around (1), and may be 
>>>>>>>>>>>>>>>>>>>>>> worth evaluating.
>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>> Walaa.
>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>> On Tue, May 28, 2024 at 9:45 AM Ajantha
>>>>>>>>>>>>>>>>>>>>>> Bhat <ajanthab...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>> I guess we'll know more when you post
>>>>>>>>>>>>>>>>>>>>>> the proposal, but I think this would be a very difficult 
>>>>>>>>>>>>>>>>>>>>>> area to tackle
>>>>>>>>>>>>>>>>>>>>>> across engines, languages, and memory models without 
>>>>>>>>>>>>>>>>>>>>>> having a huge
>>>>>>>>>>>>>>>>>>>>>> performance penalty.
>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> Assuming Iceberg initially supports SQL
>>>>>>>>>>>>>>>>>>>>>> representations of UDFs (similar to views as shared by 
>>>>>>>>>>>>>>>>>>>>>> the reference links
>>>>>>>>>>>>>>>>>>>>>> above), the complexity involved will be similar to 
>>>>>>>>>>>>>>>>>>>>>> managing views.
>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> Thanks, Ryan, Robert, and Jack, for your
>>>>>>>>>>>>>>>>>>>>>> input.
>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> We will work on publishing the draft
>>>>>>>>>>>>>>>>>>>>>> spec (inspired by the view spec) this week to facilitate 
>>>>>>>>>>>>>>>>>>>>>> further
>>>>>>>>>>>>>>>>>>>>>> discussions.
>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> - Ajantha
>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> On Tue, May 28, 2024 at 7:33 PM Jack Ye <
>>>>>>>>>>>>>>>>>>>>>> yezhao...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>> > While it would be great to have a
>>>>>>>>>>>>>>>>>>>>>> common set of functions across engines, I don't see how 
>>>>>>>>>>>>>>>>>>>>>> that is practical
>>>>>>>>>>>>>>>>>>>>>> when those engines are implemented so differently. 
>>>>>>>>>>>>>>>>>>>>>> Plugging in code -- and
>>>>>>>>>>>>>>>>>>>>>> especially custom user-supplied code -- seems inherently 
>>>>>>>>>>>>>>>>>>>>>> specialized to me
>>>>>>>>>>>>>>>>>>>>>> and should be part of the engines' design.
>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>> How is this different from the views? I
>>>>>>>>>>>>>>>>>>>>>> feel we can say exactly the same thing for Iceberg 
>>>>>>>>>>>>>>>>>>>>>> views, but yet we have
>>>>>>>>>>>>>>>>>>>>>> Iceberg multi-dialect views implemented. Maybe it sounds 
>>>>>>>>>>>>>>>>>>>>>> like we are trying
>>>>>>>>>>>>>>>>>>>>>> to draw a line between SQL and other programming languages 
>>>>>>>>>>>>>>>>>>>>>> as "code"? But I
>>>>>>>>>>>>>>>>>>>>>> think SQL is just another type of code, and we are 
>>>>>>>>>>>>>>>>>>>>>> already talking about
>>>>>>>>>>>>>>>>>>>>>> compiling all these different code dialects to an 
>>>>>>>>>>>>>>>>>>>>>> intermediate
>>>>>>>>>>>>>>>>>>>>>> representation (using projects like Coral, Substrait), 
>>>>>>>>>>>>>>>>>>>>>> which will be stored
>>>>>>>>>>>>>>>>>>>>>> as another type of representation of Iceberg view. I 
>>>>>>>>>>>>>>>>>>>>>> think the same
>>>>>>>>>>>>>>>>>>>>>> functionality can be used for UDFs if developed.
>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>> I actually think adding UDF support is a
>>>>>>>>>>>>>>>>>>>>>> good idea, even just a multi-dialect one like views, and 
>>>>>>>>>>>>>>>>>>>>>> that can allow
>>>>>>>>>>>>>>>>>>>>>> engines to, for example, parse a view SQL, and when a 
>>>>>>>>>>>>>>>>>>>>>> referenced function
>>>>>>>>>>>>>>>>>>>>>> cannot be resolved, look up a multi-dialect UDF 
>>>>>>>>>>>>>>>>>>>>>> definition.
>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>
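The fallback Jack describes (try the engine's own functions first, then look for a catalog-stored UDF with a representation in the engine's dialect) can be sketched roughly as follows. This is a minimal Python illustration under stated assumptions: `FunctionResolver`, `CatalogUdf`, and `SqlRepresentation` are hypothetical names, not part of Iceberg or any engine API, and the lookup only picks an exact dialect match; no cross-dialect translation (Coral, Substrait, etc.) is attempted.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set, Tuple


@dataclass
class SqlRepresentation:
    dialect: str  # engine dialect this body is written in, e.g. "spark" (assumed naming)
    sql: str      # function body as SQL text


@dataclass
class CatalogUdf:
    name: str
    # Hypothetical: one representation per dialect, as with multi-dialect views.
    representations: Dict[str, SqlRepresentation] = field(default_factory=dict)


class FunctionResolver:
    """Hypothetical resolver: engine built-ins first, then catalog UDFs."""

    def __init__(self, engine_dialect: str, builtins: Set[str],
                 catalog: Dict[str, CatalogUdf]) -> None:
        self.engine_dialect = engine_dialect
        self.builtins = {b.lower() for b in builtins}
        self.catalog = catalog  # assumed to be keyed by lowercase function name

    def resolve(self, name: str) -> Optional[Tuple[str, str]]:
        key = name.lower()
        if key in self.builtins:
            return ("builtin", key)
        udf = self.catalog.get(key)
        if udf is None:
            return None  # unknown everywhere: the engine would report an error
        rep = udf.representations.get(self.engine_dialect)
        if rep is None:
            return None  # UDF exists, but no body in a dialect this engine parses
        return ("udf", rep.sql)


catalog = {
    "clean_phone": CatalogUdf(
        "clean_phone",
        {"spark": SqlRepresentation("spark", "regexp_replace(p, '[^0-9]', '')")},
    )
}
spark = FunctionResolver("spark", {"upper", "lower"}, catalog)
trino = FunctionResolver("trino", {"upper"}, catalog)
print(spark.resolve("UPPER"))        # ('builtin', 'upper')
print(spark.resolve("clean_phone"))  # ('udf', "regexp_replace(p, '[^0-9]', '')")
print(trino.resolve("clean_phone"))  # None
```

Returning `None` for a missing dialect is the point where a translation layer could plug in later.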
>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>> I guess we can discuss more when we
>>>>>>>>>>>>>>>>>>>>>> have the actual proposal published.
>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>> Jack Ye
>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>> On Tue, May 28, 2024 at 1:32 AM Robert
>>>>>>>>>>>>>>>>>>>>>> Stupp <sn...@snazy.de> wrote:
>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> UDFs are as engine specific and
>>>>>>>>>>>>>>>>>>>>>> portable and "non-centralized" as views are. The same 
>>>>>>>>>>>>>>>>>>>>>> performance concerns
>>>>>>>>>>>>>>>>>>>>>> apply to views as well.
>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> Iceberg should define a common base
>>>>>>>>>>>>>>>>>>>>>> upon which engines can build, so the argument that UDFs 
>>>>>>>>>>>>>>>>>>>>>> aren't practical
>>>>>>>>>>>>>>>>>>>>>> because engines are different is probably only a 
>>>>>>>>>>>>>>>>>>>>>> temporary concern.
>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> In the long term, Iceberg should also
>>>>>>>>>>>>>>>>>>>>>> try to tackle making views portable, which is 
>>>>>>>>>>>>>>>>>>>>>> conceptually not
>>>>>>>>>>>>>>>>>>>>>> that much different from portable UDFs.
>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> PS: I'm not a fan of adding a negative
>>>>>>>>>>>>>>>>>>>>>> touch to the idea of having UDFs in Iceberg, especially 
>>>>>>>>>>>>>>>>>>>>>> not in this early
>>>>>>>>>>>>>>>>>>>>>> stage.
>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> On 24.05.24 20:53, Ryan Blue wrote:
>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> Thanks, Ajantha.
>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> I'm skeptical about whether it's a
>>>>>>>>>>>>>>>>>>>>>> good idea to add UDFs tracked by Iceberg catalogs. I 
>>>>>>>>>>>>>>>>>>>>>> think that Iceberg
>>>>>>>>>>>>>>>>>>>>>> primarily deals with things that are centralized, like 
>>>>>>>>>>>>>>>>>>>>>> tables of data.
>>>>>>>>>>>>>>>>>>>>>> While it would be great to have a common set of 
>>>>>>>>>>>>>>>>>>>>>> functions across engines, I
>>>>>>>>>>>>>>>>>>>>>> don't see how that is practical when those engines are 
>>>>>>>>>>>>>>>>>>>>>> implemented so
>>>>>>>>>>>>>>>>>>>>>> differently. Plugging in code -- and especially custom 
>>>>>>>>>>>>>>>>>>>>>> user-supplied code
>>>>>>>>>>>>>>>>>>>>>> -- seems inherently specialized to me and should be part 
>>>>>>>>>>>>>>>>>>>>>> of the engines'
>>>>>>>>>>>>>>>>>>>>>> design.
>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> I guess we'll know more when you post
>>>>>>>>>>>>>>>>>>>>>> the proposal, but I think this would be a very difficult 
>>>>>>>>>>>>>>>>>>>>>> area to tackle
>>>>>>>>>>>>>>>>>>>>>> across engines, languages, and memory models without 
>>>>>>>>>>>>>>>>>>>>>> having a huge
>>>>>>>>>>>>>>>>>>>>>> performance penalty.
>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> Ryan
>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> On Fri, May 24, 2024 at 8:10 AM
>>>>>>>>>>>>>>>>>>>>>> Ajantha Bhat <ajanthab...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Hi Everyone,
>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> This is a discussion to gauge the
>>>>>>>>>>>>>>>>>>>>>> community interest in storing versioned SQL UDFs in 
>>>>>>>>>>>>>>>>>>>>>> Iceberg.
>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> We want to propose a spec addition
>>>>>>>>>>>>>>>>>>>>>> for storing versioned UDFs in Iceberg (inspired by the 
>>>>>>>>>>>>>>>>>>>>>> view spec).
>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> These UDFs can operate similarly to
>>>>>>>>>>>>>>>>>>>>>> views in that they are associated with tables, but they 
>>>>>>>>>>>>>>>>>>>>>> can accept
>>>>>>>>>>>>>>>>>>>>>> arguments and produce return values, or even function as 
>>>>>>>>>>>>>>>>>>>>>> inline expressions.
>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Many query engines, like Dremio,
>>>>>>>>>>>>>>>>>>>>>> Trino, Snowflake, and Databricks Spark, support SQL UDFs at 
>>>>>>>>>>>>>>>>>>>>>> the catalog level [1].
>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> But storing them in Iceberg can enable
>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> - Versioning of these UDFs.
>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> - Interoperability between the
>>>>>>>>>>>>>>>>>>>>>> engines. Potentially, engines can understand the UDFs 
>>>>>>>>>>>>>>>>>>>>>> written by other
>>>>>>>>>>>>>>>>>>>>>> engines (with a translation layer).
>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>
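The two benefits listed above (versioning and cross-engine interoperability) could be modeled along the lines of the existing view metadata, which tracks a list of versions, a current-version pointer, and per-dialect SQL representations in each version. The sketch below is a hypothetical Python illustration of that shape, not the proposed UDF spec; `UdfMetadata`, `UdfVersion`, and their fields are assumed names.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class UdfVersion:
    version_id: int
    representations: Dict[str, str]  # hypothetical: dialect -> SQL body


@dataclass
class UdfMetadata:
    name: str
    versions: List[UdfVersion] = field(default_factory=list)
    current_version_id: int = 0  # 0 means "no version yet" in this sketch

    def add_version(self, representations: Dict[str, str]) -> int:
        # Append a new immutable version and make it current.
        new_id = max((v.version_id for v in self.versions), default=0) + 1
        self.versions.append(UdfVersion(new_id, dict(representations)))
        self.current_version_id = new_id
        return new_id

    def rollback(self, version_id: int) -> None:
        # Rolling back only moves the pointer; history is never rewritten.
        if not any(v.version_id == version_id for v in self.versions):
            raise ValueError(f"unknown version {version_id}")
        self.current_version_id = version_id

    def current_sql(self, dialect: str) -> str:
        for v in self.versions:
            if v.version_id == self.current_version_id:
                return v.representations[dialect]  # KeyError if dialect missing
        raise LookupError("no current version")


m = UdfMetadata("add_tax")
m.add_version({"spark": "price * 1.1", "trino": "price * 1.1"})
m.add_version({"spark": "price * 1.2"})
print(m.current_sql("spark"))  # price * 1.2
m.rollback(1)
print(m.current_sql("trino"))  # price * 1.1
```

Keeping each version immutable and moving only `current_version_id` is the same design choice the view metadata makes, so rollback never has to reconstruct an old definition.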
>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> We believe that integrating this
>>>>>>>>>>>>>>>>>>>>>> feature into Iceberg would be a valuable addition, and 
>>>>>>>>>>>>>>>>>>>>>> we're eager to
>>>>>>>>>>>>>>>>>>>>>> collaborate with the community to develop a UDF 
>>>>>>>>>>>>>>>>>>>>>> specification.
>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Stephen has already begun drafting a
>>>>>>>>>>>>>>>>>>>>>> specification to propose to the community.
>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Let us know your thoughts on this.
>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> [1]
>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Dremio -
>>>>>>>>>>>>>>>>>>>>>> https://docs.dremio.com/current/reference/sql/commands/functions#creating-a-function
>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Trino -
>>>>>>>>>>>>>>>>>>>>>> https://trino.io/docs/current/sql/create-function.html
>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Snowflake -
>>>>>>>>>>>>>>>>>>>>>> https://docs.snowflake.com/en/developer-guide/udf/sql/udf-sql-scalar-functions
>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Databricks -
>>>>>>>>>>>>>>>>>>>>>> https://docs.databricks.com/en/sql/language-manual/sql-ref-syntax-ddl-create-sql-function.html
>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> - Ajantha
>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> Ryan Blue
>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> Tabular
>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> Robert Stupp
>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> @snazy
>>>>>>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>>>>>>> >> --
>>>>>>>>>>>>>>>>>>>>>> >> Ryan Blue
>>>>>>>>>>>>>>>>>>>>>> >> Databricks
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>> Ryan Blue
>>>>>>>>>>>>>>>>>> Databricks
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
