Re: [DISCUSS] Iceberg Rust Sync Meeting

2024-10-11 Thread xxchan
Thanks, Xuanwo, for raising this. I'm also interested and excited to catch up
and better participate in the community.

As for the time, I don't think we need to match the Iceberg Sync Meeting.
Could we instead choose a time that suits the Iceberg Rust developers?
(Perhaps we can have a poll.)

On Fri, Oct 11, 2024 at 1:41 PM Christian Thiel wrote:

> +1 for rust sync. Thanks for the proposal Xuanwo. There are many open
> topics and alignment in the sync can help to clarify scopes and
> dependencies to move forward with iceberg-rust even faster.
> Time is good for me.
> --
> *From:* Kevin Liu
> *Sent:* Wednesday, October 9, 2024 4:47:57 PM
> *To:* dev@iceberg.apache.org
> *Subject:* Re: [DISCUSS] Iceberg Rust Sync Meeting
>
> +1 on sync meeting for iceberg rust. I want to get involved and catch up
> on the recent developments. For reference, here's the doc we've been using
> for the pyiceberg sync
> https://docs.google.com/document/d/1oMKodaZJrOJjPfc8PDVAoTdl02eGQKHlhwuggiw7s9U
>
> Best,
> Kevin
>
> On Wed, Oct 9, 2024 at 5:30 AM Xuanwo  wrote:
>
> Hi,
>
> I'm starting this thread to explore the idea of hosting an Iceberg Rust
> Sync Meeting. In this meeting, we will discuss recent major changes,
> pending PR reviews, and features in development. It will offer a space for
> Iceberg Rust contributors to connect and become familiar with each other,
> helping us identify and remove contribution barriers to the best of our
> ability.
>
> Details about this meeting:
>
> I suggest hosting our meeting at the same time of day, but one week
> earlier than the Iceberg Sync Meeting. For example, if the Iceberg Sync
> Meeting is scheduled for Thursday, October 24, 2024, from 00:00 to 01:00
> GMT+8, the Iceberg Rust Sync Meeting would take place one week before, on
> Thursday, October 17, 2024, from 00:00 to 01:00 GMT+8.
>
> I also suggest using the same Google Meet code (if possible) so we don't
> get confused.
>
> These meetings will not be recorded, but I will take notes in a Google
> Doc, similar to what we do in the Iceberg Sync Meeting.
>
> What are your thoughts? I'm open to other options as well.
>
> Xuanwo
>
> https://xuanwo.io/
>
>


Re: Iceberg View Spec Improvements

2024-10-11 Thread Walaa Eldin Moustafa
Hi Benny,

> we don't need to list out such restrictions because they really depend on
> the setup

I do not think this is correct. The restrictions do not depend on the
setup. They rather dictate it. All restrictions discussed in this thread do
that one way or the other.

The single-engine (Dremio) example does not apply to this discussion. The
spec is clear when a single engine is in use, but it is not limited to
single-engine use cases.

> If Dremio intended for that view to be readable by Spark, it would have
> to adhere to all those restrictions I listed before.

Sure, but those restrictions are only stated in the mailing list (in many
forms). We are discussing if we should add them to the spec (in one form).

Thanks
Walaa.


On Fri, Oct 11, 2024 at 4:56 PM Benny Chow  wrote:

> Hi Russell
>
> Yes, you listed out the requirements to make the two Spark engines case
> work.  Basically, it allows each engine to dynamically resolve the table
> identifiers under the correct catalog name.
>
> Hello Walaa
>
> IMO, we don't need to list out such restrictions because they really
> depend on the setup.   Multiple Iceberg catalogs?  Multiple engines?
> Consistent catalog names?  Are views created with USE in context?  Today,
> in Dremio, we save tons of views to Nessie with fully qualified SQL
> identifiers to other sources such as mysql or snowflake.  Those views may
> or may not have default-catalog and default-namespaces set depending on the
> USE context.  If Dremio intended for that view to be readable by Spark, it
> would have to adhere to all those restrictions I listed before.
>
> Thanks
> Benny
>
> On Fri, Oct 11, 2024 at 10:00 AM Walaa Eldin Moustafa <
> wa.moust...@gmail.com> wrote:
>
>> Benny, "Iceberg View Spec Improvements" includes documenting what is
>> supported and what is not. You listed a few restrictions. Many of them are
>> not documented on the current spec. Documenting them is what this thread is
>> about. We are trying to reach a consensus on the necessary constraints (so
>> we are not over- or under-restricting).
>>
>> Russell, I think what you stated is a version of the restrictions. From
>> my point of view, the list of the necessary restrictions are:
>>
>> * Engines must share the same default catalog names, ensuring that
>> partially specified SQL identifiers with catalog omitted are resolved to
>> the same fully specified SQL identifier across all engines.
>> * Engines must share the same default namespaces, ensuring that SQL
>> identifiers without catalog and namespace are resolved to the same fully
>> specified SQL identifier across all engines.
>> * All engines must resolve a fully specified SQL identifier to the same
>> storage table in the same storage catalog.
>>
>> Please let me know if this aligns with what you stated.
>>
>> Thanks,
>> Walaa.
>>
>>


Re: Iceberg View Spec Improvements

2024-10-11 Thread Benny Chow
Hi Russell

Yes, you listed out the requirements to make the two Spark engines case
work.  Basically, it allows each engine to dynamically resolve the table
identifiers under the correct catalog name.

Hello Walaa

IMO, we don't need to list out such restrictions because they really depend
on the setup.   Multiple Iceberg catalogs?  Multiple engines?  Consistent
catalog names?  Are views created with USE in context?  Today, in Dremio,
we save tons of views to Nessie with fully qualified SQL identifiers to
other sources such as mysql or snowflake.  Those views may or may not have
default-catalog and default-namespaces set depending on the USE context.
If Dremio intended for that view to be readable by Spark, it would have to
adhere to all those restrictions I listed before.

Thanks
Benny

On Fri, Oct 11, 2024 at 10:00 AM Walaa Eldin Moustafa wrote:

> Benny, "Iceberg View Spec Improvements" includes documenting what is
> supported and what is not. You listed a few restrictions. Many of them are
> not documented on the current spec. Documenting them is what this thread is
> about. We are trying to reach a consensus on the necessary constraints (so
> we are not over- or under-restricting).
>
> Russell, I think what you stated is a version of the restrictions. From my
> point of view, the list of the necessary restrictions are:
>
> * Engines must share the same default catalog names, ensuring that
> partially specified SQL identifiers with catalog omitted are resolved to
> the same fully specified SQL identifier across all engines.
> * Engines must share the same default namespaces, ensuring that SQL
> identifiers without catalog and namespace are resolved to the same fully
> specified SQL identifier across all engines.
> * All engines must resolve a fully specified SQL identifier to the same
> storage table in the same storage catalog.
>
> Please let me know if this aligns with what you stated.
>
> Thanks,
> Walaa.
>
>


Re: Iceberg View Spec Improvements

2024-10-11 Thread Walaa Eldin Moustafa
Benny, "Iceberg View Spec Improvements" includes documenting what is
supported and what is not. You listed a few restrictions. Many of them are
not documented in the current spec. Documenting them is what this thread is
about. We are trying to reach a consensus on the necessary constraints (so
we are neither over- nor under-restricting).

Russell, I think what you stated is a version of the restrictions. From my
point of view, the list of necessary restrictions is:

* Engines must share the same default catalog names, ensuring that
partially specified SQL identifiers with catalog omitted are resolved to
the same fully specified SQL identifier across all engines.
* Engines must share the same default namespaces, ensuring that SQL
identifiers without catalog and namespace are resolved to the same fully
specified SQL identifier across all engines.
* All engines must resolve a fully specified SQL identifier to the same
storage table in the same storage catalog.

Please let me know if this aligns with what you stated.

Thanks,
Walaa.
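
The first two restrictions above can be illustrated with a minimal sketch.
It assumes single-level namespaces and a hypothetical per-engine
configuration object (EngineConfig, resolve, and the catalog names are
invented for illustration and are not part of the view spec). It shows how a
partially specified identifier expands differently once two engines disagree
on the default catalog:

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class EngineConfig:
    """Hypothetical per-engine settings; not part of the Iceberg view spec."""
    name: str
    default_catalog: str
    default_namespace: Tuple[str, ...]


def resolve(identifier: str, engine: EngineConfig) -> str:
    """Expand a dotted SQL identifier to catalog.namespace.table.

    Simplification: namespaces are assumed to have a single level, so the
    number of dot-separated parts tells us what was omitted.
    """
    parts = identifier.split(".")
    if len(parts) == 1:      # table only: add default catalog and namespace
        parts = [engine.default_catalog, *engine.default_namespace, *parts]
    elif len(parts) == 2:    # namespace.table: add default catalog
        parts = [engine.default_catalog, *parts]
    return ".".join(parts)


# Two engines that agree on the default namespace but not the default catalog.
spark_a = EngineConfig("spark-a", "prod", ("sales",))
spark_b = EngineConfig("spark-b", "lake", ("sales",))

for ident in ("orders", "sales.orders", "prod.sales.orders"):
    a, b = resolve(ident, spark_a), resolve(ident, spark_b)
    print(f"{ident!r}: {a} vs {b} -> {'same' if a == b else 'DIFFERENT'}")
```

Only the fully specified identifier expands to the same string on both
engines. The third restriction (that the string maps to the same storage
table) cannot be checked by string comparison and is outside this sketch.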


Re: [DISCUSS] REST: Standardize vended credentials in Spec

2024-10-11 Thread Dmitri Bourlatchkov
Hi Eduard,

The latest REST spec change PR LGTM overall.

I think it does make sense to avoid putting vendor-specific credential
properties into the REST spec itself. However, I still have a couple of
comments.

(reposting from GH) REST servers have to provide concrete values for
vendor-specific credential properties. If there is no effort to standardize
them, I think the Java client implementation (!) will become the de facto
standard for these properties, which is sub-optimal, IMHO.

I'd propose to add a separate spec file, which will be distinct from the
REST catalog spec, and will serve exclusively to unify how clients and
servers interpret vendor-specific credentials for _known_ use cases. Its
purpose will not be to restrict what can be done, but to define a common
way of handling existing use cases.

WDYT?

Thanks,
Dmitri.
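
To make the prefix-based idea from Eduard's message (quoted below) concrete,
here is a minimal sketch of how a client might pick a credential for a
storage location. Everything in it is an assumption for illustration: the
entry shape (a "prefix" plus an opaque "config" map), the longest-prefix
rule, the select_credential helper, and the example property keys and
values. None of it is taken from the PR.

```python
from typing import Dict, List, Optional

# Hypothetical credential entries: each applies to storage locations under
# `prefix` and carries an opaque, vendor-specific config map.
Credential = Dict[str, object]

CREDENTIALS: List[Credential] = [
    {"prefix": "s3://warehouse/", "config": {"s3.access-key-id": "EXAMPLE-1"}},
    {"prefix": "s3://warehouse/secure/", "config": {"s3.access-key-id": "EXAMPLE-2"}},
    {"prefix": "gs://archive/", "config": {"gcs.oauth2.token": "EXAMPLE-3"}},
]


def select_credential(location: str,
                      credentials: List[Credential]) -> Optional[Credential]:
    """Return the entry whose prefix matches `location`, preferring the
    longest prefix (an assumed tie-breaking rule). None if nothing matches."""
    matches = [c for c in credentials if location.startswith(str(c["prefix"]))]
    return max(matches, key=lambda c: len(str(c["prefix"])), default=None)


chosen = select_credential("s3://warehouse/secure/t1/data.parquet", CREDENTIALS)
print(chosen["prefix"] if chosen else "no credential")
```

The REST spec can stay vendor-neutral in such a scheme because the config
map is opaque to it. That is also why the keys inside the map would need
their own agreed definition for clients and servers to interpret them alike.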

On Thu, Oct 10, 2024 at 5:38 AM Eduard Tudenhöfner wrote:

> Based on recent discussions the feedback was that we don't want to have
> anything storage-specific in the OpenAPI spec (other than documenting the
> different storage configurations, which is handled by #10576).
> Therefore I've updated the PR and made it flexible enough so that we could
> still pass different credentials based on a given *prefix* but not need a
> new Spec change whenever new credentials are added/changed for a
> storage provider.
> That should hopefully work for everyone, so please take a look so that we
> can do a formal VOTE on these changes.
>
> It was also brought up that it would be helpful to see how this looks in
> the context of refreshing vended credentials. I'll share the proposal for
> refreshing vended credentials a bit later today.
>
> Thanks
> Eduard
>
> On Tue, Sep 24, 2024 at 6:34 PM Dmitri Bourlatchkov
>  wrote:
>
>> > wrt ISO 8601 timestamps: I'd like to keep things consistent with other
>> places in the spec, which are typically defined as millisecond values.
>>
>> Fair enough. Now that the spec states the reference point in time, using
>> millisecond offsets is fine.
>>
>> Cheers,
>> Dmitri.
>>
>> On Tue, Sep 24, 2024 at 10:41 AM Eduard Tudenhöfner <
>> etudenhoef...@apache.org> wrote:
>>
>>> Thanks Dmitri for reviewing the PR and the doc.
>>>
>>> wrt ISO 8601 timestamps: I'd like to keep things consistent with other
>>> places in the spec, which are typically defined as millisecond values.
>>>
>>> Thanks
>>> Eduard
>>>
>>> On Fri, Sep 20, 2024 at 4:46 PM Dmitri Bourlatchkov
>>>  wrote:
>>>
>>>> Thanks for proposing this improvement, Eduard!
>>>>
>>>> Overall it seems pretty reasonable to me. I added a few comments in GH
>>>> and in the doc.
>>>>
>>>> One higher level point I'd like to discuss is using ISO 8601 to format
>>>> expiry timestamps (as opposed to numeric milliseconds values). This should
>>>> hopefully make the config more human-readable without adding too much
>>>> processing burden. I hope the standard is well supported by most language
>>>> libraries now. It is certainly supported by java. WDYT?
>>>>
>>>> Thanks,
>>>> Dmitri.
>>>>
>>>> On Fri, Sep 13, 2024 at 12:13 PM Eduard Tudenhöfner <
>>>> etudenhoef...@apache.org> wrote:
>>>>
>>>>> Hey everyone,
>>>>>
>>>>> I'd like to propose standardizing the vended credentials used in
>>>>> loadTable / loadView responses.
>>>>>
>>>>> I opened #8 to track the proposal in GH.
>>>>> Please find the proposal doc here (estimated read time: 5 minutes).
>>>>> The proposal requires a spec change, which can be seen in #10722.
>>>>>
>>>>> Thanks,
>>>>> Eduard
>>>>>
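
The two expiry-timestamp encodings discussed in this thread (ISO 8601 text
versus numeric millisecond values) carry the same information. The sketch
below converts between them, assuming the reference point is the Unix epoch
in UTC. The function names and the sample value are illustrative only.

```python
from datetime import datetime, timedelta, timezone

# Assumed reference point: 1970-01-01T00:00:00 UTC.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def millis_to_iso8601(ms: int) -> str:
    """Render epoch milliseconds as an ISO 8601 string in UTC."""
    return (EPOCH + timedelta(milliseconds=ms)).isoformat(timespec="milliseconds")


def iso8601_to_millis(text: str) -> int:
    """Parse an ISO 8601 string (with UTC offset) back to epoch milliseconds."""
    return (datetime.fromisoformat(text) - EPOCH) // timedelta(milliseconds=1)


expires_at_ms = 1728604800000  # sample value
as_text = millis_to_iso8601(expires_at_ms)
print(as_text)
print(iso8601_to_millis(as_text) == expires_at_ms)
```

The conversion uses timedelta arithmetic on integers rather than floating
point seconds, so the round trip is exact at millisecond precision.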



Re: Iceberg View Spec Improvements

2024-10-11 Thread Russell Spitzer
The two Spark engines case is the only case I'm stuck on. I'm not sure how
you can define a view that works regardless of configuration unless you
require that the catalog holding the view is the default catalog (which is
a config) and you also only produce catalog-less identifiers.

On Fri, Oct 11, 2024 at 10:08 AM Benny Chow  wrote:

> Having spent some time testing Nessie views with multiple engines
> (Dremio + Spark) using different catalog names and different namespaces, I
> tend to agree with Dan and Amogh that the current view spec is fine.
> Unlike tables, I think when it comes to views, engines have to "work
> together" if they expect to share the views.  Working together means:
>
> Providing multiple SQL representations
> Not using engine specific operators or UDFs
> Not using engine specific row column access policies
> Not using engine specific role based access control features such as view
> delegation (ex. query user vs view owner)
> Not using fully qualified SQL identifiers when engines don't standardize
> on catalog names
> Using standardized catalog names if cross catalog joins are needed in view
> SQLs
>
> Some of the above limitations also exist even for the same engine when you
> have for example two Spark clusters pointing to the same catalog and each
> cluster uses different catalog names.
>
> Best
> Benny
>
>
> On Thu, Oct 10, 2024 at 8:51 PM Amogh Jahagirdar <2am...@gmail.com> wrote:
>
>>  I took another pass over the view spec and I believe that
>> representations of identifiers and how resolution of references by engines
>> should be performed is clear. So from my perspective, at the moment we do
>> not need to change the view spec itself.
>>
>> I do acknowledge though that practically there can be scenarios where
>> catalog names are inconsistent across environments and this has led to
>> confusion when developing the MV spec (I'm remembering based on last week's
>> community sync). There are some recommendations so that implementations can
>> address these inconsistencies in this thread already, but I don't think
>> adding some more complexity to the view spec via some form of
>> normalizing/mapping identifiers is worth it for these cases. I think in its
>> current state it's a sufficient model for developing MVs, and shouldn't
>> block progression on that.
>>
>> I'm +1 on adding an "unsupported configurations" clarification though,
>> it's become clear to me that there's enough confusion around the
>> implications of the SQL identifiers in the spec that it's worth calling it
>> out.
>>
>> Thanks,
>>
>> Amogh Jahagirdar
>>
>> On Thu, Oct 10, 2024 at 5:08 PM Daniel Weeks  wrote:
>>
>>> Russell,
>>>
>>> I think there are a few existing ways to support that.  For example, if
>>> you exclude the default catalog and fully reference the table with
>>> <catalog>.<namespace>.<table>, most SQL engines will interpret that
>>> correctly (for cross or known catalogs).  Also, if you omit the catalog
>>> and use just <namespace>.<table>, it must use the catalog in which the
>>> view is defined (per the spec), which I think addresses your case.
>>>
>>> Server-side rewrite is possible, but I think we'd need to explore the
>>> specific cases, which we'll probably need to do as we consider secure views
>>> more closely.
>>>
>>> -Dan
>>>
>>> On Thu, Oct 10, 2024 at 3:59 PM Walaa Eldin Moustafa <
>>> wa.moust...@gmail.com> wrote:
>>>
>>>> Hi Russell,
>>>>
>>>> Would this be a good candidate for a future version of the spec?
>>>>
>>>> Thanks,
>>>> Walaa.
>>>>
>>>>
>>>> On Thu, Oct 10, 2024 at 3:50 PM Russell Spitzer <
>>>> russell.spit...@gmail.com> wrote:
>>>>
>>>>> I still have an issue with representations not having explicit ways of
>>>>> incorporating the catalog name, I'm thinking about our potential future
>>>>> situation where we want to return a view for Fine Grained Access policies.
>>>>> In that case won't the Catalog need to craft a representation that matches
>>>>> the configuration of the engine? Doesn't this mean the client will have to
>>>>> tell the Catalog what its local name is?
>>>>>
>>>>> On Thu, Oct 10, 2024 at 5:34 PM Daniel Weeks wrote:
>>>>>
>>>>>> Hey Walaa,
>>>>>>
>>>>>> I recognize the issue you're calling out but disagree there is an
>>>>>> implicit assumption in the spec.  The spec clearly says how identifiers
>>>>>> including catalogs and namespaces are represented/stored and how
>>>>>> references need to be resolved.  The idea that a catalog may not match
>>>>>> is an environmental/infrastructure/configuration issue related to where
>>>>>> they are being referenced from.
>>>>>>
>>>>>> If we think this is sufficiently confusing to people, I would be open
>>>>>> to discussing an "unsupported configurations" callout, but I don't think
>>>>>> this blocks work and am somewhat skeptical that it's necessary.
>>>>>>
>>>>>> -Dan
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Thu, Oct 10, 2024 at 2:47 PM Walaa Eldin Moustafa <
>>>>>> wa.moust...@gmail.com> wrote:
>>>>>>
>>>>>>> Hi Dan,
>>>>>>>
>>>>>>> I think ther

Re: [DISCUSS] Iceberg Rust Sync Meeting

2024-10-11 Thread Sung Yun
Thank you for starting this thread, Xuanwo. I'm +1 for an Iceberg Rust meeting.

Regarding the meeting time, I believe the Iceberg Catalog Community sync
already takes place at the same time as the Iceberg community sync, on the two
consecutive weeks when the tri-weekly Iceberg meeting is not held.

For example, if the Iceberg Sync Meeting is scheduled for Thursday, October 24,
2024, from 00:00 to 01:00 GMT+8, there are Catalog Community Sync Meetings
scheduled for Thursday, October 17, 2024, from 00:00 to 01:00 GMT+8 and for
Thursday, October 10, 2024, from 00:00 to 01:00 GMT+8.
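
The "same time of day, one week earlier" proposal is simple date arithmetic.
A small sketch follows, using the example date from this thread and a fixed
UTC+8 offset (the variable names are illustrative). It also shows the UTC
equivalent, which falls on the previous calendar day:

```python
from datetime import datetime, timedelta, timezone

GMT8 = timezone(timedelta(hours=8))  # fixed offset, no daylight saving

# Example date from the thread: Iceberg Sync Meeting start time.
iceberg_sync = datetime(2024, 10, 24, 0, 0, tzinfo=GMT8)

# Proposed Rust sync: same time of day, one week earlier.
rust_sync = iceberg_sync - timedelta(weeks=1)

fmt = "%A, %B %d, %Y %H:%M"
print(rust_sync.strftime(fmt), "GMT+8")
print(rust_sync.astimezone(timezone.utc).strftime(fmt), "UTC")
```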

Another option to consider to avoid conflict is to host the Iceberg Rust sync 
on a different day of the week. For example, PyIceberg sync meets on the last 
Tuesday of each month.

Regardless, I'm looking forward to attending the meeting and connecting with 
the community!

Sung

On 2024/10/11 13:31:59 Xuanwo wrote:
> > For reference, here's the doc we've been using for the pyiceberg sync 
> > https://docs.google.com/document/d/1oMKodaZJrOJjPfc8PDVAoTdl02eGQKHlhwuggiw7s9U
> 
> Thank you, Kevin. This document is excellent, and I'll use it as a template.
> 
> > I think we have meeting records for catalog meetings and community sync, so 
> > we should also record this?
> 
> For the record, I'm referring to the video recording, which I don't have a 
> setup for like the Iceberg Sync Meeting does. I plan to maintain a Google 
> document similar to the Iceberg Sync Meeting and Iceberg Python Sync Meetings.
> 
> > For time, I would suggest moving it one hour ahead, e.g. 23:00 to 00:00 
> > GMT+8, so that it's a little more friendly to people in Asia?
> 
> This time looks good to me.
> 
> > As for the time, I think we don't need to use the same time as Iceberg Sync 
> > Meeting, and can choose a better time according to the Iceberg Rust 
> > developers? (Perhaps we can have a poll)
> 
> Hi, xxchan.
> 
> I propose a time close to the Iceberg Sync Meeting to ensure most community 
> members can join. I'm open to other options. Would you like to suggest one?
> 
> On Fri, Oct 11, 2024, at 15:56, xxchan wrote:
> > Thank Xuanwo for raising this. I'm also interested and excited to catch up 
> > and better participate in the community.
> > 
> > As for the time, I think we don't need to use the same time as Iceberg Sync 
> > Meeting, and can choose a better time according to the Iceberg Rust 
> > developers? (Perhaps we can have a poll)
> > 
> > On Fri, Oct 11, 2024 at 1:41 PM Christian Thiel 
> >  wrote:
> >> +1 for rust sync. Thanks for the proposal Xuanwo. There are many open 
> >> topics and alignment in the sync can help to clarify scopes and 
> >> dependencies to move forward with iceberg-rust even faster.
> >> Time is good for me.
> >> 
> >> 
> >> *From:* Kevin Liu
> >> *Sent:* Wednesday, October 9, 2024 4:47:57 PM
> >> *To:* dev@iceberg.apache.org
> >> *Subject:* Re: [DISCUSS] Iceberg Rust Sync Meeting
> >>  
> >> +1 on sync meeting for iceberg rust. I want to get involved and catch up 
> >> on the recent developments. For reference, here's the doc we've been using 
> >> for the pyiceberg sync 
> >> https://docs.google.com/document/d/1oMKodaZJrOJjPfc8PDVAoTdl02eGQKHlhwuggiw7s9U
> >> 
> >> Best,
> >> Kevin
> >> 
> >> On Wed, Oct 9, 2024 at 5:30 AM Xuanwo  wrote:
> >>> Hi,
> >>> 
> >>> I'm starting this thread to explore the idea of hosting an Iceberg Rust 
> >>> Sync Meeting. In this meeting, we will discuss recent major changes, 
> >>> pending PR reviews, and features in development. It will offer a space 
> >>> for Iceberg Rust contributors to connect and become familiar with each 
> >>> other, helping us identify and remove contribution barriers to the best 
> >>> of our ability.
> >>> 
> >>> Details about this meeting:
> >>> 
> >>> I suggest hosting our meeting at the same time of day, but one week 
> >>> earlier than the Iceberg Sync Meeting. For example, if the Iceberg Sync 
> >>> Meeting is scheduled for Thursday, October 24, 2024, from 00:00 to 01:00 
> >>> GMT+8, the Iceberg Rust Sync Meeting would take place one week before, on 
> >>> Thursday, October 17, 2024, from 00:00 to 01:00 GMT+8.
> >>> 
> >>> I also suggest using the same Google Meet code (if possible) so we don't 
> >>> get confused.
> >>> 
> >>> These meetings will not be recorded, but I will take notes in a Google 
> >>> Doc, similar to what we do in the Iceberg Sync Meeting.
> >>> 
> >>> What are your thoughts? I'm open to other options as well.
> >>> 
> >>> Xuanwo
> >>> 
> >>> https://xuanwo.io/
> 
> Xuanwo
> 
> https://xuanwo.io/


Re: Spec changes for deletion vectors

2024-10-11 Thread Micah Kornfield
I think it might be worth mentioning the current proposal makes some,
mostly minor, design choices to try to be compatible with Delta Lake
deletion vectors.  I think there might be a general philosophical question
on what compromises the community is willing to make for compatibility
reasons.

On Thu, Oct 10, 2024 at 2:42 PM rdb...@gmail.com  wrote:

> Hi everyone,
>
> There seems to be broad agreement around Anton's proposal to use deletion
> vectors in Iceberg v3, so I've opened two PRs that update the spec with the
> proposed changes. The first, PR #11238, adds a new Puffin blob type,
> delete-vector-v1, that stores a delete vector. The second, PR #11240,
> updates the Iceberg table spec.
>
> Please take a look and comment!
>
> Ryan
>
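
For readers new to the term, a deletion vector can be pictured as a bitmap
over row positions in a data file: a set bit marks the row at that position
as deleted, and readers skip those rows. The sketch below is only a
conceptual model under that assumption. The DeletionVector class and
live_rows helper are hypothetical, and the sketch does not reproduce the
Puffin blob layout or the serialized bitmap format proposed in the PRs.

```python
from typing import Iterable, Iterator, Tuple


class DeletionVector:
    """Toy model: one bit per row position, stored in a Python int."""

    def __init__(self) -> None:
        self._bits = 0  # arbitrary-length integer used as a bitmap

    def delete(self, pos: int) -> None:
        """Mark the row at `pos` as deleted."""
        self._bits |= 1 << pos

    def is_deleted(self, pos: int) -> bool:
        return (self._bits >> pos) & 1 == 1


def live_rows(rows: Iterable[str], dv: DeletionVector) -> Iterator[Tuple[int, str]]:
    """Yield (position, row) for rows not marked as deleted."""
    for pos, row in enumerate(rows):
        if not dv.is_deleted(pos):
            yield pos, row


dv = DeletionVector()
dv.delete(1)
dv.delete(3)
print(list(live_rows(["a", "b", "c", "d", "e"], dv)))
```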


Re: Spec changes for deletion vectors

2024-10-11 Thread Manu Zhang
Hi Ryan,

Do you mean the doc "Improve Position Deletes in V3" by Anton? I don't recall
Anton using the term "deletion vector" in his proposal.

On Sat, Oct 12, 2024 at 12:30 AM Micah Kornfield wrote:

> I think it might be worth mentioning the current proposal makes some,
> mostly minor, design choices to try to be compatible with Delta Lake
> deletion vectors.  I think there might be a general philosophical question
> on what compromises the community is willing to make for compatibility
> reasons.
>
> On Thu, Oct 10, 2024 at 2:42 PM rdb...@gmail.com  wrote:
>
>> Hi everyone,
>>
>> There seems to be broad agreement around Anton's proposal to use deletion
>> vectors in Iceberg v3, so I've opened two PRs that update the spec with the
>> proposed changes. The first, PR #11238, adds a new Puffin blob type,
>> delete-vector-v1, that stores a delete vector. The second, PR #11240,
>> updates the Iceberg table spec.
>>
>> Please take a look and comment!
>>
>> Ryan
>>
>


Re: Meeting Minutes 2024-10-02

2024-10-11 Thread Manu Zhang
Thanks, Brian, for sharing the notes. Some questions here.

> Target release date set for October 31st, 2023

Should be 2024? 😀

> Proposal to create new Iceberg C++ library approved

In this thread [1], Xuanwo and Renjie mentioned the iceberg-rust
implementation and C++ bindings. Do you have a strong opinion here?

[1] https://lists.apache.org/thread/z614q8x45475zpkc76xg96qz72cw23xk

On Thu, Oct 3, 2024 at 5:06 AM Brian Olsen  wrote:

> Hey Iceberg Nation,
>
> Here are the meeting minutes from the October 2nd meeting.
>
> Transcription/Recording
>
> https://youtu.be/4rQL8IMsajc
>
> ### 1.7 Release Planning
>
> 6:29 Target release date set for October 31st, 2023
> 7:20 Branch cut planned for mid-October
> 7:39 Some V3 spec features may be included (e.g. default values, type
> promotion)
> 9:02 Connect licensing PR expected to be included
> 15:47 Community members encouraged to review 1.7 milestone and add PRs
>
> ### C++ Puffin Reader/Writer
>
> 17:22 Proposal to create new Iceberg C++ library approved
> 16:49 Initially focused on Puffin implementation for Impala
> 17:22 Follows existing pattern of language-specific libraries
> 17:36 Community can propose additional functionality in the future
>
> ### Standardizing Credentials in REST API
>
> 22:00 Ongoing discussion about structure of credentials in API responses
> 23:00 Proposal to have well-defined credential structure for easier
> reasoning
> 24:27 Concerns raised about potential future changes to credential fields
> 24:41 Agreement to review refresh endpoint proposal before finalizing
> decision
>
> ### Materialized Views Specification
>
> 30:30 Challenges with catalog naming inconsistencies across query engines
> 34:49 Proposal to use UUIDs for table identification in metadata. How to
> map UUIDs back to catalog-specific identifiers. Consider SQL parsing as
> fallback solution to avoid immediate spec changes
>
> ### File
>
> 39:13 Use File IO or table operations mechanism to refresh expired
> credentials in File IO instances
> 40:00 Proposal for new File IO API to allow credential refreshing without
> rebuilding object
>
> 42:51 Planning underway for Iceberg Summit 2025
>


Re: [DISCUSS] Iceberg Rust Sync Meeting

2024-10-11 Thread Xuanwo
> For reference, here's the doc we've been using for the pyiceberg sync 
> https://docs.google.com/document/d/1oMKodaZJrOJjPfc8PDVAoTdl02eGQKHlhwuggiw7s9U

Thank you, Kevin. This document is excellent, and I'll use it as a template.

> I think we have meeting records for catalog meetings and community sync, so 
> we should also record this?

To clarify, by "recorded" I mean a video recording, which I don't have a
setup for the way the Iceberg Sync Meeting does. I plan to maintain a Google
document similar to those of the Iceberg Sync Meeting and Iceberg Python Sync
Meetings.

> For time, I would suggest moving it one hour ahead, e.g. 23:00 to 00:00 
> GMT+8, so that it's a little more friendly to people in Asia?

This time looks good to me.

> As for the time, I think we don't need to use the same time as Iceberg Sync 
> Meeting, and can choose a better time according to the Iceberg Rust 
> developers? (Perhaps we can have a poll)

Hi, xxchan.

I propose a time close to the Iceberg Sync Meeting to ensure most community 
members can join. I'm open to other options. Would you like to suggest one?

On Fri, Oct 11, 2024, at 15:56, xxchan wrote:
> Thank Xuanwo for raising this. I'm also interested and excited to catch up 
> and better participate in the community.
> 
> As for the time, I think we don't need to use the same time as Iceberg Sync 
> Meeting, and can choose a better time according to the Iceberg Rust 
> developers? (Perhaps we can have a poll)
> 
> On Fri, Oct 11, 2024 at 1:41 PM Christian Thiel 
>  wrote:
>> +1 for rust sync. Thanks for the proposal Xuanwo. There are many open topics 
>> and alignment in the sync can help to clarify scopes and dependencies to 
>> move forward with iceberg-rust even faster.
>> Time is good for me.
>> 
>> 
>> *From:* Kevin Liu
>> *Sent:* Wednesday, October 9, 2024 4:47:57 PM
>> *To:* dev@iceberg.apache.org
>> *Subject:* Re: [DISCUSS] Iceberg Rust Sync Meeting
>>  
>> +1 on sync meeting for iceberg rust. I want to get involved and catch up on 
>> the recent developments. For reference, here's the doc we've been using for 
>> the pyiceberg sync 
>> https://docs.google.com/document/d/1oMKodaZJrOJjPfc8PDVAoTdl02eGQKHlhwuggiw7s9U
>> 
>> Best,
>> Kevin
>> 
>> On Wed, Oct 9, 2024 at 5:30 AM Xuanwo  wrote:
>>> Hi,
>>> 
>>> I'm starting this thread to explore the idea of hosting an Iceberg Rust 
>>> Sync Meeting. In this meeting, we will discuss recent major changes, 
>>> pending PR reviews, and features in development. It will offer a space for 
>>> Iceberg Rust contributors to connect and become familiar with each other, 
>>> helping us identify and remove contribution barriers to the best of our 
>>> ability.
>>> 
>>> Details about this meeting:
>>> 
>>> I suggest hosting our meeting at the same time of day, but one week earlier 
>>> than the Iceberg Sync Meeting. For example, if the Iceberg Sync Meeting is 
>>> scheduled for Thursday, October 24, 2024, from 00:00 to 01:00 GMT+8, the 
>>> Iceberg Rust Sync Meeting would take place one week before, on Thursday, 
>>> October 17, 2024, from 00:00 to 01:00 GMT+8.
>>> 
>>> I also suggest using the same Google Meet code (if possible) so we don't 
>>> get confused.
>>> 
>>> These meetings will not be recorded, but I will take notes in a Google Doc, 
>>> similar to what we do in the Iceberg Sync Meeting.
>>> 
>>> What are your thoughts? I'm open to other options as well.
>>> 
>>> Xuanwo
>>> 
>>> https://xuanwo.io/

Xuanwo

https://xuanwo.io/

Re: Iceberg View Spec Improvements

2024-10-11 Thread Benny Chow
Having spent some time testing Nessie views with multiple engines (Dremio +
Spark) using different catalog names and different namespaces, I tend to
agree with Dan and Amogh that the current view spec is fine.  Unlike
tables, I think when it comes to views, engines have to "work together" if
they expect to share the views.  Working together means:

* Providing multiple SQL representations
* Not using engine-specific operators or UDFs
* Not using engine-specific row/column access policies
* Not using engine-specific role-based access control features such as view
delegation (e.g. query user vs. view owner)
* Not using fully qualified SQL identifiers when engines don't standardize on
catalog names
* Using standardized catalog names if cross-catalog joins are needed in view
SQLs

Some of the above limitations also exist even for the same engine when you
have for example two Spark clusters pointing to the same catalog and each
cluster uses different catalog names.
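
As a rough illustration of the first item in the list above, a view that
carries one SQL representation per dialect lets each engine pick the text it
can run. The sketch assumes representations are records with "type",
"dialect" and "sql" fields. The pick_representation helper, the dialect
strings, and the SQL text are made up for illustration.

```python
from typing import Dict, List, Optional

# Hypothetical view version with one SQL representation per engine dialect.
view_version: Dict[str, List[Dict[str, str]]] = {
    "representations": [
        {"type": "sql", "dialect": "spark",
         "sql": "SELECT id, upper(name) AS name FROM sales.customers"},
        {"type": "sql", "dialect": "dremio",
         "sql": "SELECT id, UPPER(name) AS name FROM sales.customers"},
    ]
}


def pick_representation(representations: List[Dict[str, str]],
                        dialect: str) -> Optional[Dict[str, str]]:
    """Return the SQL representation for `dialect` (case-insensitive),
    or None when the view offers nothing this engine can use."""
    for rep in representations:
        if rep.get("type") == "sql" and rep.get("dialect", "").lower() == dialect.lower():
            return rep
    return None


rep = pick_representation(view_version["representations"], "Spark")
print(rep["sql"] if rep else "view not readable by this engine")
```

Both sample representations use an identifier without a catalog, in line
with the advice in the list to avoid fully qualified identifiers when
engines do not standardize on catalog names.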

Best
Benny


On Thu, Oct 10, 2024 at 8:51 PM Amogh Jahagirdar <2am...@gmail.com> wrote:

>  I took another pass over the view spec and I believe that representations
> of identifiers and how resolution of references by engines should be
> performed is clear. So from my perspective, at the moment we do not need to
> change the view spec itself.
>
> I do acknowledge though that practically there can be scenarios where
> catalog names are inconsistent across environments and this has led to
> confusion when developing the MV spec (I'm remembering based on last week's
> community sync). There are some recommendations so that implementations can
> address these inconsistencies in this thread already, but I don't think
> adding some more complexity to the view spec via some form of
> normalizing/mapping identifiers is worth it for these cases. I think in its
> current state it's a sufficient model for developing MVs, and shouldn't
> block progression on that.
>
> I'm +1 on adding an "unsupported configurations" clarification though,
> it's become clear to me that there's enough confusion around the
> implications of the SQL identifiers in the spec that it's worth calling it
> out.
>
> Thanks,
>
> Amogh Jahagirdar
>
> On Thu, Oct 10, 2024 at 5:08 PM Daniel Weeks  wrote:
>
>> Russell,
>>
>> I think there are a few existing ways to support that.  For example, if
>> you exclude the default catalog and fully reference the table with
>> <catalog>.<namespace>.<table>, most SQL engines will interpret that
>> correctly (for cross or known catalogs).  Also, if you omit the catalog
>> and use just <namespace>.<table>, it must use the catalog in which the
>> view is defined (per the spec), which I think addresses your case.
>>
>> Server-side rewrite is possible, but I think we'd need to explore the
>> specific cases, which we'll probably need to do as we consider secure views
>> more closely.
>>
>> -Dan
>>
>> On Thu, Oct 10, 2024 at 3:59 PM Walaa Eldin Moustafa <
>> wa.moust...@gmail.com> wrote:
>>
>>> Hi Russell,
>>>
>>> Would this be a good candidate for a future version of the spec?
>>>
>>> Thanks,
>>> Walaa.
>>>
>>>
>>> On Thu, Oct 10, 2024 at 3:50 PM Russell Spitzer <
>>> russell.spit...@gmail.com> wrote:
>>>
>>>> I still have an issue with representations not having explicit ways of
>>>> incorporating the catalog name, I'm thinking about our potential future
>>>> situation where we want to return a view for Fine Grained Access policies.
>>>> In that case won't the Catalog need to craft a representation that matches
>>>> the configuration of the engine? Doesn't this mean the client will have to
>>>> tell the Catalog what its local name is?
>>>>
>>>> On Thu, Oct 10, 2024 at 5:34 PM Daniel Weeks wrote:
>>>>
>>>>> Hey Walaa,
>>>>>
>>>>> I recognize the issue you're calling out but disagree there is an
>>>>> implicit assumption in the spec.  The spec clearly says how identifiers
>>>>> including catalogs and namespaces are represented/stored and how
>>>>> references need to be resolved.  The idea that a catalog may not match
>>>>> is an environmental/infrastructure/configuration issue related to where
>>>>> they are being referenced from.
>>>>>
>>>>> If we think this is sufficiently confusing to people, I would be open
>>>>> to discussing an "unsupported configurations" callout, but I don't think
>>>>> this blocks work and am somewhat skeptical that it's necessary.
>>>>>
>>>>> -Dan
>>>>>
>>>>>
>>>>>
>>>>> On Thu, Oct 10, 2024 at 2:47 PM Walaa Eldin Moustafa <
>>>>> wa.moust...@gmail.com> wrote:
>>>>>
>>>>>> Hi Dan,
>>>>>>
>>>>>> I think there are a few questions that we should solve to decide the
>>>>>> path forward:
>>>>>>
>>>>>> * *Does the current spec contain implicit assumptions?*
>>>>>> I think the answer is yes. I think this is also what Ryan indicated
>>>>>> here [1].
>>>>>>
>>>>>> * *Do these implicit assumptions make it difficult to adopt the spec
>>>>>> or evolve it in the correct way?*
>>>>>> I think the answer is yes as well. MV design discussions became quite
>>>>>> complicated because most contributors