Re: [DISCUSS] Finalizing the v3 spec

2025-04-29 Thread Xuanwo
Hi Ryan. Thanks for starting this. I share the same concern as Russell regarding the recent discussion about `metadata.json.gz`. I think it's a good time to clarify the behavior and perhaps allow for additional compression algorithms here. We can start a separate discussion thread if needed. > A

Re: [DISCUSS] Finalizing the v3 spec

2025-04-29 Thread Jia Yu
Hi Szehon, Thanks for clarifying it. We’re currently addressing the handling of null/NaN values for X, Y, Z, and M coordinates in the Parquet format repository. We’ve already concluded that the spec of Parquet (same on the Iceberg side I believe) only needs additional clarification to guide expec

Re: [DISCUSS] Finalizing the v3 spec

2025-04-29 Thread Fokko Driesprong
Hey Ryan, Thanks for raising this, and I'm very excited to see V3 being finalized! The v3 spec for multi-arg transform only advises to use `source-ids` > instead of `source-id`. Although it is implicit and obvious that only > bucket transform can apply to multi-arg transform, it is still unclear

Re: [DISCUSS] Finalizing the v3 spec

2025-04-29 Thread Szehon Ho
Hi Jia, I think it's about the spec, and not the implementation (which is definitely good to reduce the risk of needing to change the spec). We actually wanted to get our Parquet reader/writer out for this effort, but as we see, it seems it depends on the next Parquet-java release for the new Geo types on Par

Re: [DISCUSS] Table Identifiers in Iceberg View Spec

2025-04-29 Thread Walaa Eldin Moustafa
Hi Rishabh, You're right that the proposal touches on two aspects, and resolution rules are one of them. The other aspect is the proposal's position that table identifiers should be stored in metadata exactly as they appear in the view text (e.g., even if they're two-part or partially qualified),

Re: [DISCUSS] Finalizing the v3 spec

2025-04-29 Thread Jia Yu
Hi folks, For Iceberg Geo, we are still waiting for the PR of geospatial bounds and geospatial predicate to be merged: https://github.com/apache/iceberg/pull/12667 Should a release with core updates include this PR? Thanks, Jia On Tue, Apr 29, 2025 at 10:21 PM Manu Zhang wrote: > Agree with R

Re: [DISCUSS] Finalizing the v3 spec

2025-04-29 Thread Manu Zhang
Agree with Russell and JB that we make a "RC" release for V3 spec to test implementations, compatibility, etc before finalizing it. Thanks, Manu On Wed, Apr 30, 2025 at 12:24 PM Jean-Baptiste Onofré wrote: > Hi Ryan > > It sounds good. > > About multi-args transforms, with the clarification we

Re: [DISCUSS] Table Identifiers in Iceberg View Spec

2025-04-29 Thread Rishabh Bhatia
Hello Walaa, Thanks for starting this discussion. I think we should decouple at least the MV Spec from the proposal to change the current behavior of view resolution. We can continue having the discussion if the current view spec needs to be changed or not. Based on the decision at a later point

Re: [DISCUSS] Finalizing the v3 spec

2025-04-29 Thread Jean-Baptiste Onofré
Hi Ryan It sounds good. About multi-args transforms, with the clarification we did a couple of weeks ago, I think we are good. Maybe a release with the core updated before announcing spec v3 officially would be a good idea? Regards JB On Wed, Apr 30, 2025 at 00:35, Ryan Blue wrote: > Hi ev

Re: [DISCUSS] Finalizing the v3 spec

2025-04-29 Thread Jean-Baptiste Onofré
Hi Gang I’m working on the multi args transforms support: https://github.com/apache/iceberg/pull/12897 You can find details about impl in core. Regards JB On Wed, Apr 30, 2025 at 03:47, Gang Wu wrote: > Please correct me if I'm wrong. > > The v3 spec for multi-arg transform only advises to

Re: [DISCUSS] Finalizing the v3 spec

2025-04-29 Thread Gang Wu
Please correct me if I'm wrong. The v3 spec for multi-arg transform only advises to use `source-ids` instead of `source-id`. Although it is implicit and obvious that only bucket transform can apply to multi-arg transform, it is still unclear the order of source columns and algorithm to use to calc

Re: [DISCUSS] Finalizing the v3 spec

2025-04-29 Thread Russell Spitzer
We should probably come to a resolution on the compressed metadata.json name as well, although that's mostly retroactive. V3 would be the place where we could officially change the naming convention. I'm also interested in getting a release with the full implementation of V3 as it currently stands

[DISCUSS] Finalizing the v3 spec

2025-04-29 Thread Ryan Blue
Hi everyone, I think we’ve reached the point where it’s time to finalize and adopt the changes for Iceberg v3. We’ve been working toward this for the last few months and have now implemented the v3 features in the Java library to reduce the risk of needing changes or hitting problems (row lineage

[VOTE] Add encryption keys to table metadata

2025-04-29 Thread Ryan Blue
Hi everyone, I’d like to propose merging PR 12162 into the table spec for v3. The changes are a minimal set of additions needed to support table encryption schemes, including the scheme that we’re working on for table encryption with client-mana

Re: [DISCUSS] Spec update to cover compressed JSON metadata files

2025-04-29 Thread Micah Kornfield
I wanted to clarify, as others have pointed out, that the PR documents existing functionality, and making changes to it at this point risks breaking clients. I think any changes to the naming convention would have to be done as part of a new version of the spec (and file system based commits must be com

Re: Feathercast: 1.9.0 release

2025-04-29 Thread Rich Bowen
On 2025/04/29 06:32:45 Jean-Baptiste Onofré wrote: > Hi Rich > > That's a great idea. What about a couple of community members for the > feather cast ? > I'm happy to help if needed. Thanks to everyone who has responded. I don't have a preference who I speak with. Perhaps you can decide and let

Re: [VOTE] Release Apache PyIceberg 0.9.1rc1

2025-04-29 Thread Fokko Driesprong
+1 (binding) - Signature, checksum, licenses - Ran some tests against example notebooks. Kind regards, Fokko On Mon, Apr 28, 2025 at 17:59, Kevin Liu wrote: > +1 non-binding > > I verified > * signature, checksum, license > * Ran unit

Re: [Discuss] Streamlining Release Notes Preparation

2025-04-29 Thread Fokko Driesprong
One suggestion is to prune the list a bit manually. For example, back for the 1.6.0 release, I sorted the list, which already makes it much easier to read since we do a pretty good job at prefixing the PRs (Spark, Spec, Core,