Thanks JB and Fokko! I agree that we are good with multi-arg transform for v3.
Best,
Gang

On Wed, Apr 30, 2025 at 2:12 PM Xuanwo <xua...@apache.org> wrote:

> Hi Ryan.
>
> Thanks for starting this.
>
> I share the same concern as Russell regarding the recent discussion about
> `metadata.json.gz`. I think it's a good time to clarify the behavior and
> perhaps allow for additional compression algorithms here. We can start a
> separate discussion thread if needed.
>
> > On the PyIceberg side, we're also working to catch up on the V3
> > capabilities <https://github.com/apache/iceberg-python/issues/1818>.
> > Having a Java release that exposes these capabilities helps, so we can do
> > round-trip validation.
>
> Agreed. We can begin work on the iceberg-rust side after the Java release.
>
> On Wed, Apr 30, 2025, at 13:47, Fokko Driesprong wrote:
>
> Hey Ryan,
>
> Thanks for raising this, and I'm very excited to see V3 being finalized!
>
> > The v3 spec for multi-arg transforms only advises using `source-ids`
> > instead of `source-id`. Although it is implicit and obvious that only the
> > bucket transform can be a multi-arg transform, the order of the source
> > columns and the algorithm used to calculate the bucket value are still
> > unclear.
>
> V3 now uses `source-ids` when there are multiple arguments and `source-id`
> when there is just one. The PR can be found here
> <https://github.com/apache/iceberg/pull/12644>. This makes the
> serialization deterministic without knowing the format-version, simplifying
> the readers/writers. After some discussion on the PR, we've decided to
> leave out the multi-arg bucket transform so the V3 spec can be finalized.
> So V3 only contains the scaffolding for multi-arg transforms.
>
> > For Iceberg Geo, we are still waiting for the PR of geospatial bounds and
> > geospatial predicate to be merged:
> > https://github.com/apache/iceberg/pull/12667
>
> I think it is a good idea to distinguish between the spec and the actual
> code. If we all feel comfortable with the spec, I think we could finalize
> it.
> Being comfortable also means that we know that we have a working
> implementation, but I don't think we have to wrap up all the loose ends
> before voting on the spec.
>
> On the PyIceberg side, we're also working to catch up on the V3
> capabilities <https://github.com/apache/iceberg-python/issues/1818>.
> Having a Java release that exposes these capabilities helps, so we can do
> round-trip validation.
>
> Kind regards,
> Fokko
>
> On Wed, Apr 30, 2025 at 07:26, Jia Yu <ji...@apache.org> wrote:
>
> Hi folks,
>
> For Iceberg Geo, we are still waiting for the PR of geospatial bounds and
> geospatial predicate to be merged:
> https://github.com/apache/iceberg/pull/12667
>
> Should a release with core updates include this PR?
>
> Thanks,
> Jia
>
> On Tue, Apr 29, 2025 at 10:21 PM Manu Zhang <owenzhang1...@gmail.com>
> wrote:
>
> I agree with Russell and JB that we should make an "RC" release for the V3
> spec to test implementations, compatibility, etc. before finalizing it.
>
> Thanks,
> Manu
>
> On Wed, Apr 30, 2025 at 12:24 PM Jean-Baptiste Onofré <j...@nanthrax.net>
> wrote:
>
> Hi Ryan
>
> It sounds good.
>
> About multi-arg transforms, with the clarification we made a couple of
> weeks ago, I think we are good.
> Maybe a release with the core updated before announcing spec v3 officially
> would be a good idea?
>
> Regards
> JB
>
> On Wed, Apr 30, 2025 at 00:35, Ryan Blue <rdb...@gmail.com> wrote:
>
> Hi everyone,
>
> I think we’ve reached the point where it’s time to finalize and adopt the
> changes for Iceberg v3. We’ve been working toward this for the last few
> months and have now implemented the v3 features in the Java library to
> reduce the risk of needing changes or hitting problems (row lineage support
> in Spark 3.5 just went in!). We’ve also incorporated some clarifications
> and minor changes back into the spec from what we’ve learned.
>
> At this point, I’m confident that the spec is reasonable and correct.
> Thank you to everyone working on these reference implementations!
>
> The next step is to discuss any outstanding items or concerns about moving
> forward, and then to have a vote thread to adopt the spec. I’ll start off
> with a couple of items:
>
> One potential concern is that the upstream Variant spec hasn’t yet been
> finalized by the Parquet community, but we’ve built a full, independent
> implementation in Iceberg to validate the spec. I think the Parquet
> community is primarily waiting on getting the PRs in to have a Java
> reference implementation, so the risk of changes to the Variant spec is
> small.
>
> There’s also an on-going vote to add encryption keys in support of full
> table encryption that I think we want to get in.
>
> Any other items we may want to clear up?
>
> Ryan
>
> Xuanwo
>
> https://xuanwo.io/