Hey JB,

Thanks for jumping in here.

My point in the PR <https://github.com/apache/iceberg/pull/12644> is that in the current version of the spec, we must write source-ids for ≥V3 tables and source-id for ≤V2 tables, which requires carrying the format-version to the serializer. Instead, what I propose in the PR is to write source-id in the case of a single argument (compatible with all versions at read time), and to write source-ids only when there are multiple arguments. This way, we don't need to know the table version when serializing the partition-spec/sort-order.

I've simplified the PR as suggested by Szehon to leave the bucketing transform aside for now, which I think is a great idea.

Kind regards,
Fokko

On Thu, 3 Apr 2025 at 16:17, Jean-Baptiste Onofré <j...@nanthrax.net> wrote:

> Hi Fokko
>
> Sorry for the late reply :)
>
> 1. It sounds good to me.
> 2. I started to work on the core to use only source-ids. The Writer is
> writing only source-ids, whereas the Reader detects if source-id
> exists and uses it (for backward compatibility). By using source-ids,
> it's clearly simpler and consistent.
>
> Regards
> JB
>
> On Tue, Mar 25, 2025 at 8:03 PM Fokko Driesprong <fo...@apache.org> wrote:
> >
> > Hi everyone,
> >
> > I wanted to draw your attention to some small changes to the multi-arg
> > transforms that I've bumped into while working on the V3 spec for PyIceberg.
> >
> > Up for debate. The spec does not point out an actual implementation of
> > transforms that accept multiple arguments. From the existing transforms,
> > the only contender is the bucket transform. Should we include this in the
> > V3 spec? It will only allow you to prune metadata if you do an equality
> > expression on all the fields that are part of the transform.
> >
> > Along the way, we've removed something that we did not intend to remove.
> > At first, we allowed writing source-id or source-ids based on the number
> > of arguments. This has been changed to only allow source-ids for V3 in a
> > PR that introduces backward compatibility. I think this makes the JSON
> > parsers/producers more complex than needed (specifically PyIceberg). Also,
> > in Java, we would need to plumb down the table version to the
> > PartitionSpecParser.java. I think it would be great to simplify this.
> >
> > Please let me know what you think so we can tie up the loose ends for V3.
> >
> > Kind regards,
> > Fokko
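
To make the proposed serialization rule concrete, here is a minimal sketch in Python. It is not the code from the PR or from PyIceberg: the function names field_to_json and source_ids_from_json are hypothetical, and only the JSON keys (name, transform, field-id, source-id, source-ids) come from the spec discussion above. The point is that the writer decides purely on the number of source columns, so no format-version has to be passed in.

```python
from typing import Any, Dict, List


def field_to_json(name: str, transform: str, field_id: int,
                  source_ids: List[int]) -> Dict[str, Any]:
    """Serialize one partition field (hypothetical helper, not the PR's code).

    The choice between source-id and source-ids depends only on the number
    of source columns, never on the table's format-version.
    """
    if not source_ids:
        raise ValueError("a partition field needs at least one source column")
    out: Dict[str, Any] = {"name": name, "transform": transform, "field-id": field_id}
    if len(source_ids) == 1:
        # Single argument: readable by every format version.
        out["source-id"] = source_ids[0]
    else:
        # Multiple arguments: only meaningful for multi-arg transforms.
        out["source-ids"] = list(source_ids)
    return out


def source_ids_from_json(field: Dict[str, Any]) -> List[int]:
    """Read either form back into a list of source column ids."""
    if "source-ids" in field:
        return list(field["source-ids"])
    if "source-id" in field:
        return [field["source-id"]]
    raise ValueError("partition field has neither source-id nor source-ids")


# Illustrative field definitions only; the multi-arg transform is made up.
single = field_to_json("id_bucket", "bucket[16]", 1000, [1])
multi = field_to_json("id_ts_bucket", "bucket[16]", 1001, [1, 2])
```

With this rule, a single-argument field round-trips through the same code path regardless of table version, and the reader accepts both keys for backward compatibility.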