[DISCUSS] Multi-arg transforms

Fokko Driesprong Tue, 25 Mar 2025 12:05:32 -0700

Hi everyone,

I wanted to draw your attention to some small changes
<https://github.com/apache/iceberg/pull/12644> to the multi-arg transforms
that I ran into while working on the V3 spec for PyIceberg.


   1. Up for debate. The spec does not define any concrete transform that
   accepts multiple arguments. Among the existing transforms, the only
   candidate is the bucket transform. Should we include this in the V3 spec?
   It would only allow metadata pruning when a query has an equality
   predicate on every field that is part of the transform.
   2. Along the way, we removed something that we did not intend to remove.
   Initially, we allowed writing either source-id or source-ids, depending on
   the number of arguments. This was changed to allow only source-ids for V3
   in a PR that introduces backward compatibility. I think this makes the
   JSON parsers/producers more complex than needed (specifically in
   PyIceberg). Also, in Java, we would need to plumb the table version down
   to PartitionSpecParser.java. I think it would be great to simplify this.
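
To make point 2 concrete, here is a minimal sketch of what a
version-independent reader/writer could look like. The function names are
hypothetical and this is not the actual PyIceberg or Java code; it only
assumes the source-id and source-ids JSON keys mentioned above, and picks the
key from the number of arguments rather than from the table version.

```python
from typing import Any, Dict, List


def read_source_ids(field: Dict[str, Any]) -> List[int]:
    """Return the source column IDs of a partition field (hypothetical helper).

    Accepts either key, so the reader does not need the table version.
    """
    if "source-ids" in field:
        return list(field["source-ids"])
    if "source-id" in field:
        return [field["source-id"]]
    raise ValueError("partition field has neither 'source-id' nor 'source-ids'")


def write_source_ids(source_ids: List[int]) -> Dict[str, Any]:
    """Pick the JSON key from the number of arguments (hypothetical helper)."""
    if not source_ids:
        raise ValueError("a partition field needs at least one source column")
    if len(source_ids) == 1:
        # Single-argument transform: keep the original single-valued key.
        return {"source-id": source_ids[0]}
    # Multi-argument transform: write the list-valued key.
    return {"source-ids": list(source_ids)}
```

With this shape, neither function takes a format version, which is the
simplification I have in mind.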

Please let me know what you think so we can tie up the loose ends for V3.

Kind regards,
Fokko
