Re: [DISCUSS] Multi-arg transforms

Jean-Baptiste Onofré Sat, 05 Apr 2025 10:54:46 -0700

Hi Fokko

Sorry for the late reply :)


1. It sounds good to me.
2. I have started working on the core to use only source-ids. The writer
writes only source-ids, whereas the reader detects whether source-id
exists and uses it (for backward compatibility). Using only source-ids
is clearly simpler and more consistent.
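
To make point 2 concrete, here is a minimal Python sketch of the
reader/writer behaviour described above. It is only an illustration, not
the actual core or PyIceberg code: the function names read_source_ids and
write_partition_field are hypothetical, and it assumes a partition field
is handled as a plain JSON object (a dict).

```python
from typing import Any, Dict, List


def read_source_ids(field: Dict[str, Any]) -> List[int]:
    """Return the source column IDs of one partition-field JSON object.

    Assumption: the legacy single-valued "source-id" is used when it is
    present (backward compatibility); otherwise "source-ids" is read.
    """
    if "source-id" in field:
        # Legacy form: a single integer.
        return [int(field["source-id"])]
    if "source-ids" in field:
        # New form: a list of integers, one per transform argument.
        return [int(i) for i in field["source-ids"]]
    raise ValueError("partition field has neither source-id nor source-ids")


def write_partition_field(
    name: str, transform: str, source_ids: List[int], field_id: int
) -> Dict[str, Any]:
    """Serialize a partition field, always emitting "source-ids"."""
    return {
        "name": name,
        "transform": transform,
        "source-ids": list(source_ids),
        "field-id": field_id,
    }


# Old metadata (source-id) and new metadata (source-ids) both parse.
legacy = {"name": "id_bucket", "transform": "bucket[16]",
          "source-id": 1, "field-id": 1000}
written = write_partition_field("id_bucket", "bucket[16]", [1], 1000)
print(read_source_ids(legacy), read_source_ids(written))
```

With this shape the writer needs no knowledge of the table version; only
the reader carries the compatibility branch.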

Regards
JB

On Tue, Mar 25, 2025 at 8:03 PM Fokko Driesprong <fo...@apache.org> wrote:
>
> Hi everyone,
>
> I wanted to draw your attention to some small changes to the multi-arg
> transforms that I bumped into while working on the V3 spec for PyIceberg.
>
> 1. Up for debate: the spec does not point out an actual implementation of
>    a transform that accepts multiple arguments. Of the existing transforms,
>    the only contender is the bucket transform. Should we include this in
>    the V3 spec? It will only allow you to prune metadata if you use an
>    equality expression on all the fields that are part of the transform.
> 2. Along the way, we've removed something that we did not intend to. At
>    first we allowed writing either source-id or source-ids, depending on
>    the number of arguments. This has been changed to only allow source-ids
>    for V3 in a PR that introduces backward compatibility. I think this
>    makes the JSON parsers/producers more complex than needed (specifically
>    in PyIceberg). Also, in Java, we would need to plumb the table version
>    down to PartitionSpecParser.java. I think it would be great to simplify
>    this.
>
> Please let me know what you think so we can tie up the loose ends for V3.
>
> Kind regards,
> Fokko
>
>
>
