Re: [DISCUSS] Finalizing the v3 spec

Jean-Baptiste Onofré Tue, 29 Apr 2025 21:20:02 -0700

Hi Gang

I’m working on multi-argument transform support:
https://github.com/apache/iceberg/pull/12897

You can find details about the implementation in core.

Regards
JB

On Wed, Apr 30, 2025 at 03:47, Gang Wu <ust...@gmail.com> wrote:

> Please correct me if I'm wrong.
>
> The v3 spec for multi-arg transforms only advises using `source-ids`
> instead of `source-id`. Although it is implicit and obvious that only the
> bucket transform can be a multi-arg transform, the order of the source
> columns and the algorithm used to calculate the bucket value are still
> unclear.
>
> Is this something we need to clarify? A relevant question is whether to
> clarify that duplicate values in the `source-ids` are disallowed.
>
> Best,
> Gang
>
> On Wed, Apr 30, 2025 at 7:07 AM Russell Spitzer <russell.spit...@gmail.com>
> wrote:
>
>> We should probably come to a resolution on the compressed metadata.json
>> name as well,
>> although that's mostly retroactive. V3 would be the place where we could
>> officially change the naming convention.
>>
>> I'm also interested in getting a release with the full implementation of V3
>> as it currently stands before we vote for the spec to be closed so folks can
>> really kick the tires a bit before we really close things down.
>>
>> I don't think I have any other spec items left.
>>
>> On Tue, Apr 29, 2025 at 5:35 PM Ryan Blue <rdb...@gmail.com> wrote:
>>
>>> Hi everyone,
>>>
>>> I think we’ve reached the point where it’s time to finalize and adopt
>>> the changes for Iceberg v3. We’ve been working toward this for the last few
>>> months and have now implemented the v3 features in the Java library to
>>> reduce the risk of needing changes or hitting problems (row lineage support
>>> in Spark 3.5 just went in!). We’ve also incorporated some clarifications
>>> and minor changes back into the spec from what we’ve learned.
>>>
>>> At this point, I’m confident that the spec is reasonable and correct.
>>> Thank you to everyone working on these reference implementations!
>>>
>>> The next step is to discuss any outstanding items or concerns about
>>> moving forward, and then to have a vote thread to adopt the spec. I’ll
>>> start off with a couple of items:
>>>
>>> One potential concern is that the upstream Variant spec hasn’t yet been
>>> finalized by the Parquet community, but we’ve built a full, independent
>>> implementation in Iceberg to validate the spec. I think the Parquet
>>> community is primarily waiting on getting the PRs in to have a Java
>>> reference implementation, so the risk of changes to the Variant spec is
>>> small.
>>>
>>> There’s also an on-going vote to add encryption keys in support of full
>>> table encryption that I think we want to get in.
>>>
>>> Any other items we may want to clear up?
>>>
>>> Ryan
>>>
>>
