Re: Spec Clarification: Partition Spec equality

Micah Kornfield Tue, 21 Nov 2023 12:16:32 -0800

I tried to put together a PR to cover this:
https://github.com/apache/iceberg/pull/9125


On Sun, Nov 5, 2023 at 10:41 AM Ryan Blue <b...@tabular.io> wrote:

> Right now, both should be guaranteed. The library will only create a new
> spec identifier if there isn't an existing spec that meets the second
> equality definition. That's implemented here:
> https://github.com/apache/iceberg/blob/main/core/src/main/java/org/apache/iceberg/TableMetadata.java#L1537-L1549
>
> The spec assumes that no one would create the same spec with different
> IDs, but we should probably clarify that it isn't allowed.
>
> For how this applies to tracking delete files, it is the partition spec ID
> that should be checked.
>
> On Fri, Nov 3, 2023 at 10:35 AM Micah Kornfield <emkornfi...@gmail.com>
> wrote:
>
>> Hello Iceberg Dev,
>> The Iceberg specification for matching delete files with data files
>> during scan planning states:
>>
>> "The data file’s partition (both spec and partition values) is equal to
>> the delete file’s partition"
>>
>> Equality of partition specs appears slightly ambiguous (apologies if I
>> missed this). I can imagine two different definitions:
>> 1. Equality of partition spec identifier.
>> 2. Equality of each element in the partition spec tuple (which raises
>> the further question of which elements of a partition transform are
>> considered for equality). This is distinct from option one only when
>> identical tuples are allowed in the partition spec map with two different
>> identifiers.
>>
>> I would assume the definition should be based on the partition spec
>> identifier but wanted to clarify.
>>
>> Thanks,
>> Micah
>>
>
> --
> Ryan Blue
> Tabular
>
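
For readers following the thread, here is a minimal sketch of the two
points above: an equivalent spec reuses its existing ID, and delete files
are matched to data files by spec ID plus partition values. This is not
the Iceberg library code. The names PartitionField, FileEntry,
reuse_or_assign_spec_id and same_partition are hypothetical, and treating
(source_id, transform, name) as the fields that define equivalence is an
assumption made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class PartitionField:
    # Assumption: these three attributes decide whether two fields are equivalent.
    source_id: int
    transform: str
    name: str


# Hypothetical stand-in for the table's partition spec map: spec ID -> fields.
Specs = Dict[int, Tuple[PartitionField, ...]]


def reuse_or_assign_spec_id(specs: Specs, fields: Tuple[PartitionField, ...]) -> int:
    """Return the ID of an equivalent existing spec, or register a new one."""
    for spec_id, existing in specs.items():
        if existing == fields:
            # An equivalent spec already exists, so no new ID is created.
            return spec_id
    new_id = max(specs, default=-1) + 1
    specs[new_id] = fields
    return new_id


@dataclass(frozen=True)
class FileEntry:
    # Hypothetical minimal view of a data or delete file entry.
    spec_id: int
    partition: Tuple


def same_partition(data_file: FileEntry, delete_file: FileEntry) -> bool:
    """Compare the spec by ID, then compare the partition values."""
    return (
        data_file.spec_id == delete_file.spec_id
        and data_file.partition == delete_file.partition
    )


specs: Specs = {}
by_day = (PartitionField(source_id=1, transform="day", name="ts_day"),)
by_bucket = (PartitionField(source_id=2, transform="bucket[16]", name="id_bucket"),)

first_id = reuse_or_assign_spec_id(specs, by_day)
repeat_id = reuse_or_assign_spec_id(specs, by_day)     # same ID as first_id
other_id = reuse_or_assign_spec_id(specs, by_bucket)   # a different ID
```

Because equivalent specs never receive two IDs in this sketch, comparing
spec IDs and comparing the spec tuples give the same answer, which is why
both definitions from the original question hold at once.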
