I tried to put together a PR to cover this: https://github.com/apache/iceberg/pull/9125
On Sun, Nov 5, 2023 at 10:41 AM Ryan Blue <b...@tabular.io> wrote:

> Right now, both should be guaranteed. The library will only create a new
> spec identifier if there isn't an existing spec that meets the second
> equality definition. That's implemented here:
> https://github.com/apache/iceberg/blob/main/core/src/main/java/org/apache/iceberg/TableMetadata.java#L1537-L1549
>
> The spec assumes that no one would create the same spec with different
> IDs, but we should probably clarify that it isn't allowed.
>
> For how this applies to tracking delete files, it is the partition spec ID
> that should be checked.
>
> On Fri, Nov 3, 2023 at 10:35 AM Micah Kornfield <emkornfi...@gmail.com>
> wrote:
>
>> Hello Iceberg Dev,
>> The Iceberg specification for matching delete files with data files
>> during scan planning states:
>>
>> "The data file’s partition (both spec and partition values) is equal to
>> the delete file’s partition"
>>
>> Equality of partition specs appears slightly ambiguous (apologies if I
>> missed this). I can imagine two different definitions:
>> 1. Equality of partition spec identifier.
>> 2. Equality of each element in the partition spec tuple (which raises
>> the further question of which elements of a partition transform are
>> considered for equality). This is distinct from option one only when
>> identical tuples are allowed in the partition spec map with two different
>> identifiers.
>>
>> I would assume the definition should be based on the partition spec
>> identifier but wanted to clarify.
>>
>> Thanks,
>> Micah
>
> --
> Ryan Blue
> Tabular
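
For illustration, the reuse behavior described in the quoted reply, and the delete-file check that follows from it, can be sketched roughly as below. This is a minimal Python sketch, not the Iceberg implementation: `PartitionField`, `reuse_or_create_spec_id`, and `delete_applies` are hypothetical names, and the choice of which field attributes count toward equality (source ID, transform, name) is an assumption made only for the example. It covers only the partition-equality condition quoted from the spec; other scan-planning conditions are omitted.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class PartitionField:
    # Hypothetical stand-in for one element of a partition spec tuple.
    # Which attributes take part in equality is an assumption here.
    source_id: int
    transform: str
    name: str


Fields = Tuple[PartitionField, ...]


def reuse_or_create_spec_id(specs: Dict[int, Fields], new_fields: Fields) -> int:
    """Return the ID of an existing spec with identical fields, else register a new one."""
    for spec_id, fields in specs.items():
        if fields == new_fields:
            # An equal spec already exists, so no new identifier is created.
            return spec_id
    new_id = max(specs, default=-1) + 1
    specs[new_id] = new_fields
    return new_id


def delete_applies(data_spec_id: int, data_partition: tuple,
                   delete_spec_id: int, delete_partition: tuple) -> bool:
    """Partition-equality condition only: same spec ID and same partition values."""
    return data_spec_id == delete_spec_id and data_partition == delete_partition
```

Because an equal spec is never registered under a second identifier in this sketch, comparing spec IDs and comparing spec contents give the same answer, which is why checking the ID alone is sufficient.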