+1

On Fri, 18 Apr 2025 at 08:09, Jean-Baptiste Onofré <j...@nanthrax.net> wrote:

> +1 (non-binding)
>
> Regards
> JB
>
> On Thu, 17 Apr 2025 at 01:08, Ryan Blue <rdb...@gmail.com> wrote:
>
>> Hi everyone,
>>
>> I’d like to start a vote to incorporate the spec changes in PR 12781
>> <https://github.com/apache/iceberg/pull/12781>.
>>
>> There are two main changes. First, the current language says that
>> upgrading a table to v3 leaves all row IDs null and they are assigned when
>> the rows are rewritten for the first time (either to move or modify the
>> row). The problem with this is that row IDs are missing until the entire
>> table is rewritten, which means that the feature is unreliable. Instead, I
>> propose that row IDs are assigned in the first write after upgrading to v3.
>>
>> In addition to making row IDs more useful, the change to how we upgrade
>> tables allows us to simplify the spec with statements like “any added or
>> existing data file without first_row_id should be assigned one via
>> inheritance” and “any manifest without a first_row_id must be assigned
>> one when writing a manifest list”. I think this sets clearer expectations.
>>
>> Second, I found some issues with the strict way that first_row_id is
>> inherited and assigned in the metadata tree. The current wording would
>> prevent writers from assigning row IDs to existing data files because
>> assignment was strict and only accounted for added files. Instead, I
>> propose changing the wording to “must be greater than or equal to” so that
>> there is some flexibility, and giving simple examples that are safe, like
>> first_row_id = last_assigned.first_row_id + last_assigned.added_rows_count
>> + last_assigned.existing_rows_count.
>>
>> Please take a look at the PR and vote in the next 72 hours.
>>
>> [ ] +1 Add these changes to the spec for v3 row lineage
>> [ ] +0
>> [ ] -1 I have questions and/or concerns
>>
>> Thanks,
>>
>> Ryan
>>
>
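
For readers following the thread, the "safe" assignment quoted above can be sketched roughly as follows. This is only an illustration of the arithmetic in the proposal, not code from the PR or from any Iceberg library: the Manifest class, the assign_first_row_ids function, and the next_row_id parameter are hypothetical names, and the sketch assumes that manifests are processed in order while writing a manifest list.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Manifest:
    # Hypothetical stand-in for a manifest list entry; it carries only
    # the fields named in the thread.
    added_rows_count: int
    existing_rows_count: int
    first_row_id: Optional[int] = None


def assign_first_row_ids(manifests: List[Manifest], next_row_id: int) -> int:
    """Give every manifest that lacks a first_row_id the next free ID.

    Returns the next unused row ID after all assignments.
    """
    for m in manifests:
        if m.first_row_id is None:
            m.first_row_id = next_row_id
            # Reserve IDs for added AND existing rows, so that existing
            # data files can also inherit row IDs.
            next_row_id += m.added_rows_count + m.existing_rows_count
    return next_row_id


manifests = [
    # Already assigned in an earlier commit (assumed for the example).
    Manifest(added_rows_count=100, existing_rows_count=0, first_row_id=0),
    # Files that existed before the upgrade to v3.
    Manifest(added_rows_count=0, existing_rows_count=250),
    Manifest(added_rows_count=40, existing_rows_count=10),
]
next_id = assign_first_row_ids(manifests, next_row_id=100)
```

With these made-up counts, the second manifest starts at 100 and the third at 100 + 0 + 250 = 350, which matches the formula in the proposal. Reserving more IDs than strictly needed is still allowed under the "greater than or equal to" wording.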
