+1

On Thu, Apr 17, 2025 at 12:30 PM Ryan Blue <rdb...@gmail.com> wrote:

> Adding my own +1.
>
> On Thu, Apr 17, 2025 at 10:19 AM Daniel Weeks <dwe...@apache.org> wrote:
>
>> +1 (binding)
>>
>> I think this update really helps ensure row ids will be present and
>> reliable for upgraded tables.  Thanks Ryan!
>>
>> On Wed, Apr 16, 2025 at 4:09 PM Ryan Blue <rdb...@gmail.com> wrote:
>>
>>> Hi everyone,
>>>
>>> I’d like to start a vote to incorporate the spec changes in PR 12781
>>> <https://github.com/apache/iceberg/pull/12781>.
>>>
>>> There are two main changes. First, the current language says that
>>> upgrading a table to v3 leaves all row IDs null and they are assigned when
>>> the rows are rewritten for the first time (either to move or modify the
>>> row). The problem with this is that row IDs are missing until the entire
>>> table is rewritten, which means that the feature is unreliable. Instead, I
>>> propose that row IDs are assigned in the first write after upgrading to v3.
>>>
>>> In addition to making row IDs more useful, the change to how we upgrade
>>> tables allows us to simplify the spec with statements like “any added or
>>> existing data file without first_row_id should be assigned one via
>>> inheritance” and “any manifest without a first_row_id must be assigned
>>> one when writing a manifest list”. I think this sets clearer expectations.
>>>
>>> Second, I found some issues with the strict way that first_row_id is
>>> inherited and assigned in the metadata tree. The current wording would
>>> prevent writers from assigning row IDs to existing data files because
>>> assignment was strict and only accounted for added files. Instead, I
>>> propose changing the wording to “must be greater than or equal to” so that
>>> there is some flexibility, and giving simple examples that are safe, like
>>> first_row_id = last_assigned.first_row_id + last_assigned.added_rows_count
>>> + last_assigned.existing_rows_count.
>>>
>>> Please take a look at the PR and vote in the next 72 hours.
>>>
>>> [ ] +1 Add these changes to the spec for v3 row lineage
>>> [ ] +0
>>> [ ] -1 I have questions and/or concerns
>>>
>>> Thanks,
>>>
>>> Ryan
>>>
>>
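
For readers following the thread, the safe assignment example quoted above
(first_row_id = last_assigned.first_row_id + last_assigned.added_rows_count +
last_assigned.existing_rows_count) can be sketched roughly as follows. This is
only an illustration, not Iceberg's implementation: the Manifest class, the
assign_first_row_ids function, and the next_row_id starting value are
hypothetical names chosen for the example, and it assumes the writer knows the
table's next unassigned row ID when it writes the manifest list.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Manifest:
    # Hypothetical stand-in for a manifest list entry; not Iceberg's API.
    added_rows_count: int
    existing_rows_count: int
    first_row_id: Optional[int] = None  # None means "not assigned yet"


def assign_first_row_ids(manifests: List[Manifest], next_row_id: int) -> int:
    """Assign first_row_id to every manifest that lacks one.

    Each newly assigned manifest reserves a range covering both its added
    and its existing rows, so the next assignment starts at
    first_row_id + added_rows_count + existing_rows_count.
    Returns the next unassigned row ID after all assignments.
    """
    for manifest in manifests:
        if manifest.first_row_id is None:
            manifest.first_row_id = next_row_id
            next_row_id += (
                manifest.added_rows_count + manifest.existing_rows_count
            )
    return next_row_id


# Example: one manifest already has an ID, two (e.g. from an upgraded
# table) do not. The starting value 100 is an assumed input.
manifests = [
    Manifest(added_rows_count=100, existing_rows_count=0, first_row_id=0),
    Manifest(added_rows_count=10, existing_rows_count=5),
    Manifest(added_rows_count=0, existing_rows_count=20),
]
new_next_row_id = assign_first_row_ids(manifests, next_row_id=100)
```

Counting existing rows as well as added rows is what leaves room to assign
row IDs to existing data files, which the stricter wording described in the
quoted message did not allow for.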
