+1 (binding)  Seems cleaner to me.

Thanks
Szehon

On Thu, Apr 17, 2025 at 10:31 AM Russell Spitzer <russell.spit...@gmail.com>
wrote:

> +1
>
> On Thu, Apr 17, 2025 at 12:30 PM Ryan Blue <rdb...@gmail.com> wrote:
>
>> Adding my own +1.
>>
>> On Thu, Apr 17, 2025 at 10:19 AM Daniel Weeks <dwe...@apache.org> wrote:
>>
>>> +1 (binding)
>>>
>>> I think this update really helps ensure row ids will be present and
>>> reliable for upgraded tables.  Thanks Ryan!
>>>
>>> On Wed, Apr 16, 2025 at 4:09 PM Ryan Blue <rdb...@gmail.com> wrote:
>>>
>>>> Hi everyone,
>>>>
>>>> I’d like to start a vote to incorporate the spec changes in PR 12781
>>>> <https://github.com/apache/iceberg/pull/12781>.
>>>>
>>>> There are two main changes. First, the current language says that
>>>> upgrading a table to v3 leaves all row IDs null and they are assigned when
>>>> the rows are rewritten for the first time (either to move or modify the
>>>> row). The problem with this is that row IDs are missing until the entire
>>>> table is rewritten, which means that the feature is unreliable. Instead, I
>>>> propose that row IDs are assigned in the first write after upgrading to v3.
>>>>
>>>> In addition to making row IDs more useful, the change to how we upgrade
>>>> tables allows us to simplify the spec with statements like “any added or
>>>> existing data file without first_row_id should be assigned one via
>>>> inheritance” and “any manifest without a first_row_id must be assigned
>>>> one when writing a manifest list”. I think this sets clearer expectations.
>>>>
>>>> Second, I found some issues with the strict way that first_row_id is
>>>> inherited and assigned in the metadata tree. The current wording would
>>>> prevent writers from assigning row IDs to existing data files because
>>>> assignment was strict and only accounted for added files. Instead, I
>>>> propose changing the wording to “must be greater than or equal to” so that
>>>> there is some flexibility, and giving simple examples that are safe, like 
>>>> first_row_id
>>>> = last_assigned.first_row_id + last_assigned.added_rows_count +
>>>> last_assigned.existing_rows_count.
>>>>
>>>> Please take a look at the PR and vote in the next 72 hours.
>>>>
>>>> [ ] +1 Add these changes to the spec for v3 row lineage
>>>> [ ] +0
>>>> [ ] -1 I have questions and/or concerns
>>>>
>>>> Thanks,
>>>>
>>>> Ryan
>>>>
>>>

Reply via email to