+ (non-binding).

On Thu, Apr 17, 2025 at 11:22 AM Steven Wu <stevenz...@gmail.com> wrote:

> +1 (binding)
>
> On Thu, Apr 17, 2025 at 11:09 AM Amogh Jahagirdar <2am...@gmail.com>
> wrote:
>
>> +1 (binding)
>>
>> On Thu, Apr 17, 2025 at 11:54 AM Szehon Ho <szehon.apa...@gmail.com>
>> wrote:
>>
>>> +1 (binding)  Seems cleaner to me.
>>>
>>> Thanks
>>> Szehon
>>>
>>> On Thu, Apr 17, 2025 at 10:31 AM Russell Spitzer <
>>> russell.spit...@gmail.com> wrote:
>>>
>>>> +1
>>>>
>>>> On Thu, Apr 17, 2025 at 12:30 PM Ryan Blue <rdb...@gmail.com> wrote:
>>>>
>>>>> Adding my own +1.
>>>>>
>>>>> On Thu, Apr 17, 2025 at 10:19 AM Daniel Weeks <dwe...@apache.org>
>>>>> wrote:
>>>>>
>>>>>> +1 (binding)
>>>>>>
>>>>>> I think this update really helps ensure row ids will be present and
>>>>>> reliable for upgraded tables.  Thanks Ryan!
>>>>>>
>>>>>> On Wed, Apr 16, 2025 at 4:09 PM Ryan Blue <rdb...@gmail.com> wrote:
>>>>>>
>>>>>>> Hi everyone,
>>>>>>>
>>>>>>> I’d like to start a vote to incorporate the spec changes in PR 12781
>>>>>>> <https://github.com/apache/iceberg/pull/12781>.
>>>>>>>
>>>>>>> There are two main changes. First, the current language says that
>>>>>>> upgrading a table to v3 leaves all row IDs null and they are assigned 
>>>>>>> when
>>>>>>> the rows are rewritten for the first time (either to move or modify the
>>>>>>> row). The problem with this is that row IDs are missing until the entire
>>>>>>> table is rewritten, which means that the feature is unreliable. 
>>>>>>> Instead, I
>>>>>>> propose that row IDs are assigned in the first write after upgrading to 
>>>>>>> v3.
>>>>>>>
>>>>>>> In addition to making row IDs more useful, the change to how we
>>>>>>> upgrade tables allows us to simplify the spec with statements like “any
>>>>>>> added or existing data file without first_row_id should be assigned
>>>>>>> one via inheritance” and “any manifest without a first_row_id must
>>>>>>> be assigned one when writing a manifest list”. I think this sets clearer
>>>>>>> expectations.
>>>>>>>
>>>>>>> Second, I found some issues with the strict way that first_row_id
>>>>>>> is inherited and assigned in the metadata tree. The current wording 
>>>>>>> would
>>>>>>> prevent writers from assigning row IDs to existing data files because
>>>>>>> assignment was strict and only accounted for added files. Instead, I
>>>>>>> propose changing the wording to “must be greater than or equal to” so 
>>>>>>> that
>>>>>>> there is some flexibility, and giving simple examples that are safe, 
>>>>>>> like first_row_id
>>>>>>> = last_assigned.first_row_id + last_assigned.added_rows_count +
>>>>>>> last_assigned.existing_rows_count.
>>>>>>>
>>>>>>> Please take a look at the PR and vote in the next 72 hours.
>>>>>>>
>>>>>>> [ ] +1 Add these changes to the spec for v3 row lineage
>>>>>>> [ ] +0
>>>>>>> [ ] -1 I have questions and/or concerns
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> Ryan
>>>>>>>
>>>>>>

Reply via email to