+1 (non-binding)

On Fri, Apr 18, 2025 at 7:38 AM huaxin gao <huaxin.ga...@gmail.com> wrote:

> +1 (non-binding)
>
> On Thu, Apr 17, 2025 at 4:22 PM Denny Lee <denny.g....@gmail.com> wrote:
>
>> +1 (non-binding)
>>
>> On Thu, Apr 17, 2025 at 5:14 PM Aihua Xu <aihu...@gmail.com> wrote:
>>
>>> + (non-binding).
>>>
>>> On Thu, Apr 17, 2025 at 11:22 AM Steven Wu <stevenz...@gmail.com> wrote:
>>>
>>>> +1 (binding)
>>>>
>>>> On Thu, Apr 17, 2025 at 11:09 AM Amogh Jahagirdar <2am...@gmail.com>
>>>> wrote:
>>>>
>>>>> +1 (binding)
>>>>>
>>>>> On Thu, Apr 17, 2025 at 11:54 AM Szehon Ho <szehon.apa...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> +1 (binding)  Seems cleaner to me.
>>>>>>
>>>>>> Thanks
>>>>>> Szehon
>>>>>>
>>>>>> On Thu, Apr 17, 2025 at 10:31 AM Russell Spitzer <
>>>>>> russell.spit...@gmail.com> wrote:
>>>>>>
>>>>>>> +1
>>>>>>>
>>>>>>> On Thu, Apr 17, 2025 at 12:30 PM Ryan Blue <rdb...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Adding my own +1.
>>>>>>>>
>>>>>>>> On Thu, Apr 17, 2025 at 10:19 AM Daniel Weeks <dwe...@apache.org>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> +1 (binding)
>>>>>>>>>
>>>>>>>>> I think this update really helps ensure row ids will be present
>>>>>>>>> and reliable for upgraded tables.  Thanks Ryan!
>>>>>>>>>
>>>>>>>>> On Wed, Apr 16, 2025 at 4:09 PM Ryan Blue <rdb...@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Hi everyone,
>>>>>>>>>>
>>>>>>>>>> I’d like to start a vote to incorporate the spec changes in PR
>>>>>>>>>> 12781 <https://github.com/apache/iceberg/pull/12781>.
>>>>>>>>>>
>>>>>>>>>> There are two main changes. First, the current language says that
>>>>>>>>>> upgrading a table to v3 leaves all row IDs null and they are assigned
>>>>>>>>>> when the rows are rewritten for the first time (either to move or
>>>>>>>>>> modify the row). The problem with this is that row IDs are missing
>>>>>>>>>> until the entire table is rewritten, which means that the feature is
>>>>>>>>>> unreliable. Instead, I propose that row IDs are assigned in the first
>>>>>>>>>> write after upgrading to v3.
>>>>>>>>>>
>>>>>>>>>> In addition to making row IDs more useful, the change to how we
>>>>>>>>>> upgrade tables allows us to simplify the spec with statements like
>>>>>>>>>> “any added or existing data file without first_row_id should be
>>>>>>>>>> assigned one via inheritance” and “any manifest without a
>>>>>>>>>> first_row_id must be assigned one when writing a manifest list”.
>>>>>>>>>> I think this sets clearer expectations.
>>>>>>>>>>
>>>>>>>>>> Second, I found some issues with the strict way that first_row_id
>>>>>>>>>> is inherited and assigned in the metadata tree. The current wording
>>>>>>>>>> would prevent writers from assigning row IDs to existing data files
>>>>>>>>>> because assignment was strict and only accounted for added files.
>>>>>>>>>> Instead, I propose changing the wording to “must be greater than or
>>>>>>>>>> equal to” so that there is some flexibility, and giving simple
>>>>>>>>>> examples that are safe, like
>>>>>>>>>> first_row_id = last_assigned.first_row_id +
>>>>>>>>>> last_assigned.added_rows_count + last_assigned.existing_rows_count.
>>>>>>>>>>
>>>>>>>>>> Please take a look at the PR and vote in the next 72 hours.
>>>>>>>>>>
>>>>>>>>>> [ ] +1 Add these changes to the spec for v3 row lineage
>>>>>>>>>> [ ] +0
>>>>>>>>>> [ ] -1 I have questions and/or concerns
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>>
>>>>>>>>>> Ryan
>>>>>>>>>>
>>>>>>>>>
