+1 (non-binding) On Fri, Apr 18, 2025 at 7:38 AM huaxin gao <huaxin.ga...@gmail.com> wrote:
> +1 (non-binding) > > On Thu, Apr 17, 2025 at 4:22 PM Denny Lee <denny.g....@gmail.com> wrote: > >> +1 (non-binding) >> >> On Thu, Apr 17, 2025 at 5:14 PM Aihua Xu <aihu...@gmail.com> wrote: >> >>> + (non-binding). >>> >>> On Thu, Apr 17, 2025 at 11:22 AM Steven Wu <stevenz...@gmail.com> wrote: >>> >>>> +1 (binding) >>>> >>>> On Thu, Apr 17, 2025 at 11:09 AM Amogh Jahagirdar <2am...@gmail.com> >>>> wrote: >>>> >>>>> +1 (binding) >>>>> >>>>> On Thu, Apr 17, 2025 at 11:54 AM Szehon Ho <szehon.apa...@gmail.com> >>>>> wrote: >>>>> >>>>>> +1 (binding) Seems cleaner to me. >>>>>> >>>>>> Thanks >>>>>> Szehon >>>>>> >>>>>> On Thu, Apr 17, 2025 at 10:31 AM Russell Spitzer < >>>>>> russell.spit...@gmail.com> wrote: >>>>>> >>>>>>> +1 >>>>>>> >>>>>>> On Thu, Apr 17, 2025 at 12:30 PM Ryan Blue <rdb...@gmail.com> wrote: >>>>>>> >>>>>>>> Adding my own +1. >>>>>>>> >>>>>>>> On Thu, Apr 17, 2025 at 10:19 AM Daniel Weeks <dwe...@apache.org> >>>>>>>> wrote: >>>>>>>> >>>>>>>>> +1 (binding) >>>>>>>>> >>>>>>>>> I think this update really helps ensure row ids will be present >>>>>>>>> and reliable for upgraded tables. Thanks Ryan! >>>>>>>>> >>>>>>>>> On Wed, Apr 16, 2025 at 4:09 PM Ryan Blue <rdb...@gmail.com> >>>>>>>>> wrote: >>>>>>>>> >>>>>>>>>> Hi everyone, >>>>>>>>>> >>>>>>>>>> I’d like to start a vote to incorporate the spec changes in PR >>>>>>>>>> 12781 <https://github.com/apache/iceberg/pull/12781>. >>>>>>>>>> >>>>>>>>>> There are two main changes. First, the current language says that >>>>>>>>>> upgrading a table to v3 leaves all row IDs null and they are >>>>>>>>>> assigned when >>>>>>>>>> the rows are rewritten for the first time (either to move or modify >>>>>>>>>> the >>>>>>>>>> row). The problem with this is that row IDs are missing until the >>>>>>>>>> entire >>>>>>>>>> table is rewritten, which means that the feature is unreliable. >>>>>>>>>> Instead, I >>>>>>>>>> propose that row IDs are assigned in the first write after upgrading >>>>>>>>>> to v3. >>>>>>>>>> >>>>>>>>>> In addition to making row IDs more useful, the change to how we >>>>>>>>>> upgrade tables allows us to simplify the spec with statements like >>>>>>>>>> “any >>>>>>>>>> added or existing data file without first_row_id should be >>>>>>>>>> assigned one via inheritance” and “any manifest without a >>>>>>>>>> first_row_id must be assigned one when writing a manifest list”. >>>>>>>>>> I think this sets clearer expectations. >>>>>>>>>> >>>>>>>>>> Second, I found some issues with the strict way that first_row_id >>>>>>>>>> is inherited and assigned in the metadata tree. The current wording >>>>>>>>>> would >>>>>>>>>> prevent writers from assigning row IDs to existing data files because >>>>>>>>>> assignment was strict and only accounted for added files. Instead, I >>>>>>>>>> propose changing the wording to “must be greater than or equal to” >>>>>>>>>> so that >>>>>>>>>> there is some flexibility, and giving simple examples that are safe, >>>>>>>>>> like first_row_id >>>>>>>>>> = last_assigned.first_row_id + last_assigned.added_rows_count + >>>>>>>>>> last_assigned.existing_rows_count. >>>>>>>>>> >>>>>>>>>> Please take a look at the PR and vote in the next 72 hours. >>>>>>>>>> >>>>>>>>>> [ ] +1 Add these changes to the spec for v3 row lineage >>>>>>>>>> [ ] +0 >>>>>>>>>> [ ] -1 I have questions and/or concerns >>>>>>>>>> >>>>>>>>>> Thanks, >>>>>>>>>> >>>>>>>>>> Ryan >>>>>>>>>> >>>>>>>>>