+1 (binding) Seems cleaner to me. Thanks Szehon
On Thu, Apr 17, 2025 at 10:31 AM Russell Spitzer <russell.spit...@gmail.com> wrote: > +1 > > On Thu, Apr 17, 2025 at 12:30 PM Ryan Blue <rdb...@gmail.com> wrote: > >> Adding my own +1. >> >> On Thu, Apr 17, 2025 at 10:19 AM Daniel Weeks <dwe...@apache.org> wrote: >> >>> +1 (binding) >>> >>> I think this update really helps ensure row ids will be present and >>> reliable for upgraded tables. Thanks Ryan! >>> >>> On Wed, Apr 16, 2025 at 4:09 PM Ryan Blue <rdb...@gmail.com> wrote: >>> >>>> Hi everyone, >>>> >>>> I’d like to start a vote to incorporate the spec changes in PR 12781 >>>> <https://github.com/apache/iceberg/pull/12781>. >>>> >>>> There are two main changes. First, the current language says that >>>> upgrading a table to v3 leaves all row IDs null and they are assigned when >>>> the rows are rewritten for the first time (either to move or modify the >>>> row). The problem with this is that row IDs are missing until the entire >>>> table is rewritten, which means that the feature is unreliable. Instead, I >>>> propose that row IDs are assigned in the first write after upgrading to v3. >>>> >>>> In addition to making row IDs more useful, the change to how we upgrade >>>> tables allows us to simplify the spec with statements like “any added or >>>> existing data file without first_row_id should be assigned one via >>>> inheritance” and “any manifest without a first_row_id must be assigned >>>> one when writing a manifest list”. I think this sets clearer expectations. >>>> >>>> Second, I found some issues with the strict way that first_row_id is >>>> inherited and assigned in the metadata tree. The current wording would >>>> prevent writers from assigning row IDs to existing data files because >>>> assignment was strict and only accounted for added files. Instead, I >>>> propose changing the wording to “must be greater than or equal to” so that >>>> there is some flexibility, and giving simple examples that are safe, like >>>> first_row_id >>>> = last_assigned.first_row_id + last_assigned.added_rows_count + >>>> last_assigned.existing_rows_count. >>>> >>>> Please take a look at the PR and vote in the next 72 hours. >>>> >>>> [ ] +1 Add these changes to the spec for v3 row lineage >>>> [ ] +0 >>>> [ ] -1 I have questions and/or concerns >>>> >>>> Thanks, >>>> >>>> Ryan >>>> >>>