> the last-updated-seq-number should reflect the time of restoration.

Like Peter said, this is the main reason why row restoration shouldn't be allowed.
Incremental processing requires the last-updated-sequence-number to reflect the snapshot in which a row was inserted or updated. If a restored row's last-updated-seq-number still carries the old value from when the row was originally inserted, incremental processing can break.

> Table RESTOREs are a commonly used feature

I view table restore as a rollback operation in Iceberg. It can also bring back deleted rows. But users should expect broken row lineage when a table is reset to an old state.

On Wed, Dec 3, 2025 at 2:22 AM Péter Váry <[email protected]> wrote:

> > 1. When a row is restored, it is desirable that the row ID is restored as well. So this sounds like a feature and not a bug to me.
>
> Is “restored” even a valid concept? Most systems only support adding and deleting rows; restoration is typically treated as inserting a new row, which should receive a new ID.
> Even if we accept the notion of restoration, I agree with Steven’s point: the last-updated-seq-number should reflect the time of restoration. Simply removing the delete flag from the original row would be an invalid operation.
>
> > 2. Table RESTOREs are a commonly used feature, and it would be prohibitively expensive to rewrite data files to restore deleted rows back into the table.
>
> That's an interesting point, but I think in this specific case, most users would likely tolerate lineage corruption. For those who cannot, a full table rewrite remains an option.
>
> Anoop Johnson <[email protected]> wrote (on Wed, Dec 3, 2025, 8:26):
>
>> I recommend not adding this restriction to the spec for two reasons.
>>
>> 1. When a row is restored, it is desirable that the row ID is restored as well. So this sounds like a feature and not a bug to me.
>> 2. Table RESTOREs are a commonly used feature, and it would be prohibitively expensive to rewrite data files to restore deleted rows back into the table.
>>
>> Best,
>> Anoop
>>
>> On Tue, Dec 2, 2025 at 4:56 PM Szehon Ho <[email protected]> wrote:
>>
>>> Szehon, I didn't quite understand this question. Can you elaborate a bit?
>>>
>>> Yea I was wondering in the scenario you are discussing above, a new file:
>>>
>>>> - whose persisted row-id value is lower than the snapshot's first-row-id
>>>> - whose last-updated-seq-number is not set and inherits from the snapshot sequence number
>>>
>>> I saw your interpretation though that it is not explicitly allowed.
>>>
>>> Overall, I was just trying to reason about whether it's beneficial to disallow a quick un-delete in the scenario that you describe, due to the difficulty of implementing the row-lineage and other things, as the scenario is not really a violation of the current row-lineage spec as initially stated, but definitely troublesome.
>>>
>>> Thanks,
>>> Szehon
>>>
>>> On Tue, Dec 2, 2025 at 2:52 PM Steven Wu <[email protected]> wrote:
>>>
>>>> Let's look at the following scenario
>>>>
>>>> * Snapshot 10 (first-row-id: 100)
>>>>   - A new data file was added and it contains row X. Row X inherits row-id as 105 and last-updated-sequence-number as 10
>>>> * Snapshot 11 (first-row-id: 200)
>>>>   - Row X was deleted via DV
>>>> * Snapshot 12 (first-row-id: 300)
>>>>   - Row X was restored (added back) by rewriting DV and with the delete position unset.
>>>>
>>>> When querying the table after snapshot 12, the Row X would have the row-id as 105 and last-updated-sequence-number as 10 (just as the initial add at snapshot 10). The correct last-updated-sequence-number should be 12 and row-id should be >=300 for added/restored row X.
>>>>
>>>> Hence, we are proposing that it is invalid to restore a row by rewriting the DV or position delete file and unsetting the delete position.
>>>>
>>>> > But if a data file has all rows that have 'row-id' set and 'last_updated_sequence_number' unset, technically this can be a valid undelete, is it right?
>>>>
>>>> Szehon, I didn't quite understand this question. Can you elaborate a bit?
>>>>
>>>> On Tue, Dec 2, 2025 at 2:12 PM Szehon Ho <[email protected]> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> Sorry, I re-read the thread and Peter's question more closely, and wanted to explore that we are not precluding something unnecessarily, and if we can solve the code problem in other ways.
>>>>>
>>>>> The concern is that in the 'undeleted' row, the row_id and last_updated_seq_number are wrong.
>>>>>
>>>>> - If 'row-id' is not set, it inherits a row-id that is changed, which is wrong
>>>>> - If 'last_updated_sequence_number' is set, then it is wrong because it should refer to the snapshot that 'undeleted it'.
>>>>>
>>>>> Is that correct?
>>>>>
>>>>> But if a data file has all rows that have 'row-id' set and 'last_updated_sequence_number' unset, technically this can be a valid undelete, is it right?
>>>>>
>>>>> Thanks
>>>>> Szehon
>>>>>
>>>>> On Mon, Dec 1, 2025 at 11:08 AM Steven Wu <[email protected]> wrote:
>>>>>
>>>>>> > _row_id a unique long identifier for every row within the table. The value is assigned via inheritance when a row is first added to the table.
>>>>>>
>>>>>> Actually, current spec doesn't allow explicitly assigning row-id for new rows.
>>>>>>
>>>>>> So currently we don't need to worry about the question if it is allowed to have *new* rows with explicitly assigned row-id values lower than the snapshot's first-row-id.
>>>>>>
>>>>>> On Mon, Dec 1, 2025 at 9:50 AM Steven Wu <[email protected]> wrote:
>>>>>>
>>>>>>> Here is the spec PR to clarify undelete is not allowed. Will start a vote thread for that.
>>>>>>> https://github.com/apache/iceberg/pull/14731
>>>>>>>
>>>>>>> Let me start a new discussion thread for the first-row-id and row-id question for row lineage to get more attention and input.
>>>>>>>
>>>>>>> On Sat, Nov 22, 2025 at 7:02 AM Péter Váry <[email protected]> wrote:
>>>>>>>
>>>>>>>> Apologies if I was unclear. As Steven also mentioned, I wanted to confirm whether we agree on the clarification regarding the `row-id` and `first-row-id`.
>>>>>>>>
>>>>>>>> Steven Wu <[email protected]> wrote (on Sat, Nov 22, 2025, 15:28):
>>>>>>>>
>>>>>>>>> Just to clarify, I was asking a question.
>>>>>>>>>
>>>>>>>>> Is it valid to add a new data file with a row?
>>>>>>>>>
>>>>>>>>> - whose persisted row-id value is lower than the snapshot's first-row-id
>>>>>>>>> - whose last-updated-seq-number is not set and inherits from the snapshot sequence number
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Steven
>>>>>>>>>
>>>>>>>>> On Fri, Nov 21, 2025 at 11:25 PM Péter Váry <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> +1 for this proposal
>>>>>>>>>>
>>>>>>>>>> Slightly related, but we can move this to a separate thread if it needs independent discussion: We should clarify the relationship between `row-id` and `first-row-id`. This has come up several times in our discussions about the equality delete removal proposal, where we considered generating `row-ids` manually instead of relying on the auto-assignment feature.
>>>>>>>>>>
>>>>>>>>>> As discussed with Steven:
>>>>>>>>>>
>>>>>>>>>>> It is valid to add a new data file with a row:
>>>>>>>>>>>
>>>>>>>>>>> - whose persisted row-id value is lower than the snapshot's first-row-id
>>>>>>>>>>> - whose last-updated-seq-number is not set and inherits from the snapshot sequence number
>>>>>>>>>>>
>>>>>>>>>> Prashant Singh <[email protected]> wrote (on Sat, Nov 22, 2025, 5:29):
>>>>>>>>>>
>>>>>>>>>>> +1 for making it explicit that an *undelete* of a row can't be done by unsetting the corresponding bit in DV
>>>>>>>>>>>
>>>>>>>>>>> *Rows should only be added via new data files*, sounds reasonable to me!
>>>>>>>>>>>
>>>>>>>>>>> Apart from row-lineage it also complicates the operation type inference like here [1] as we would now inspect the contents of these DV to see if it's an insert?
>>>>>>>>>>>
>>>>>>>>>>> [1] https://github.com/apache/iceberg/pull/14581#discussion_r2533057189
>>>>>>>>>>>
>>>>>>>>>>> On Sat, Nov 22, 2025 at 4:48 AM Szehon Ho <[email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> It makes sense to me, it sounds like a minor clarification. For v2 position deletes, code like rewrite_position_deletes may have made some assumptions like this and would not work well if violated, maybe other code as well.
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks
>>>>>>>>>>>> Szehon
>>>>>>>>>>>>
>>>>>>>>>>>> On Fri, Nov 21, 2025 at 3:03 PM Steven Wu <[email protected]> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Similar weird behavior can also happen for V2 position delete files with `undelete`.
>>>>>>>>>>>>>
>>>>>>>>>>>>> In V2, there could be multiple position delete files (say pd1, pd2) associated with the same data file (say f1). Let's say pd1 deletes row 5 and 10 and pd2 deletes row 15.
>>>>>>>>>>>>> 1. a new snapshot is committed with pd1 (DELETED), pd2 (EXISTING), and pd3 (ADDED). pd3 deletes only row 10 (undeleted row 5)
>>>>>>>>>>>>> 2. a new snapshot is committed with pd1 (DELETED) and pd2 (EXISTING)
>>>>>>>>>>>>>
>>>>>>>>>>>>> In either case, essentially some rows are added (back) to the table with lower sequence number than the new snapshot's sequence number.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Just to recap the question: should the spec (v2 and v3) spell out that `undelete row` is not allowed? Rows should only be added via new data files.
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Fri, Nov 21, 2025 at 1:09 PM Steven Wu <[email protected]> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> > Are we specifically stating somewhere that all row-ids should be higher than or equal to the snapshot's `first-row-id`? In my mental model the `first-row-id` is only applicable for rows that don't have a specific row-id assigned.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I meant an ADDED row should have `row-id` higher than or equal to the snapshot's `first-row-id`. EXISTING or UPDATED row can have lower row id.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Fri, Nov 21, 2025 at 1:04 PM Steven Wu <[email protected]> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> > Can we create a validator to prevent this from happening?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> We don't have this problem with the Java implementation. `BaseDVFileWriter` merges the previous DV with the new delta DV. So there is no `undelete` behavior. I am not aware of any Java API to allow "undelete".
>>>>>>>>>>>>>>> So we probably don't need to add any validation code in the Java impl.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Just thought it is good to spell it out in the spec so that clients/engines can be clear about the expected behavior.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Fri, Nov 21, 2025 at 12:18 PM Péter Váry <[email protected]> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Are we specifically stating somewhere that all row-ids should be higher than or equal to the snapshot's `first-row-id`? In my mental model the `first-row-id` is only applicable for rows that don't have a specific row-id assigned.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Nonetheless, I agree that the `row-id` and the `last-updated-seq-num` should have changed to a new one, so we can say that undeleting a row is not allowed because of this.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Can we create a validator to prevent this from happening?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Steven Wu <[email protected]> wrote (on Fri, Nov 21, 2025, 21:11):
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> The undeleted row would have invalid `row-id` and `last-updated-seq-num`. Since it is a new row (added back), it should have the `row-id` higher than or equal to the snapshot's `first-row-id` and the `last-updated-seq-number` should inherit/have the new snapshot's sequence number.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Fri, Nov 21, 2025 at 11:48 AM Steven Wu <[email protected]> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Should we clarify the V3 spec to explicitly forbid "*undelete*" of a row by unsetting the DV bit?
>>>>>>>>>>>>>>>>>> Unsetting a DV bit essentially adds a row with lower row-id than the snapshot's first-row-id, which would violate the row lineage spec. With the restriction, DV cardinality should be monotonically increasing.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>> Steven
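
For anyone skimming the thread, the "no undelete" rule and the validator Péter asked about can be sketched roughly as below. This is only an illustration under loud assumptions: a DV (or the union of a data file's position deletes) is modeled as a plain Python set of row positions, and `undeleted_positions` and `validate_no_undelete` are hypothetical names, not part of any Iceberg API.

```python
from typing import Iterable, Set


def undeleted_positions(previous_dv: Iterable[int], new_dv: Iterable[int]) -> Set[int]:
    """Return positions marked deleted in previous_dv but no longer in new_dv."""
    return set(previous_dv) - set(new_dv)


def validate_no_undelete(previous_dv: Iterable[int], new_dv: Iterable[int]) -> None:
    """Raise ValueError if the new delete set would bring a deleted row back.

    Hypothetical check: the new set must be a superset of the previous one,
    so its cardinality can only stay the same or grow.
    """
    restored = undeleted_positions(previous_dv, new_dv)
    if restored:
        raise ValueError(
            f"undelete not allowed; positions restored: {sorted(restored)}"
        )


# Merging the previous DV with a new delta only ever adds positions, so it passes.
previous = {5, 10}
merged = previous | {15}
validate_no_undelete(previous, merged)

# The V2 example from the thread: pd1 deleted rows 5 and 10, pd3 deletes only
# row 10, so row 5 would silently come back with its old lineage fields.
print(undeleted_positions({5, 10}, {10}))
```

A superset check like this only covers the delete side; it says nothing about how row-id or last-updated-sequence-number are assigned for rows added through new data files.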
