Re: [DISCUSS] V3 spec: add monotonic requirement to data DV

Steven Wu Wed, 03 Dec 2025 07:42:33 -0800

>  the last-updated-seq-number should reflect the time of restoration.

As Peter said, this is the main reason why row restoration shouldn't be
allowed.


Incremental processing requires the last-updated-sequence-number to reflect
the snapshot in which a row was inserted or updated. If a restored row's
last-updated-seq-number still carries the old value from when the row was
originally inserted, incremental processing can break.
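
To make the failure mode concrete, here is a minimal Python sketch of the
scenario discussed further down the thread (snapshots 10, 11, 12 and row X).
It is not Iceberg code: `DataFile`, `live_rows`, `incremental_rows`, and
`is_monotonic` are hypothetical names, the DV is modeled as a plain set of
deleted positions, and the consumer is assumed to filter purely on
last-updated-sequence-number.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataFile:
    """Hypothetical stand-in for a data file whose rows inherit lineage fields."""
    first_row_id: int  # base value that row-ids are inherited from
    data_seq: int      # sequence number of the snapshot that added the file
    num_rows: int


def live_rows(data_file, dv):
    """Yield (row_id, last_updated_seq) for every position not marked in the DV."""
    for pos in range(data_file.num_rows):
        if pos not in dv:
            yield (data_file.first_row_id + pos, data_file.data_seq)


def incremental_rows(data_file, dv, after_seq):
    """Rows a consumer sees when it filters on last-updated-sequence-number."""
    return [row for row in live_rows(data_file, dv) if row[1] > after_seq]


def is_monotonic(old_dv, new_dv):
    """Proposed restriction: a rewritten DV may add positions but never drop one."""
    return old_dv <= new_dv


# Snapshot 10: file added with first-row-id 100; row X sits at position 5.
f = DataFile(first_row_id=100, data_seq=10, num_rows=10)
dv_s11 = frozenset({5})  # Snapshot 11: row X deleted via DV
dv_s12 = frozenset()     # Snapshot 12: DV rewritten with the position unset

# Row X reappears with its original lineage values (row-id 105, seq 10).
reappeared = set(live_rows(f, dv_s12)) - set(live_rows(f, dv_s11))

# A consumer that already processed snapshot 11 asks for changes after seq 11.
picked_up = incremental_rows(f, dv_s12, after_seq=11)
```

In this model `reappeared` contains row X while `picked_up` is empty, so the
consumer never learns that the row came back. `is_monotonic(dv_s11, dv_s12)`
is false, which is the condition the proposed spec clarification would reject.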

> Table RESTOREs are a commonly used feature

I view table restore as a rollback operation in Iceberg. It can also bring
back deleted rows. But users should expect broken row lineage when a table
is reset to an old state.





On Wed, Dec 3, 2025 at 2:22 AM Péter Váry <[email protected]>
wrote:

> > 1. When a row is restored, it is desirable that the row ID is restored
> as well. So this sounds like a feature and not a bug to me.
>
> Is “restored” even a valid concept? Most systems only support adding and
> deleting rows; restoration is typically treated as inserting a new row,
> which should receive a new ID.
> Even if we accept the notion of restoration, I agree with Steven’s point:
> the last-updated-seq-number should reflect the time of restoration. Simply
> removing the delete flag from the original row would be an invalid
> operation.
>
>  > 2. Table RESTOREs are a commonly used feature, and it would be
> prohibitively expensive to rewrite data files to restore deleted rows
> back into the table.
>
> That's an interesting point, but I think in this specific case, most users
> would likely tolerate lineage corruption. For those who cannot, a full
> table rewrite remains an option.
>
> Anoop Johnson <[email protected]> wrote (on Wed, Dec 3, 2025,
> 8:26):
>
>> I recommend not adding this restriction to the spec for two reasons.
>>
>> 1. When a row is restored, it is desirable that the row ID is restored as
>> well. So this sounds like a feature and not a bug to me.
>> 2. Table RESTOREs are a commonly used feature, and it would be
>> prohibitively expensive to rewrite data files to restore deleted rows back
>> into the table.
>>
>> Best,
>> Anoop
>>
>> On Tue, Dec 2, 2025 at 4:56 PM Szehon Ho <[email protected]> wrote:
>>
>>> Szehon, I didn't quite understand this question. Can you elaborate a bit?
>>>
>>>
>>> Yea I was wondering in the scenario you are discussing above, a new file:
>>>
>>>>
>>>>    - whose persisted row-id value is lower than the snapshot's
>>>>    first-row-id
>>>>
>>>>
>>>>    - whose last-updated-seq-number is not set and is inherited from the
>>>>    snapshot sequence number
>>>>
>>>> I saw your interpretation though that it is not explicitly allowed.
>>>
>>> Overall, I was just wondering whether it's beneficial to disallow a quick
>>> un-delete in the scenario that you describe because of the difficulty of
>>> implementing row lineage and other things, since the scenario is not
>>> really a violation of the current row-lineage spec as initially stated,
>>> though it is definitely troublesome.
>>>
>>> Thanks,
>>> Szehon
>>>
>>> On Tue, Dec 2, 2025 at 2:52 PM Steven Wu <[email protected]> wrote:
>>>
>>>> Let's look at the following scenario
>>>>
>>>> * Snapshot 10 (first-row-id: 100)
>>>>   - A new data file was added and it contains row X. Row X inherits
>>>> row-id as 105 and last-updated-sequence-number as 10
>>>> * Snapshot 11 (first-row-id: 200)
>>>>   - Row X was deleted via DV
>>>> * Snapshot 12 (first-row-id: 300)
>>>>   - Row X was restored (added back) by rewriting the DV with the delete
>>>> position unset.
>>>>
>>>> When querying the table after snapshot 12, Row X would have row-id 105
>>>> and last-updated-sequence-number 10 (just as at the initial add in
>>>> snapshot 10). The correct last-updated-sequence-number should be 12, and
>>>> the row-id should be >= 300 for the added/restored row X.
>>>>
>>>> Hence, we are proposing that it is invalid to restore a row by
>>>> rewriting the DV or position delete file and unsetting the delete position.
>>>>
>>>> > But if a data file has all rows that have 'row-id' set and
>>>> 'last_updated_sequence_number' unset, technically this can be a valid
>>>> undelete, is it right?
>>>>
>>>> Szehon, I didn't quite understand this question. Can you elaborate a
>>>> bit?
>>>>
>>>>
>>>>
>>>>
>>>> On Tue, Dec 2, 2025 at 2:12 PM Szehon Ho <[email protected]>
>>>> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> Sorry, I re-read the thread and Peter's question more closely, and
>>>>> wanted to make sure that we are not precluding something unnecessarily,
>>>>> and to explore whether we can solve the code problem in other ways.
>>>>>
>>>>> The concern is that in the 'undeleted' row, the row_id and
>>>>> last_updated_seq_number are wrong.
>>>>>
>>>>>    - If 'row-id' is not set, it inherits a row-id that is changed,
>>>>>    which is wrong
>>>>>    - If 'last_updated_sequence_number' is set, then it is wrong
>>>>>    because it should refer to the snapshot that 'undeleted it'.
>>>>>
>>>>> Is that correct?
>>>>>
>>>>> But if a data file has all rows that have 'row-id' set and
>>>>> 'last_updated_sequence_number' unset, technically this can be a valid
>>>>> undelete, is it right?
>>>>>
>>>>> Thanks
>>>>> Szehon
>>>>>
>>>>> On Mon, Dec 1, 2025 at 11:08 AM Steven Wu <[email protected]>
>>>>> wrote:
>>>>>
>>>>>>
>>>>>> > _row_id a unique long identifier for every row within the table.
>>>>>> The value is assigned via inheritance when a row is first added to the
>>>>>> table.
>>>>>>
>>>>>> Actually, the current spec doesn't allow explicitly assigning row-id
>>>>>> values for new rows.
>>>>>>
>>>>>> So currently we don't need to worry about whether it is allowed to
>>>>>> have *new* rows with explicitly assigned row-id values lower than the
>>>>>> snapshot's first-row-id.
>>>>>>
>>>>>> On Mon, Dec 1, 2025 at 9:50 AM Steven Wu <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> Here is the spec PR to clarify undelete is not allowed. Will start a
>>>>>>> vote thread for that.
>>>>>>> https://github.com/apache/iceberg/pull/14731
>>>>>>>
>>>>>>> Let me start a new discussion thread for the first-row-id and row-id
>>>>>>> question for row lineage to get more attention and input.
>>>>>>>
>>>>>>> On Sat, Nov 22, 2025 at 7:02 AM Péter Váry <
>>>>>>> [email protected]> wrote:
>>>>>>>
>>>>>>>> Apologies if I was unclear. As Steven also mentioned, I wanted to
>>>>>>>> confirm whether we agree on the clarification regarding the `row-id`
>>>>>>>> and `first-row-id`.
>>>>>>>>
>>>>>>>> Steven Wu <[email protected]> wrote (on Sat, Nov 22, 2025,
>>>>>>>> 15:28):
>>>>>>>>
>>>>>>>>> Just to clarify, I was asking a question.
>>>>>>>>>
>>>>>>>>> Is it valid to add a new data file with a row?
>>>>>>>>>
>>>>>>>>>    - whose persisted row-id value is lower than the snapshot's
>>>>>>>>>    first-row-id
>>>>>>>>>    - whose last-updated-seq-number is not set and is inherited from
>>>>>>>>>    the snapshot sequence number
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Steven
>>>>>>>>>
>>>>>>>>> On Fri, Nov 21, 2025 at 11:25 PM Péter Váry <
>>>>>>>>> [email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> +1 for this proposal
>>>>>>>>>>
>>>>>>>>>> Slightly related, but we can move this to a separate thread if it
>>>>>>>>>> needs independent discussion: We should clarify the relationship
>>>>>>>>>> between `row-id` and `first-row-id`. This has come up several times
>>>>>>>>>> in our discussions about the equality delete removal proposal,
>>>>>>>>>> where we considered generating `row-ids` manually instead of
>>>>>>>>>> relying on the auto-assignment feature.
>>>>>>>>>>
>>>>>>>>>> As discussed with Steven:
>>>>>>>>>>
>>>>>>>>>>> It is valid to add a new data file with a row:
>>>>>>>>>>>
>>>>>>>>>>>    - whose persisted row-id value is lower than the snapshot's
>>>>>>>>>>>    first-row-id
>>>>>>>>>>>    - whose last-updated-seq-number is not set and inherit from
>>>>>>>>>>>    the snapshot sequence number
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>> Prashant Singh <[email protected]> wrote (on Sat, Nov 22, 2025,
>>>>>>>>>> 5:29):
>>>>>>>>>>
>>>>>>>>>>> +1 for making it explicit that an *undelete* of a row can't be
>>>>>>>>>>> done by unsetting the corresponding bit in the DV.
>>>>>>>>>>>
>>>>>>>>>>> *Rows should only be added via new data files* sounds
>>>>>>>>>>> reasonable to me!
>>>>>>>>>>>
>>>>>>>>>>> Apart from row lineage, it also complicates operation type
>>>>>>>>>>> inference, like here [1], as we would now need to inspect the
>>>>>>>>>>> contents of these DVs to see if it's an insert?
>>>>>>>>>>>
>>>>>>>>>>> [1]
>>>>>>>>>>> https://github.com/apache/iceberg/pull/14581#discussion_r2533057189
>>>>>>>>>>>
>>>>>>>>>>> On Sat, Nov 22, 2025 at 4:48 AM Szehon Ho <
>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> It makes sense to me; it sounds like a minor clarification.
>>>>>>>>>>>> For v2 position deletes, code like rewrite_position_deletes may
>>>>>>>>>>>> have made some assumptions like this and would not work well if
>>>>>>>>>>>> they were violated, and maybe other code as well.
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks
>>>>>>>>>>>> Szehon
>>>>>>>>>>>>
>>>>>>>>>>>> On Fri, Nov 21, 2025 at 3:03 PM Steven Wu <[email protected]>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Similar weird behavior can also happen for V2 position delete
>>>>>>>>>>>>> files with `undelete`.
>>>>>>>>>>>>>
>>>>>>>>>>>>> In V2, there could be multiple position delete files (say pd1,
>>>>>>>>>>>>> pd2) associated with the same data file (say f1). Let's say pd1
>>>>>>>>>>>>> deletes rows 5 and 10 and pd2 deletes row 15.
>>>>>>>>>>>>> 1. a new snapshot is committed with pd1 (DELETED), pd2
>>>>>>>>>>>>> (EXISTING), and pd3 (ADDED). pd3 deletes only row 10 (undeleting
>>>>>>>>>>>>> row 5)
>>>>>>>>>>>>> 2. a new snapshot is committed with pd1 (DELETED) and pd2
>>>>>>>>>>>>> (EXISTING)
>>>>>>>>>>>>>
>>>>>>>>>>>>> In either case, essentially some rows are added (back) to the
>>>>>>>>>>>>> table with a lower sequence number than the new snapshot's
>>>>>>>>>>>>> sequence number.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Just to recap the question: should the spec (v2 and v3) spell
>>>>>>>>>>>>> out that `undelete row` is not allowed? Rows should only be added
>>>>>>>>>>>>> via new data files.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Fri, Nov 21, 2025 at 1:09 PM Steven Wu <
>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> >Are we specifically stating somewhere that all row-ids
>>>>>>>>>>>>>> should be higher than or equal to the snapshot's `first-row-id`?
>>>>>>>>>>>>>> In my mental model the `first-row-id` is only applicable for
>>>>>>>>>>>>>> rows that don't have a specific row-id assigned.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I meant an ADDED row should have a `row-id` higher than or
>>>>>>>>>>>>>> equal to the snapshot's `first-row-id`. An EXISTING or UPDATED
>>>>>>>>>>>>>> row can have a lower row id.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Fri, Nov 21, 2025 at 1:04 PM Steven Wu <
>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> > Can we create a validator to prevent this from happening?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> We don't have this problem with the Java implementation.
>>>>>>>>>>>>>>> `BaseDVFileWriter` merges the previous DV with the new delta
>>>>>>>>>>>>>>> DV. So there is no `undelete` behavior. I am not aware of any
>>>>>>>>>>>>>>> Java API to allow "undelete". So we probably don't need to add
>>>>>>>>>>>>>>> any validation code in the Java impl.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Just thought it is good to spell it out in the spec so that
>>>>>>>>>>>>>>> clients/engines can be clear about the expected behavior.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Fri, Nov 21, 2025 at 12:18 PM Péter Váry <
>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Are we specifically stating somewhere that all row-ids
>>>>>>>>>>>>>>>> should be higher than or equal to the snapshot's
>>>>>>>>>>>>>>>> `first-row-id`?
>>>>>>>>>>>>>>>> In my mental model the `first-row-id` is only applicable
>>>>>>>>>>>>>>>> for rows that don't have a specific row-id assigned.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Nonetheless, I agree that the `row-id` and the
>>>>>>>>>>>>>>>> `last-updated-seq-num` should have changed to new values, so
>>>>>>>>>>>>>>>> we can say that undeleting a row is not allowed because of
>>>>>>>>>>>>>>>> this.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Can we create a validator to prevent this from happening?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Steven Wu <[email protected]> wrote (on Fri, Nov 21, 2025,
>>>>>>>>>>>>>>>> 21:11):
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> The undeleted row would have an invalid `row-id` and
>>>>>>>>>>>>>>>>> `last-updated-seq-num`. Since it is a new row (added back),
>>>>>>>>>>>>>>>>> it should have a `row-id` higher than or equal to the
>>>>>>>>>>>>>>>>> snapshot's `first-row-id`, and the `last-updated-seq-number`
>>>>>>>>>>>>>>>>> should inherit/have the new snapshot's sequence number.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Fri, Nov 21, 2025 at 11:48 AM Steven Wu <
>>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Should we clarify the V3 spec to explicitly forbid
>>>>>>>>>>>>>>>>>> "*undelete*" of a row by unsetting the DV bit? Unsetting
>>>>>>>>>>>>>>>>>> a DV bit essentially adds a row with a lower row-id than the
>>>>>>>>>>>>>>>>>> snapshot's first-row-id, which would violate the row lineage
>>>>>>>>>>>>>>>>>> spec. With the restriction, DV cardinality should be
>>>>>>>>>>>>>>>>>> monotonically increasing.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>> Steven
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
