Re: [DISCUSS] V3 spec: add monotonic requirement to data DV

Steven Wu Tue, 02 Dec 2025 14:53:25 -0800

Let's look at the following scenario:

* Snapshot 10 (first-row-id: 100)
  - A new data file was added and it contains row X. Row X inherits row-id
as 105 and last-updated-sequence-number as 10
* Snapshot 11 (first-row-id: 200)
  - Row X was deleted via DV
* Snapshot 12 (first-row-id: 300)
  - Row X was restored (added back) by rewriting the DV with its delete
position unset.


When querying the table after snapshot 12, row X would have a row-id of
105 and a last-updated-sequence-number of 10 (the same values as at the
initial add in snapshot 10). For an added/restored row X, the correct
last-updated-sequence-number should be 12 and the row-id should be >= 300.
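
A minimal Python sketch of this scenario follows. It assumes the V3 inheritance rule that a row with a null row-id gets its data file's first_row_id plus its position, and that a null last-updated-sequence-number inherits the file's data sequence number. `DataFile` and `lineage` are hypothetical names for illustration, not Iceberg APIs, and the DV is modeled as a plain set of positions.

```python
from dataclasses import dataclass, field
from typing import Optional, Set, Tuple


@dataclass
class DataFile:
    # Hypothetical, simplified model of a V3 data file's row-lineage metadata.
    first_row_id: int       # derived from the snapshot's first-row-id when the file is added
    sequence_number: int    # data sequence number of the snapshot that added the file
    deleted: Set[int] = field(default_factory=set)  # positions marked in the file's DV


def lineage(f: DataFile, pos: int) -> Optional[Tuple[int, int]]:
    """Inherited (row_id, last_updated_seq) for a row whose persisted values are null."""
    if pos in f.deleted:
        return None  # the DV hides the row
    return f.first_row_id + pos, f.sequence_number


X = 5  # row X sits at position 5 of the file

# Snapshot 10 (first-row-id 100): the data file is added.
f1 = DataFile(first_row_id=100, sequence_number=10)
print("snapshot 10:", lineage(f1, X))  # -> (105, 10)

# Snapshot 11: a DV marks row X as deleted.
f1.deleted = {X}
print("snapshot 11:", lineage(f1, X))  # -> None

# Snapshot 12 (first-row-id 300): the DV is rewritten with X's bit unset.
f1.deleted = set()
print("snapshot 12:", lineage(f1, X))  # -> (105, 10), not row-id >= 300 and seq 12
```

Because the data file itself is untouched in snapshot 12, the inherited values can only be the stale ones from snapshot 10.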

Hence, we are proposing that it is invalid to restore a row by rewriting
the DV or position delete file and unsetting the delete position.
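
One way an engine could enforce this rule is to require every rewritten DV to be a superset of the DV it replaces. The sketch below is hedged: DVs are modeled as Python sets of positions rather than the Roaring bitmaps real deletion vectors use, and `merge_dv` and `validate_dv_rewrite` are hypothetical helpers, not Iceberg APIs.

```python
from typing import Set


def merge_dv(previous: Set[int], delta: Set[int]) -> Set[int]:
    """Merge newly deleted positions into the previous DV; the result can only grow."""
    return previous | delta


def validate_dv_rewrite(previous: Set[int], rewritten: Set[int]) -> None:
    """Reject a DV rewrite that unsets any previously deleted position ("undelete")."""
    undeleted = previous - rewritten
    if undeleted:
        raise ValueError(f"DV rewrite undeletes positions {sorted(undeleted)}")


old = {5, 10}

# Allowed: merging a delta only adds positions.
new = merge_dv(old, {15})
validate_dv_rewrite(old, new)  # passes; new == {5, 10, 15}

# Rejected: same cardinality as `old`, but position 5 was unset.
try:
    validate_dv_rewrite(old, {10, 15})
except ValueError as e:
    print(e)  # -> DV rewrite undeletes positions [5]
```

The superset check is stricter than comparing cardinalities alone: the second rewrite keeps two positions, so its cardinality does not drop, yet it still undeletes row 5.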

> But if a data file has all rows that have 'row-id' set and
'last_updated_sequence_number' unset, technically this can be a valid
undelete, is it right?

Szehon, I didn't quite understand this question. Can you elaborate a bit?




On Tue, Dec 2, 2025 at 2:12 PM Szehon Ho <[email protected]> wrote:

> Hi,
>
> Sorry, I re-read the thread and Peter's question more closely, and wanted
> to make sure we are not precluding something unnecessarily, and to see
> whether we can solve the code problem in other ways.
>
> The concern is that in the 'undeleted' row, the row_id and
> last_updated_seq_number are wrong.
>
>    - If 'row-id' is not set, it inherits a row-id that is changed, which
>    is wrong
>    - If 'last_updated_sequence_number' is set, then it is wrong because
>    it should refer to the snapshot that 'undeleted it'.
>
> Is that correct?
>
> But if a data file has all rows that have 'row-id' set and
> 'last_updated_sequence_number' unset, technically this can be a valid
> undelete, is it right?
>
> Thanks
> Szehon
>
> On Mon, Dec 1, 2025 at 11:08 AM Steven Wu <[email protected]> wrote:
>
>>
>> > _row_id a unique long identifier for every row within the table. The
>> value is assigned via inheritance when a row is first added to the table.
>>
>> Actually, the current spec doesn't allow explicitly assigning row-id
>> for new rows.
>>
>> So currently we don't need to worry about whether it is allowed to
>> have *new* rows with explicitly assigned row-id values lower than the
>> snapshot's first-row-id.
>>
>> On Mon, Dec 1, 2025 at 9:50 AM Steven Wu <[email protected]> wrote:
>>
>>> Here is the spec PR to clarify that undelete is not allowed. I will
>>> start a vote thread for it.
>>> https://github.com/apache/iceberg/pull/14731
>>>
>>> Let me start a new discussion thread for the first-row-id and row-id
>>> question for row lineage to get more attention and input.
>>>
>>> On Sat, Nov 22, 2025 at 7:02 AM Péter Váry <[email protected]>
>>> wrote:
>>>
>>>> Apologies if I was unclear. As Steven also mentioned, I wanted to
>>>> confirm whether we agree on the clarification regarding the `row-id` and
>>>> `first-row-id`.
>>>>
>>>> Steven Wu <[email protected]> wrote on Sat, Nov 22, 2025
>>>> at 15:28:
>>>>
>>>>> Just to clarify, I was asking a question.
>>>>>
>>>>> Is it valid to add a new data file with a row?
>>>>>
>>>>>    - whose persisted row-id value is lower than the snapshot's
>>>>>    first-row-id
>>>>>    - whose last-updated-seq-number is not set and inherits from the
>>>>>    snapshot sequence number
>>>>>
>>>>> Thanks,
>>>>> Steven
>>>>>
>>>>> On Fri, Nov 21, 2025 at 11:25 PM Péter Váry <
>>>>> [email protected]> wrote:
>>>>>
>>>>>> +1 for this proposal
>>>>>>
>>>>>> Slightly related, but we can move this to a separate thread if it
>>>>>> needs independent discussion: We should clarify the relationship between
>>>>>> `row-id` and `first-row-id`. This has come up several times in our
>>>>>> discussions about the equality delete removal proposal, where we
>>>>>> considered generating `row-ids` manually instead of relying on the
>>>>>> auto-assignment feature.
>>>>>>
>>>>>> As discussed with Steven:
>>>>>>
>>>>>>> It is valid to add a new data file with a row:
>>>>>>>
>>>>>>>    - whose persisted row-id value is lower than the snapshot's
>>>>>>>    first-row-id
>>>>>>>    - whose last-updated-seq-number is not set and inherits from the
>>>>>>>    snapshot sequence number
>>>>>>>
>>>>>>>
>>>>>> Prashant Singh <[email protected]> wrote on Sat, Nov 22, 2025
>>>>>> at 5:29:
>>>>>>
>>>>>>> +1 for making it explicit that an *undelete* of a row can't be done
>>>>>>> by unsetting the corresponding bit in the DV
>>>>>>>
>>>>>>> *Rows should only be added via new data files* sounds reasonable
>>>>>>> to me!
>>>>>>>
>>>>>>> Apart from row lineage, it also complicates operation type inference,
>>>>>>> as in [1], since we would now need to inspect the contents of these DVs
>>>>>>> to see whether it's an insert.
>>>>>>>
>>>>>>> [1]
>>>>>>> https://github.com/apache/iceberg/pull/14581#discussion_r2533057189
>>>>>>>
>>>>>>> On Sat, Nov 22, 2025 at 4:48 AM Szehon Ho <[email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> It makes sense to me; it sounds like a minor clarification. For v2
>>>>>>>> position deletes, code like rewrite_position_deletes may have made some
>>>>>>>> assumptions like this and would not work well if they were violated,
>>>>>>>> and maybe other code as well.
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>> Szehon
>>>>>>>>
>>>>>>>> On Fri, Nov 21, 2025 at 3:03 PM Steven Wu <[email protected]>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Similar weird behavior can also happen for V2 position delete
>>>>>>>>> files with `undelete`.
>>>>>>>>>
>>>>>>>>> In V2, there could be multiple position delete files (say pd1,
>>>>>>>>> pd2) associated with the same data file (say f1). Let's say pd1 deletes
>>>>>>>>> rows 5 and 10 and pd2 deletes row 15.
>>>>>>>>> 1. A new snapshot is committed with pd1 (DELETED), pd2 (EXISTING),
>>>>>>>>> and pd3 (ADDED). pd3 deletes only row 10 (undeleting row 5).
>>>>>>>>> 2. A new snapshot is committed with pd1 (DELETED) and pd2
>>>>>>>>> (EXISTING).
>>>>>>>>>
>>>>>>>>> In either case, some rows are essentially added (back) to the
>>>>>>>>> table with a lower sequence number than the new snapshot's sequence
>>>>>>>>> number.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Just to recap the question: should the spec (v2 and v3) spell out
>>>>>>>>> that `undelete row` is not allowed? Rows should only be added via new
>>>>>>>>> data files.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Fri, Nov 21, 2025 at 1:09 PM Steven Wu <[email protected]>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> >Are we specifically stating somewhere that all row-ids should be
>>>>>>>>>> higher than or equal to the snapshot's `first-row-id`?
>>>>>>>>>> In my mental model the `first-row-id` is only applicable for rows
>>>>>>>>>> that don't have a specific row-id assigned.
>>>>>>>>>>
>>>>>>>>>> I meant that an ADDED row should have a `row-id` higher than or equal
>>>>>>>>>> to the snapshot's `first-row-id`. EXISTING or UPDATED rows can have a
>>>>>>>>>> lower row-id.
>>>>>>>>>>
>>>>>>>>>> On Fri, Nov 21, 2025 at 1:04 PM Steven Wu <[email protected]>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> > Can we create a validator to prevent this from happening?
>>>>>>>>>>>
>>>>>>>>>>> We don't have this problem with the Java implementation.
>>>>>>>>>>> `BaseDVFileWriter` merges the previous DV with the new delta DV, so there
>>>>>>>>>>> is no `undelete` behavior. I am not aware of any Java API that allows
>>>>>>>>>>> "undelete", so we probably don't need to add any validation code in the
>>>>>>>>>>> Java impl.
>>>>>>>>>>>
>>>>>>>>>>> Just thought it is good to spell it out in the spec so that
>>>>>>>>>>> clients/engines can be clear about the expected behavior.
>>>>>>>>>>>
>>>>>>>>>>> On Fri, Nov 21, 2025 at 12:18 PM Péter Váry <
>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Are we specifically stating somewhere that all row-ids should
>>>>>>>>>>>> be higher than or equal to the snapshot's `first-row-id`?
>>>>>>>>>>>> In my mental model the `first-row-id` is only applicable for
>>>>>>>>>>>> rows that don't have a specific row-id assigned.
>>>>>>>>>>>>
>>>>>>>>>>>> Nonetheless, I agree that the `row-id` and the
>>>>>>>>>>>> `last-updated-seq-num` should have changed to new values, so we can say
>>>>>>>>>>>> that undeleting a row is not allowed because of this.
>>>>>>>>>>>>
>>>>>>>>>>>> Can we create a validator to prevent this from happening?
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Steven Wu <[email protected]> wrote on Fri, Nov 21, 2025
>>>>>>>>>>>> at 21:11:
>>>>>>>>>>>>
>>>>>>>>>>>>> The undeleted row would have an invalid `row-id` and
>>>>>>>>>>>>> `last-updated-seq-num`. Since it is a new row (added back), it should
>>>>>>>>>>>>> have a `row-id` higher than or equal to the snapshot's `first-row-id`,
>>>>>>>>>>>>> and its `last-updated-seq-number` should inherit (or be set to) the new
>>>>>>>>>>>>> snapshot's sequence number.
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Fri, Nov 21, 2025 at 11:48 AM Steven Wu <
>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Should we clarify the V3 spec to explicitly forbid "*undelete*" of a
>>>>>>>>>>>>>> row by unsetting the DV bit? Unsetting a DV bit essentially adds a row
>>>>>>>>>>>>>> with a lower row-id than the snapshot's first-row-id, which would
>>>>>>>>>>>>>> violate the row lineage spec. With this restriction, DV cardinality
>>>>>>>>>>>>>> should be monotonically increasing.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>> Steven
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
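
The V2 position-delete scenario from the Nov 21, 3:03 PM message can be checked the same way, by comparing the union of deleted positions before and after the commit. This is a sketch under simplifying assumptions: a single data file f1, sequence-number scoping of position deletes is ignored, and `deleted_positions` is a hypothetical helper.

```python
from typing import Dict, Set


def deleted_positions(delete_files: Dict[str, Set[int]]) -> Set[int]:
    """Union of positions deleted by the live position delete files of one data file."""
    result: Set[int] = set()
    for positions in delete_files.values():
        result |= positions
    return result


# Live position delete files for data file f1 before the new snapshot.
before = {"pd1": {5, 10}, "pd2": {15}}

# Case 1: pd1 DELETED, pd2 EXISTING, pd3 ADDED (pd3 deletes only row 10).
case1 = {"pd2": {15}, "pd3": {10}}
# Case 2: pd1 DELETED, pd2 EXISTING.
case2 = {"pd2": {15}}

for name, after in [("case 1", case1), ("case 2", case2)]:
    undeleted = deleted_positions(before) - deleted_positions(after)
    print(name, "undeletes rows", sorted(undeleted))
# -> case 1 undeletes rows [5]
# -> case 2 undeletes rows [5, 10]
```

Any non-empty difference means rows reappear while keeping the older data file's sequence number.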
