Re: [DISCUSS] V3 spec: add monotonic requirement to data DV

Anoop Johnson Tue, 02 Dec 2025 23:25:36 -0800

I recommend not adding this restriction to the spec for two reasons.

1. When a row is restored, it is desirable that the row ID is restored as
well. So this sounds like a feature and not a bug to me.
2. Table RESTOREs are a commonly used feature, and it would be
prohibitively expensive to rewrite data files to restore deleted rows back
into the table.
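
For readers following the scenario quoted below (snapshots 10, 11 and 12), here is a minimal Python sketch of why unsetting a DV bit brings the row back with its original lineage values. It is an illustration only, not Iceberg code: `DataFile`, `live_rows` and `dv_is_monotonic` are hypothetical names, and it assumes the data file's first row ID equals the snapshot's first-row-id (100), so that position 5 inherits row-id 105.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataFile:
    """Hypothetical stand-in for a data file with inherited row lineage."""
    first_row_id: int   # assumed: first row ID assigned when the file was added
    added_seq: int      # sequence number of the snapshot that added the file
    num_rows: int


def live_rows(data_file, deleted_positions):
    """Return (row_id, last_updated_seq) for every position not set in the DV.

    Both values are inherited from the file, so they do not depend on which
    snapshot is being read.
    """
    return [
        (data_file.first_row_id + pos, data_file.added_seq)
        for pos in range(data_file.num_rows)
        if pos not in deleted_positions
    ]


def dv_is_monotonic(previous_dv, new_dv):
    """True if the new DV keeps every previously deleted position."""
    return previous_dv <= new_dv


# Snapshot 10 (first-row-id: 100): file added, row X sits at position 5.
data_file = DataFile(first_row_id=100, added_seq=10, num_rows=8)
row_x = (105, 10)

dv_snapshot_11 = frozenset({5})   # row X deleted via DV
dv_snapshot_12 = frozenset()      # DV rewritten with the position unset

restored = row_x in live_rows(data_file, dv_snapshot_12)
allowed = dv_is_monotonic(dv_snapshot_11, dv_snapshot_12)
print(restored, allowed)
```

In this sketch `restored` is true and `allowed` is false: the row comes back with row-id 105 and sequence number 10, and the proposed monotonic restriction is exactly the check that would reject that DV rewrite.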


Best,
Anoop

On Tue, Dec 2, 2025 at 4:56 PM Szehon Ho <[email protected]> wrote:

> Szehon, I didn't quite understand this question. Can you elaborate a bit?
>
>
> Yea I was wondering in the scenario you are discussing above, a new file:
>
>>
>>    - whose persisted row-id value is lower than the snapshot's
>>    first-row-id
>>
>>
>>    - whose last-updated-seq-number is not set and inherit from the
>>    snapshot sequence number
>>
>> I saw your interpretation though that it is not explicitly allowed.
>
> Overall, I was just trying to reason about whether it's beneficial to
> disallow a quick un-delete in the scenario that you describe, due to the
> difficulty of implementing row lineage and other things. The scenario is
> not really a violation of the current row-lineage spec as initially
> stated, but it is definitely troublesome.
>
> Thanks,
> Szehon
>
> On Tue, Dec 2, 2025 at 2:52 PM Steven Wu <[email protected]> wrote:
>
>> Let's look at the following scenario
>>
>> * Snapshot 10 (first-row-id: 100)
>>   - A new data file was added and it contains row X. Row X inherits
>> row-id as 105 and last-updated-sequence-number as 10
>> * Snapshot 11 (first-row-id: 200)
>>   - Row X was deleted via DV
>> * Snapshot 12 (first-row-id: 300)
>>   - Row X was restored (added back) by rewriting DV and with the delete
>> position unset.
>>
>> When querying the table after snapshot 12, the Row X would have the
>> row-id as 105 and last-updated-sequence-number as 10 (just as the initial
>> add at snapshot 10). The correct last-updated-sequence-number should be 12
>> and row-id should be >=300 for added/restored row X.
>>
>> Hence, we are proposing that it is invalid to restore a row by rewriting
>> the DV or position delete file and unsetting the delete position.
>>
>> > But if a data file has all rows that have 'row-id' set and
>> > 'last_updated_sequence_number' unset, technically this can be a valid
>> > undelete, is it right?
>>
>> Szehon, I didn't quite understand this question. Can you elaborate a bit?
>>
>>
>>
>>
>> On Tue, Dec 2, 2025 at 2:12 PM Szehon Ho <[email protected]> wrote:
>>
>>> Hi,
>>>
>>> Sorry, I re-read the thread and Peter's question more closely, and
>>> wanted to make sure that we are not precluding something unnecessarily,
>>> and to see if we can solve the code problem in other ways.
>>>
>>> The concern is that in the 'undeleted' row, the row_id and
>>> last_updated_seq_number are wrong.
>>>
>>>    - If 'row-id' is not set, it inherits a row-id that is changed,
>>>    which is wrong
>>>    - If 'last_updated_sequence_number' is set, then it is wrong because
>>>    it should refer to the snapshot that 'undeleted it'.
>>>
>>> Is that correct?
>>>
>>> But if a data file has all rows that have 'row-id' set and
>>> 'last_updated_sequence_number' unset, technically this can be a valid
>>> undelete, is it right?
>>>
>>> Thanks
>>> Szehon
>>>
>>> On Mon, Dec 1, 2025 at 11:08 AM Steven Wu <[email protected]> wrote:
>>>
>>>>
>>>> > _row_id a unique long identifier for every row within the table.
>>>> > The value is assigned via inheritance when a row is first added to
>>>> > the table.
>>>>
>>>> Actually, current spec doesn't allow explicitly assigning row-id for
>>>> new rows.
>>>>
>>>> So currently we don't need to worry about the question if it is allowed
>>>> to have *new* rows with explicitly assigned row-id values lower than
>>>> the snapshot's first-row-id.
>>>>
>>>> On Mon, Dec 1, 2025 at 9:50 AM Steven Wu <[email protected]> wrote:
>>>>
>>>>> Here is the spec PR to clarify undelete is not allowed. Will start a
>>>>> vote thread for that.
>>>>> https://github.com/apache/iceberg/pull/14731
>>>>>
>>>>> Let me start a new discussion thread for the first-row-id and row-id
>>>>> question for row lineage to get more attention and input.
>>>>>
>>>>> On Sat, Nov 22, 2025 at 7:02 AM Péter Váry <
>>>>> [email protected]> wrote:
>>>>>
>>>>>> Apologies if I was unclear. As Steven also mentioned, I wanted to
>>>>>> confirm whether we agree on the clarification regarding the `row-id` and
>>>>>> `first-row-id`.
>>>>>>
>>>>>> Steven Wu <[email protected]> wrote (on Sat, Nov 22, 2025, 15:28):
>>>>>>
>>>>>>> Just to clarify, I was asking a question.
>>>>>>>
>>>>>>> Is it valid to add a new data file with a row?
>>>>>>>
>>>>>>>    - whose persisted row-id value is lower than the snapshot's
>>>>>>>    first-row-id
>>>>>>>    - whose last-updated-seq-number is not set and inherit from the
>>>>>>>    snapshot sequence number
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Steven
>>>>>>>
>>>>>>> On Fri, Nov 21, 2025 at 11:25 PM Péter Váry <
>>>>>>> [email protected]> wrote:
>>>>>>>
>>>>>>>> +1 for this proposal
>>>>>>>>
>>>>>>>> Slightly related, but we can move this to a separate thread if it
>>>>>>>> needs independent discussion: We should clarify the relationship
>>>>>>>> between `row-id` and `first-row-id`. This has come up several times
>>>>>>>> in our discussions about the equality delete removal proposal, where
>>>>>>>> we considered generating `row-ids` manually instead of relying on
>>>>>>>> the auto-assignment feature.
>>>>>>>>
>>>>>>>> As discussed with Steven:
>>>>>>>>
>>>>>>>>> It is valid to add a new data file with a row:
>>>>>>>>>
>>>>>>>>>    - whose persisted row-id value is lower than the snapshot's
>>>>>>>>>    first-row-id
>>>>>>>>>    - whose last-updated-seq-number is not set and inherit from
>>>>>>>>>    the snapshot sequence number
>>>>>>>>>
>>>>>>>>>
>>>>>>>> Prashant Singh <[email protected]> wrote (on Sat, Nov 22, 2025,
>>>>>>>> 5:29):
>>>>>>>>
>>>>>>>>> +1 for making it explicit that an *undelete* of a row can't be
>>>>>>>>> done by unsetting the corresponding bit in the DV.
>>>>>>>>>
>>>>>>>>> *Rows should only be added via new data files* sounds reasonable
>>>>>>>>> to me!
>>>>>>>>>
>>>>>>>>> Apart from row lineage, it also complicates operation-type
>>>>>>>>> inference, like here [1], as we would now have to inspect the
>>>>>>>>> contents of these DVs to see if it's an insert.
>>>>>>>>>
>>>>>>>>> [1]
>>>>>>>>> https://github.com/apache/iceberg/pull/14581#discussion_r2533057189
>>>>>>>>>
>>>>>>>>> On Sat, Nov 22, 2025 at 4:48 AM Szehon Ho <[email protected]>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> It makes sense to me, it sounds like a minor clarification. For
>>>>>>>>>> v2 position deletes, code like rewrite_position_deletes may have
>>>>>>>>>> made some assumptions like this and would not work well if
>>>>>>>>>> violated, maybe other code as well.
>>>>>>>>>>
>>>>>>>>>> Thanks
>>>>>>>>>> Szehon
>>>>>>>>>>
>>>>>>>>>> On Fri, Nov 21, 2025 at 3:03 PM Steven Wu <[email protected]>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Similar weird behavior can also happen for V2 position delete
>>>>>>>>>>> files with `undelete`.
>>>>>>>>>>>
>>>>>>>>>>> In V2, there could be multiple position delete files (say pd1,
>>>>>>>>>>> pd2) associated with the same data file (say f1). Let's say pd1
>>>>>>>>>>> deletes row 5 and 10 and pd2 deletes row 15.
>>>>>>>>>>> 1. a new snapshot is committed with pd1 (DELETED), pd2
>>>>>>>>>>> (EXISTING), and pd3 (ADDED). pd3 deletes only row 10 (undeleted
>>>>>>>>>>> row 5)
>>>>>>>>>>> 2. a new snapshot is committed with pd1 (DELETED) and pd2
>>>>>>>>>>> (EXISTING)
>>>>>>>>>>>
>>>>>>>>>>> In either case, essentially some rows are added (back) to the
>>>>>>>>>>> table with lower sequence number than the new snapshot's
>>>>>>>>>>> sequence number.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Just to recap the question: should the spec (v2 and v3) spell
>>>>>>>>>>> out that `undelete row` is not allowed? Rows should only be
>>>>>>>>>>> added via new data files.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Fri, Nov 21, 2025 at 1:09 PM Steven Wu <[email protected]>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> > Are we specifically stating somewhere that all row-ids should
>>>>>>>>>>>> > be higher than or equal to the snapshot's `first-row-id`?
>>>>>>>>>>>> > In my mental model the `first-row-id` is only applicable for
>>>>>>>>>>>> > rows that don't have a specific row-id assigned.
>>>>>>>>>>>>
>>>>>>>>>>>> I meant an ADDED row should have `row-id` higher than or equal
>>>>>>>>>>>> to the snapshot's `first-row-id`. EXISTING or UPDATED row can
>>>>>>>>>>>> have lower row id.
>>>>>>>>>>>>
>>>>>>>>>>>> On Fri, Nov 21, 2025 at 1:04 PM Steven Wu <[email protected]>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> > Can we create a validator to prevent this from happening?
>>>>>>>>>>>>>
>>>>>>>>>>>>> We don't have this problem with the Java implementation.
>>>>>>>>>>>>> `BaseDVFileWriter` merges the previous DV with the new delta
>>>>>>>>>>>>> DV. So there is no `undelete` behavior. I am not aware of any
>>>>>>>>>>>>> Java API to allow "undelete". So we probably don't need to add
>>>>>>>>>>>>> any validation code in the Java impl.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Just thought it is good to spell it out in the spec so that
>>>>>>>>>>>>> clients/engines can be clear about the expected behavior.
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Fri, Nov 21, 2025 at 12:18 PM Péter Váry <
>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Are we specifically stating somewhere that all row-ids should
>>>>>>>>>>>>>> be higher than or equal to the snapshot's `first-row-id`?
>>>>>>>>>>>>>> In my mental model the `first-row-id` is only applicable for
>>>>>>>>>>>>>> rows that don't have a specific row-id assigned.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Nonetheless, I agree that the `row-id` and the
>>>>>>>>>>>>>> `last-updated-seq-num` should have changed to a new one, so
>>>>>>>>>>>>>> we can say that undeleting a row is not allowed because of
>>>>>>>>>>>>>> this.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Can we create a validator to prevent this from happening?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Steven Wu <[email protected]> wrote (on Fri, Nov 21, 2025,
>>>>>>>>>>>>>> 21:11):
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The undeleted row would have invalid `row-id` and
>>>>>>>>>>>>>>> `last-updated-seq-num`. Since it is a new row (added back),
>>>>>>>>>>>>>>> it should have the `row-id` higher than or equal to the
>>>>>>>>>>>>>>> snapshot's `first-row-id` and the `last-updated-seq-number`
>>>>>>>>>>>>>>> should inherit/have the new snapshot's sequence number.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Fri, Nov 21, 2025 at 11:48 AM Steven Wu <
>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Should we clarify the V3 spec to explicitly forbid
>>>>>>>>>>>>>>>> "*undelete*" of a row by unsetting the DV bit? Unsetting a
>>>>>>>>>>>>>>>> DV bit essentially adds a row with lower row-id than the
>>>>>>>>>>>>>>>> snapshot's first-row-id, which would violate the row lineage
>>>>>>>>>>>>>>>> spec. With the restriction, DV cardinality should be
>>>>>>>>>>>>>>>> monotonically increasing.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>> Steven
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
