Re: [DISCUSS] V3 spec: add monotonic requirement to data DV

Péter Váry Wed, 03 Dec 2025 02:23:34 -0800

> 1. When a row is restored, it is desirable that the row ID is restored as
> well. So this sounds like a feature and not a bug to me.


Is “restored” even a valid concept? Most systems only support adding and
deleting rows; restoration is typically treated as inserting a new row,
which should receive a new ID.
Even if we accept the notion of restoration, I agree with Steven’s point:
the last-updated-seq-number should reflect the time of restoration. Simply
removing the delete flag from the original row would be an invalid
operation.
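
To illustrate, here is a minimal Python sketch of the scenario Steven
describes further down in this thread. The `DataFile` class and the
`live_rows` helper are hypothetical stand-ins, not Iceberg APIs, and the
sketch assumes the file's first row ID equals the snapshot's first-row-id
(100) and that row X sits at position 5.

```python
from dataclasses import dataclass, field


@dataclass
class DataFile:
    # Hypothetical model of a data file whose rows inherit lineage fields.
    first_row_id: int   # first row ID assigned to the file at commit time
    added_seq: int      # sequence number of the snapshot that added the file
    num_rows: int
    deleted: set = field(default_factory=set)  # positions set in the DV


def live_rows(f):
    """Return (row_id, last_updated_seq) for every row not hidden by the DV."""
    return [
        (f.first_row_id + pos, f.added_seq)
        for pos in range(f.num_rows)
        if pos not in f.deleted
    ]


# Snapshot 10: the file is added; row X at position 5 inherits row-id 105.
f = DataFile(first_row_id=100, added_seq=10, num_rows=10)

# Snapshot 11: row X is deleted via the DV.
f.deleted.add(5)
after_delete = live_rows(f)

# Snapshot 12: the DV is rewritten with the position unset ("undelete").
f.deleted.discard(5)
after_restore = live_rows(f)

# Row X reappears with its original inherited values, not those of snapshot 12.
row_x = next(r for r in after_restore if r[0] == 105)
print(row_x)
```

Nothing in the data file changes between snapshots 10 and 12, so the
inherited values cannot reflect the restoration.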

> 2. Table RESTOREs are a commonly used feature, and it would be
> prohibitively expensive to rewrite data files to restore deleted rows back
> into the table.

That's an interesting point, but I think in this specific case, most users
would likely tolerate lineage corruption. For those who cannot, a full
table rewrite remains an option.
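
For writers that do want to reject such restores, the monotonic requirement
from the subject line could be checked roughly as follows. This is only a
sketch: DVs are modeled as plain Python sets of row positions rather than
real bitmaps, and `undeleted_positions` and `validate_dv_monotonic` are
hypothetical helpers, not existing Iceberg APIs.

```python
def undeleted_positions(previous_dv, new_dv):
    """Positions deleted in the previous DV but no longer deleted in the new one."""
    return set(previous_dv) - set(new_dv)


def validate_dv_monotonic(previous_dv, new_dv):
    """Raise if the new DV for a data file drops any previously deleted position."""
    missing = undeleted_positions(previous_dv, new_dv)
    if missing:
        raise ValueError(
            f"DV rewrite would undelete positions {sorted(missing)}"
        )


# Adding a delete keeps the DV a superset of the previous one: accepted.
validate_dv_monotonic({5, 10}, {5, 10, 15})

# Dropping position 5 would restore a row: rejected.
try:
    validate_dv_monotonic({5, 10}, {10})
except ValueError as e:
    print(e)
```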

On Wed, Dec 3, 2025 at 8:26, Anoop Johnson <[email protected]> wrote:

> I recommend not adding this restriction to the spec for two reasons.
>
> 1. When a row is restored, it is desirable that the row ID is restored as
> well. So this sounds like a feature and not a bug to me.
> 2. Table RESTOREs are a commonly used feature, and it would be
> prohibitively expensive to rewrite data files to restore deleted rows back
> into the table.
>
> Best,
> Anoop
>
> On Tue, Dec 2, 2025 at 4:56 PM Szehon Ho <[email protected]> wrote:
>
>> Szehon, I didn't quite understand this question. Can you elaborate a bit?
>>
>>
>> Yea I was wondering in the scenario you are discussing above, a new file:
>>
>>>
>>>    - whose persisted row-id value is lower than the snapshot's
>>>    first-row-id
>>>
>>>
>>>    - whose last-updated-seq-number is not set and inherits from the
>>>    snapshot sequence number
>>>
>>> I saw your interpretation though that it is not explicitly allowed.
>>
>> Overall, I was just trying to reason about whether it's beneficial to
>> disallow a quick un-delete in the scenario that you describe, due to the
>> difficulty of implementing the row-lineage and other things, as the
>> scenario is not really a violation of the current row-lineage spec as
>> initially stated, but definitely troublesome.
>>
>> Thanks,
>> Szehon
>>
>> On Tue, Dec 2, 2025 at 2:52 PM Steven Wu <[email protected]> wrote:
>>
>>> Let's look at the following scenario
>>>
>>> * Snapshot 10 (first-row-id: 100)
>>>   - A new data file was added and it contains row X. Row X inherits
>>> row-id as 105 and last-updated-sequence-number as 10
>>> * Snapshot 11 (first-row-id: 200)
>>>   - Row X was deleted via DV
>>> * Snapshot 12 (first-row-id: 300)
>>>   - Row X was restored (added back) by rewriting the DV with the delete
>>> position unset.
>>>
>>> When querying the table after snapshot 12, the Row X would have the
>>> row-id as 105 and last-updated-sequence-number as 10 (just as the initial
>>> add at snapshot 10). The correct last-updated-sequence-number should be 12
>>> and row-id should be >=300 for added/restored row X.
>>>
>>> Hence, we are proposing that it is invalid to restore a row by rewriting
>>> the DV or position delete file and unsetting the delete position.
>>>
>>> > But if a data file has all rows that have 'row-id' set and
>>> 'last_updated_sequence_number' unset, technically this can be a valid
>>> undelete, is it right?
>>>
>>> Szehon, I didn't quite understand this question. Can you elaborate a bit?
>>>
>>>
>>>
>>>
>>> On Tue, Dec 2, 2025 at 2:12 PM Szehon Ho <[email protected]>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> Sorry, I re-read the thread and Peter's question more closely, and
>>>> wanted to make sure that we are not precluding something unnecessarily,
>>>> and to explore whether we can solve the code problem in other ways.
>>>>
>>>> The concern is that in the 'undeleted' row, the row_id and
>>>> last_updated_seq_number are wrong.
>>>>
>>>>    - If 'row-id' is not set, it inherits a row-id that is changed,
>>>>    which is wrong
>>>>    - If 'last_updated_sequence_number' is set, then it is wrong
>>>>    because it should refer to the snapshot that 'undeleted it'.
>>>>
>>>> Is that correct?
>>>>
>>>> But if a data file has all rows that have 'row-id' set and
>>>> 'last_updated_sequence_number' unset, technically this can be a valid
>>>> undelete, is it right?
>>>>
>>>> Thanks
>>>> Szehon
>>>>
>>>> On Mon, Dec 1, 2025 at 11:08 AM Steven Wu <[email protected]> wrote:
>>>>
>>>>>
>>>>> > _row_id a unique long identifier for every row within the table.
>>>>> The value is assigned via inheritance when a row is first added to the
>>>>> table.
>>>>>
>>>>> Actually, the current spec doesn't allow explicitly assigning row-id
>>>>> for new rows.
>>>>>
>>>>> So currently we don't need to worry about the question of whether it
>>>>> is allowed to have *new* rows with explicitly assigned row-id values
>>>>> lower than the snapshot's first-row-id.
>>>>>
>>>>> On Mon, Dec 1, 2025 at 9:50 AM Steven Wu <[email protected]> wrote:
>>>>>
>>>>>> Here is the spec PR to clarify undelete is not allowed. Will start a
>>>>>> vote thread for that.
>>>>>> https://github.com/apache/iceberg/pull/14731
>>>>>>
>>>>>> Let me start a new discussion thread for the first-row-id and row-id
>>>>>> question for row lineage to get more attention and input.
>>>>>>
>>>>>> On Sat, Nov 22, 2025 at 7:02 AM Péter Váry <
>>>>>> [email protected]> wrote:
>>>>>>
>>>>>>> Apologies if I was unclear. As Steven also mentioned, I wanted to
>>>>>>> confirm whether we agree on the clarification regarding the `row-id` and
>>>>>>> `first-row-id`.
>>>>>>>
>>>>>>> On Sat, Nov 22, 2025 at 15:28, Steven Wu <[email protected]> wrote:
>>>>>>>
>>>>>>>> Just to clarify, I was asking a question.
>>>>>>>>
>>>>>>>> Is it valid to add a new data file with a row?
>>>>>>>>
>>>>>>>>    - whose persisted row-id value is lower than the snapshot's
>>>>>>>>    first-row-id
>>>>>>>>    - whose last-updated-seq-number is not set and inherits from the
>>>>>>>>    snapshot sequence number
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Steven
>>>>>>>>
>>>>>>>> On Fri, Nov 21, 2025 at 11:25 PM Péter Váry <
>>>>>>>> [email protected]> wrote:
>>>>>>>>
>>>>>>>>> +1 for this proposal
>>>>>>>>>
>>>>>>>>> Slightly related, but we can move this to a separate thread if it
>>>>>>>>> needs independent discussion: We should clarify the relationship
>>>>>>>>> between `row-id` and `first-row-id`. This has come up several times
>>>>>>>>> in our discussions about the equality delete removal proposal, where
>>>>>>>>> we considered generating `row-ids` manually instead of relying on
>>>>>>>>> the auto-assignment feature.
>>>>>>>>>
>>>>>>>>> As discussed with Steven:
>>>>>>>>>
>>>>>>>>>> It is valid to add a new data file with a row:
>>>>>>>>>>
>>>>>>>>>>    - whose persisted row-id value is lower than the snapshot's
>>>>>>>>>>    first-row-id
>>>>>>>>>>    - whose last-updated-seq-number is not set and inherits from
>>>>>>>>>>    the snapshot sequence number
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>> On Sat, Nov 22, 2025 at 5:29, Prashant Singh <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> +1 for making it explicit that an *undelete* of a row can't be
>>>>>>>>>> done by unsetting the corresponding bit in the DV.
>>>>>>>>>>
>>>>>>>>>> *Rows should only be added via new data files* sounds
>>>>>>>>>> reasonable to me!
>>>>>>>>>>
>>>>>>>>>> Apart from row lineage, it also complicates operation type
>>>>>>>>>> inference, as in [1], since we would now need to inspect the
>>>>>>>>>> contents of these DVs to see whether it's an insert.
>>>>>>>>>>
>>>>>>>>>> [1]
>>>>>>>>>> https://github.com/apache/iceberg/pull/14581#discussion_r2533057189
>>>>>>>>>>
>>>>>>>>>> On Sat, Nov 22, 2025 at 4:48 AM Szehon Ho <
>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> It makes sense to me, it sounds like a minor clarification. For
>>>>>>>>>>> v2 position deletes, code like rewrite_position_deletes may have
>>>>>>>>>>> made some assumptions like this and would not work well if
>>>>>>>>>>> violated, maybe other code as well.
>>>>>>>>>>>
>>>>>>>>>>> Thanks
>>>>>>>>>>> Szehon
>>>>>>>>>>>
>>>>>>>>>>> On Fri, Nov 21, 2025 at 3:03 PM Steven Wu <[email protected]>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Similar weird behavior can also happen for V2 position delete
>>>>>>>>>>>> files with `undelete`.
>>>>>>>>>>>>
>>>>>>>>>>>> In V2, there could be multiple position delete files (say pd1,
>>>>>>>>>>>> pd2) associated with the same data file (say f1). Let's say pd1
>>>>>>>>>>>> deletes rows 5 and 10 and pd2 deletes row 15.
>>>>>>>>>>>> 1. a new snapshot is committed with pd1 (DELETED), pd2
>>>>>>>>>>>> (EXISTING), and pd3 (ADDED). pd3 deletes only row 10 (undeleted
>>>>>>>>>>>> row 5)
>>>>>>>>>>>> 2. a new snapshot is committed with pd1 (DELETED) and pd2
>>>>>>>>>>>> (EXISTING)
>>>>>>>>>>>>
>>>>>>>>>>>> In either case, essentially some rows are added (back) to the
>>>>>>>>>>>> table with a lower sequence number than the new snapshot's
>>>>>>>>>>>> sequence number.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Just to recap the question: should the spec (v2 and v3) spell
>>>>>>>>>>>> out that `undelete row` is not allowed? Rows should only be added
>>>>>>>>>>>> via new data files.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Fri, Nov 21, 2025 at 1:09 PM Steven Wu <[email protected]>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> >Are we specifically stating somewhere that all row-ids should
>>>>>>>>>>>>> be higher than or equal to the snapshot's `first-row-id`?
>>>>>>>>>>>>> In my mental model the `first-row-id` is only applicable for
>>>>>>>>>>>>> rows that don't have a specific row-id assigned.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I meant an ADDED row should have a `row-id` higher than or equal
>>>>>>>>>>>>> to the snapshot's `first-row-id`. An EXISTING or UPDATED row can
>>>>>>>>>>>>> have a lower row-id.
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Fri, Nov 21, 2025 at 1:04 PM Steven Wu <
>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> > Can we create a validator to prevent this from happening?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> We don't have this problem with the Java implementation.
>>>>>>>>>>>>>> `BaseDVFileWriter` merges the previous DV with the new delta
>>>>>>>>>>>>>> DV. So there is no `undelete` behavior. I am not aware of any
>>>>>>>>>>>>>> Java API to allow "undelete". So we probably don't need to add
>>>>>>>>>>>>>> any validation code in the Java impl.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Just thought it is good to spell it out in the spec so that
>>>>>>>>>>>>>> clients/engines can be clear about the expected behavior.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Fri, Nov 21, 2025 at 12:18 PM Péter Váry <
>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Are we specifically stating somewhere that all row-ids
>>>>>>>>>>>>>>> should be higher than or equal to the snapshot's `first-row-id`?
>>>>>>>>>>>>>>> In my mental model the `first-row-id` is only applicable for
>>>>>>>>>>>>>>> rows that don't have a specific row-id assigned.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Nonetheless, I agree that the `row-id` and the
>>>>>>>>>>>>>>> `last-updated-seq-num` should have changed to new values, so
>>>>>>>>>>>>>>> we can say that undeleting a row is not allowed because of this.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Can we create a validator to prevent this from happening?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Fri, Nov 21, 2025 at 21:11, Steven Wu <[email protected]> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> The undeleted row would have invalid `row-id` and
>>>>>>>>>>>>>>>> `last-updated-seq-num`. Since it is a new row (added back), it
>>>>>>>>>>>>>>>> should have the `row-id` higher than or equal to the
>>>>>>>>>>>>>>>> snapshot's `first-row-id` and the `last-updated-seq-number`
>>>>>>>>>>>>>>>> should inherit/have the new snapshot's sequence number.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Fri, Nov 21, 2025 at 11:48 AM Steven Wu <
>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Should we clarify the V3 spec to explicitly forbid
>>>>>>>>>>>>>>>>> "*undelete*" of a row by unsetting the DV bit? Unsetting a
>>>>>>>>>>>>>>>>> DV bit essentially adds a row with a lower row-id than the
>>>>>>>>>>>>>>>>> snapshot's first-row-id, which would violate the row lineage
>>>>>>>>>>>>>>>>> spec. With the restriction, DV cardinality should be
>>>>>>>>>>>>>>>>> monotonically increasing.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>> Steven
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
