Re: [DISCUSS] V3 spec: add monotonic requirement to data DV

Szehon Ho Fri, 21 Nov 2025 15:17:29 -0800

This makes sense to me; it sounds like a minor clarification.  For v2
position deletes, code like rewrite_position_deletes may already make
assumptions like this and would not work well if they were violated, and
other code may as well.


Thanks
Szehon
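
To make the v2 scenario quoted below concrete, here is a minimal Python
sketch. It models each position delete file as a plain set of row positions
for data file f1 and compares the effective deleted set before and after a
commit. The helper names (`effective_deletes`, `undeleted_positions`) are
hypothetical and are not part of any Iceberg API.

```python
def effective_deletes(live_delete_files):
    """Union of deleted positions across all live delete files for one data file."""
    deleted = set()
    for positions in live_delete_files:
        deleted |= set(positions)
    return deleted


def undeleted_positions(before, after):
    """Positions that were deleted before the commit but are visible again after it."""
    return effective_deletes(before) - effective_deletes(after)


# Position delete files for data file f1, as in the example in this thread.
pd1 = {5, 10}
pd2 = {15}
pd3 = {10}

# Case 1: pd1 is removed, pd2 is kept, pd3 is added.
case1 = undeleted_positions(before=[pd1, pd2], after=[pd2, pd3])

# Case 2: pd1 is removed and pd2 is kept.
case2 = undeleted_positions(before=[pd1, pd2], after=[pd2])

print(sorted(case1))  # [5]
print(sorted(case2))  # [5, 10]
```

A non-empty result means the commit brought rows back without writing a new
data file, which is the behavior the thread proposes to disallow.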

On Fri, Nov 21, 2025 at 3:03 PM Steven Wu <[email protected]> wrote:

> Similar weird behavior can also happen for V2 position delete files with
> `undelete`.
>
> In V2, there could be multiple position delete files (say pd1, pd2)
> associated with the same data file (say f1). Let's say pd1 deletes rows 5
> and 10 and pd2 deletes row 15.
> 1. a new snapshot is committed with pd1 (DELETED), pd2 (EXISTING), and pd3
> (ADDED). pd3 deletes only row 10, so row 5 is undeleted.
> 2. a new snapshot is committed with pd1 (DELETED) and pd2 (EXISTING), so
> rows 5 and 10 are both undeleted.
>
> In either case, some rows are effectively added (back) to the table with a
> lower sequence number than the new snapshot's sequence number.
>
>
>
> Just to recap the question: should the spec (v2 and v3) spell out that
> `undelete row` is not allowed? Rows should only be added via new data files.
>
>
>
>
> On Fri, Nov 21, 2025 at 1:09 PM Steven Wu <[email protected]> wrote:
>
>> >Are we specifically stating somewhere that all row-ids should be higher
>> than or equal to the snapshot's `first-row-id`?
>> In my mental model the `first-row-id` is only applicable for rows that
>> don't have a specific row-id assigned.
>>
>> I meant that an ADDED row should have a `row-id` higher than or equal to the
>> snapshot's `first-row-id`. An EXISTING or UPDATED row can have a lower
>> `row-id`.
>>
>> On Fri, Nov 21, 2025 at 1:04 PM Steven Wu <[email protected]> wrote:
>>
>>> > Can we create a validator to prevent this from happening?
>>>
>>> We don't have this problem with the Java implementation.
>>> `BaseDVFileWriter` merges the previous DV with the new delta DV, so there
>>> is no `undelete` behavior. I am not aware of any Java API that allows
>>> "undelete", so we probably don't need to add any validation code in the
>>> Java impl.
>>>
>>> I just thought it would be good to spell it out in the spec so that
>>> clients/engines can be clear about the expected behavior.
>>>
>>> On Fri, Nov 21, 2025 at 12:18 PM Péter Váry <[email protected]>
>>> wrote:
>>>
>>>> Are we specifically stating somewhere that all row-ids should be higher
>>>> than or equal to the snapshot's `first-row-id`?
>>>> In my mental model the `first-row-id` is only applicable for rows that
>>>> don't have a specific row-id assigned.
>>>>
>>>> Nonetheless, I agree that the `row-id` and the
>>>> `last-updated-seq-num` should have changed to new values, so we can say that
>>>> undeleting a row is not allowed because of this.
>>>>
>>>> Can we create a validator to prevent this from happening?
>>>>
>>>>
>>>>
>>>> Steven Wu <[email protected]> wrote (on Fri, Nov 21, 2025, 21:11):
>>>>
>>>>> The undeleted row would have an invalid `row-id` and
>>>>> `last-updated-seq-num`. Since it is a new row (added back), it should have
>>>>> a `row-id` higher than or equal to the snapshot's `first-row-id`, and its
>>>>> `last-updated-seq-num` should inherit the new snapshot's sequence
>>>>> number.
>>>>>
>>>>> On Fri, Nov 21, 2025 at 11:48 AM Steven Wu <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> Should we clarify the V3 spec to explicitly forbid "*undelete*" of a
>>>>>> row by unsetting the DV bit? Unsetting a DV bit essentially adds a row
>>>>>> with a lower row-id than the snapshot's first-row-id, which would violate
>>>>>> the row lineage spec. With the restriction, DV cardinality should be
>>>>>> monotonically increasing.
>>>>>>
>>>>>> Thanks,
>>>>>> Steven
>>>>>>
>>>>>
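
The rule discussed in this thread can be read as a superset check: a
replacement DV must contain every position set in the DV it replaces. The
Python sketch below is only an illustration under that reading. It models a
DV as a set of row positions rather than a bitmap, and `merge_dv` and
`validate_monotonic` are hypothetical names, not Iceberg APIs. The merge step
mirrors the behavior described above for `BaseDVFileWriter` (previous DV
combined with the new delta DV).

```python
def merge_dv(previous, delta):
    """Combine the previous DV with newly deleted positions; bits are never cleared."""
    return frozenset(previous) | frozenset(delta)


def validate_monotonic(previous, replacement):
    """Raise ValueError if the replacement DV clears any bit set in the previous DV."""
    cleared = frozenset(previous) - frozenset(replacement)
    if cleared:
        raise ValueError(f"DV would undelete positions {sorted(cleared)}")
    return True


previous_dv = {5, 10}

# Writing through a merge always yields a superset, so the check passes.
merged = merge_dv(previous_dv, {15})
validate_monotonic(previous_dv, merged)

# A hand-written DV that drops position 5 is rejected.
try:
    validate_monotonic(previous_dv, {10})
    rejected = False
except ValueError:
    rejected = True
```

Under the superset rule, the DV's cardinality can stay the same or grow
across snapshots, but it can never shrink.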
