This makes sense to me; it sounds like a minor clarification. For v2 position deletes, code such as rewrite_position_deletes may already rely on this assumption and would not work well if it were violated, and other code may too.
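
To make the assumption concrete, here is a minimal sketch of the check using the pd1/pd2/pd3 example from Steven's message quoted below. It is not Iceberg code: the function name `undeleted_positions` is made up for illustration, and delete files are modeled as plain Python sets of row positions for a single data file.

```python
# Hypothetical sketch: delete files are modeled as plain sets of row positions
# for one data file (f1). This is not an Iceberg API.

def undeleted_positions(before, after):
    """Return positions that were deleted before the commit but are live after it."""
    return set(before) - set(after)

# pd1 deletes rows 5 and 10, pd2 deletes row 15.
pd1, pd2 = {5, 10}, {15}
before = pd1 | pd2

# Case 1: pd1 is DELETED, pd2 is EXISTING, pd3 is ADDED and deletes only row 10.
pd3 = {10}
case1 = undeleted_positions(before, pd2 | pd3)  # row 5 comes back

# Case 2: pd1 is DELETED, pd2 is EXISTING, nothing replaces pd1.
case2 = undeleted_positions(before, pd2)        # rows 5 and 10 come back

# A commit that only adds deletes leaves nothing undeleted.
ok = undeleted_positions(before, before | {20})
```

Under the proposed rule, a commit would be acceptable only when the result is empty for every data file, which is the same as saying the set of deleted positions never shrinks.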

Thanks
Szehon

On Fri, Nov 21, 2025 at 3:03 PM Steven Wu <[email protected]> wrote:

> Similar weird behavior can also happen for V2 position delete files with
> `undelete`.
>
> In V2, there could be multiple position delete files (say pd1, pd2)
> associated with the same data file (say f1). Let's say pd1 deletes rows 5
> and 10 and pd2 deletes row 15.
> 1. a new snapshot is committed with pd1 (DELETED), pd2 (EXISTING), and pd3
> (ADDED). pd3 deletes only row 10 (undeleted row 5)
> 2. a new snapshot is committed with pd1 (DELETED) and pd2 (EXISTING)
>
> In either case, essentially some rows are added (back) to the table with
> lower sequence number than the new snapshot's sequence number.
>
> Just to recap the question: should the spec (v2 and v3) spell out that
> `undelete row` is not allowed? Rows should only be added via new data files.
>
> On Fri, Nov 21, 2025 at 1:09 PM Steven Wu <[email protected]> wrote:
>
>> > Are we specifically stating somewhere that all row-ids should be higher
>> > than or equal to the snapshot's `first-row-id`?
>> > In my mental model the `first-row-id` is only applicable for rows that
>> > don't have a specific row-id assigned.
>>
>> I meant an ADDED row should have `row-id` higher than or equal to the
>> snapshot's `first-row-id`. EXISTING or UPDATED row can have lower row id.
>>
>> On Fri, Nov 21, 2025 at 1:04 PM Steven Wu <[email protected]> wrote:
>>
>>> > Can we create a validator to prevent this from happening?
>>>
>>> We don't have this problem with the Java implementation.
>>> `BaseDVFileWriter` merges the previous DV with the new delta DV. So there
>>> is no `undelete` behavior. I am not aware of any Java API to allow
>>> "undelete". So we probably don't need to add any validation code in the
>>> Java impl.
>>>
>>> Just thought it is good to spell it out in the spec so that
>>> clients/engines can be clear about the expected behavior.
>>>
>>> On Fri, Nov 21, 2025 at 12:18 PM Péter Váry <[email protected]>
>>> wrote:
>>>
>>>> Are we specifically stating somewhere that all row-ids should be higher
>>>> than or equal to the snapshot's `first-row-id`?
>>>> In my mental model the `first-row-id` is only applicable for rows that
>>>> don't have a specific row-id assigned.
>>>>
>>>> Nonetheless, I agree that the `row-id` and the
>>>> `last-updated-seq-num` should have changed to a new one, so we can say that
>>>> undeleting a row is not allowed because of this.
>>>>
>>>> Can we create a validator to prevent this from happening?
>>>>
>>>> Steven Wu <[email protected]> wrote (on Fri, Nov 21, 2025, 21:11):
>>>>
>>>>> The undeleted row would have invalid `row-id` and
>>>>> `last-updated-seq-num`. Since it is a new row (added back), it should have
>>>>> the `row-id` higher than or equal to the snapshot's `first-row-id` and the
>>>>> `last-updated-seq-number` should inherit/have the new snapshot's sequence
>>>>> number.
>>>>>
>>>>> On Fri, Nov 21, 2025 at 11:48 AM Steven Wu <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> Should we clarify the V3 spec to explicitly forbid "*undelete*" of a
>>>>>> row by unsetting the DV bit? Unsetting a DV bit essentially adds a row
>>>>>> with lower row-id than the snapshot's first-row-id, which would violate
>>>>>> the row lineage spec. With the restriction, DV cardinality should be
>>>>>> monotonically increasing.
>>>>>>
>>>>>> Thanks,
>>>>>> Steven
