I am fine with the proposed spec change. While it "supports/allows"
equality deletes, row lineage semantics needn't/can't be maintained
properly for equality deletes (compared to position deletes). Gang pointed
out a couple of issues with the implications, but we have no choice but to
live with them, given how equality deletes behave.

Gang, rewriting equality deletes to position deletes doesn't really help in
this case. To have correct lineage, the updated row would have to carry over
the row_id of the previous row (the equality-deleted row) during the write
phase with equality deletes. Instead, this spec change now says the updated
row is a completely new row with a new row_id.
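
To make the difference concrete, here is a toy Python sketch. It is not
Iceberg code: Row, the row_id allocator starting at 100, both update
helpers, and the changelog function are hypothetical names made up for
illustration. It assumes the position-delete writer locates the old row
(and so can reuse its row_id), while the equality-delete writer only emits
a "delete where key = ..." predicate and assigns a fresh row_id.

```python
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class Row:
    row_id: int
    key: str
    value: int


# Hypothetical allocator for fresh row_ids (starts at 100 for readability).
_next_id = count(100)


def update_with_position_delete(table, key, value):
    """Assumed behavior: the writer finds the old row, so it can carry
    the old row_id over to the updated row."""
    old = next(r for r in table if r.key == key)
    survivors = [r for r in table if r is not old]
    return survivors + [Row(old.row_id, key, value)]


def update_with_equality_delete(table, key, value):
    """Assumed behavior: the writer never looks at the old row, so the
    updated row is written as a brand-new row with a new row_id."""
    survivors = [r for r in table if r.key != key]
    return survivors + [Row(next(_next_id), key, value)]


def changelog(before, after):
    """Derive change events by matching rows on row_id."""
    old = {r.row_id: r for r in before}
    new = {r.row_id: r for r in after}
    events = []
    for rid in sorted(old.keys() | new.keys()):
        if rid in old and rid in new:
            if old[rid] != new[rid]:
                events.append(("UPDATE", rid))
        elif rid in old:
            events.append(("DELETE", rid))
        else:
            events.append(("INSERT", rid))
    return events


table = [Row(1, "a", 10), Row(2, "b", 20)]
pos_events = changelog(table, update_with_position_delete(table, "a", 11))
eq_events = changelog(table, update_with_equality_delete(table, "a", 11))
print(pos_events)
print(eq_events)
```

In this sketch the same logical update shows up as a single UPDATE when the
row_id is carried over, and as a DELETE plus an INSERT when it is not, which
is the behavior the spec change accepts for equality deletes.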

On Tue, Feb 11, 2025 at 7:39 PM Gang Wu <ust...@gmail.com> wrote:

> Hi Russell,
>
> Thanks for adding equality delete support to row lineage!
>
> > accept that "updates" will be treated as "delete" and "insert"
>
> I would say that it has the obvious drawbacks below (though it is better
> than not supporting equality deletes at all):
> 1) updates will be populated differently when outputting changelogs to
> users or downstream databases
> 2) it leads to more computation for incremental processing, like
> refreshing materialized views
>
> At the same time, I would like to ask if it would help if we support
> rewriting equality deletes to position deletes.
> There was an effort but it has been closed:
> https://github.com/apache/iceberg/pull/2216
>
> Best,
> Gang
>
>
> On Wed, Feb 12, 2025 at 7:25 AM Russell Spitzer <russell.spit...@gmail.com>
> wrote:
>
>> Hi Y'all,
>>
>> As we have been working on the row lineage implementation I've been
>> reached out to by a few folks in the community who are interested in
>> changing our defined behavior around equality deletes.
>>
>> Currently when Row Lineage is enabled, the spec says to disable equality
>> deletes for the table.
>>
>> In the interest of compatibility with Flink and other Equality delete
>> producers, I originally wrote that we would simply treat all equality
>> delete based updates as a pure insert and delete. At the time, some folks
>> thought this was too open and worried that it would be poor behavior which
>> led to the current restriction.
>>
>> Now that we are actually implementing I think there have been some
>> changes of heart and that we should go back to the original design. I'd
>> like to see if we have consensus in the community to change the wording
>> back and allow equality deletes.
>>
>> PR: https://github.com/apache/iceberg/pull/12230
>>
>> The TLDR;
>>
>> Allow equality deletes with row lineage but accept that "updates" will be
>> treated as "delete" and "insert"
>>
>> Thanks for your time,
>> Russ
>>
>
