Equality deletes aren't written only by Flink; the Iceberg Kafka Connect
sink (Tabular’s version) also writes equality deletes for upserts.

> Writers write out references to which values are deleted (in a partition or
> globally). There can be an unlimited number of equality deletes, and they
> all must be checked for every data file that is read. The cost of
> determining deleted rows is essentially given to the reader.

Should we focus on optimizing these by compacting them into a single file
to reduce read overhead?
What are the plans for supporting streaming writes in Iceberg if we move
away from equality deletes? Can we achieve real-time writing with position
deletes instead, or would this impact write performance?

- Ajantha

On Thu, Oct 31, 2024 at 7:18 AM Gang Wu <ust...@gmail.com> wrote:

> Thanks Russell for bringing this up!
>
> +1 on deprecating equality deletes.
>
> IMHO, this is something that should reside only in the ingestion engine.
>
> Best,
> Gang
>
> On Thu, Oct 31, 2024 at 5:07 AM Russell Spitzer <russell.spit...@gmail.com>
> wrote:
>
>> Background:
>>
>> 1) Position Deletes
>>
>>
>> Writers determine which rows are deleted and mark them in a 1-for-1
>> representation. With delete vectors, this means every data file has at most
>> one delete vector, which is read alongside it to excise deleted rows.
>> Reader overhead is more or less constant and very predictable.
>>
>>
>> The main cost of this mode is that deletes must be determined at write
>> time, which is expensive and can make conflict resolution more difficult.
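To make the constant-overhead claim concrete, here is a minimal sketch of the position-delete read path (hypothetical names, not the actual Iceberg API); the delete vector is modeled as a plain set of row positions:

```python
# Sketch only: models a delete vector (in practice a roaring bitmap)
# as a set of deleted row positions for one data file.
def read_with_delete_vector(rows, deleted_positions):
    """Yield rows from one data file, skipping deleted positions.

    Each data file has at most one delete vector, and membership
    checks are O(1), so reader overhead is predictable per file.
    """
    for pos, row in enumerate(rows):
        if pos not in deleted_positions:
            yield row

rows = ["a", "b", "c", "d"]
live = list(read_with_delete_vector(rows, deleted_positions={1, 3}))
# live == ["a", "c"]
```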
>>
>> 2) Equality Deletes
>>
>> Writers write out references to which values are deleted (in a partition or
>> globally). There can be an unlimited number of equality deletes, and they
>> all must be checked for every data file that is read. The cost of
>> determining deleted rows is essentially given to the reader.
>>
>> Conflicts almost never happen since data files are not actually changed
>> and there is almost no cost to the writer to generate these. Almost all
>> costs related to equality deletes are passed on to the reader.
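By contrast, a minimal sketch of the reader-side work for equality deletes (again hypothetical shapes, not Iceberg's actual implementation): every accumulated delete predicate must be tested against every row of every data file read, so cost grows with the number of delete files rather than staying constant per file:

```python
# Sketch only: models accumulated equality-delete files as a list of
# (column, value) predicates that the reader must apply to every row.
def read_with_equality_deletes(rows, equality_deletes):
    """Yield rows not matched by any equality-delete predicate.

    Work is O(rows * deletes); as delete files accumulate, the
    reader's cost grows without bound until they are compacted away.
    """
    for row in rows:
        if not any(row.get(col) == val for col, val in equality_deletes):
            yield row

rows = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}, {"id": 3, "name": "c"}]
deletes = [("id", 2), ("name", "c")]  # e.g. upserts deleting prior versions
live = list(read_with_equality_deletes(rows, deletes))
# live == [{"id": 1, "name": "a"}]
```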
>>
>> Proposal:
>>
>> Equality deletes are, in my opinion, unsustainable and we should work on
>> deprecating and removing them from the specification. At this time, I know
>> of only one engine (Apache Flink) which produces these deletes but almost
>> all engines have implementations to read them. Implementing equality
>> deletes on the read path is difficult, and unpredictable in terms
>> of memory usage and compute complexity. We’ve had suggestions of
>> using RocksDB in order to handle ever-growing sets of equality
>> deletes, which in my opinion shows that we are going down the wrong path.
>>
>> Outside of performance, equality deletes are also difficult to use in
>> conjunction with many other features. For example, any feature requiring
>> CDC or row lineage is basically impossible when equality deletes are in
>> use. When equality deletes are present, the state of the table can only be
>> determined with a full scan, making it difficult to update differential
>> structures. This means materialized views or indexes essentially need to be
>> fully rebuilt whenever an equality delete is added to the table.
>>
>> Equality deletes essentially remove complexity from the write side but
>> then add what I believe is an unacceptable level of complexity to the read
>> side.
>>
>> Because of this, I suggest we deprecate equality deletes in V3 and slate
>> them for full removal from the Iceberg spec in V4.
>>
>> I know this is a big change and a compatibility break, so I would like to
>> introduce this idea to the community and solicit feedback from all
>> stakeholders. I am very flexible on this issue and would like to hear the
>> best arguments both for and against removing equality deletes.
>>
>> Thanks everyone for your time,
>>
>> Russ Spitzer
>>
>>
