Hi Russell

Thanks for the nice writeup and the proposal.

I agree with your analysis, and I have the same feeling. However, I
think there are more engines than Flink that write equality delete
files. So, I agree to deprecate in V3, but maybe we should be more
"flexible" about removal in V4 in order to give engines time to
update.
I think that by deprecating equality deletes, we are clearly focusing
on read performance and "consistency" (more than write). That's not
necessarily a bad thing, but streaming and data ingestion platforms
will probably be concerned about it: with positional deletes, they
would have to scan/read all data files to find the positions of the
rows to delete, which is painful.
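
For illustration only (made-up names, not a real API): to turn
"delete where id = 42" into positional deletes, a streaming writer
must scan every data file that may contain the row just to find its
position. Roughly:

    import java.util.ArrayList;
    import java.util.List;
    import java.util.Map;

    class FindPositionsSketch {
        // Hypothetical layout: each data file is a list of rows keyed by
        // column name. With an equality delete the writer just records
        // {"id": 42} and is done; with positional deletes it must do this
        // full scan at write time.
        static List<long[]> findPositions(List<List<Map<String, Object>>> dataFiles,
                                          String column, Object value) {
            List<long[]> positions = new ArrayList<>(); // {fileIndex, rowPosition}
            for (int f = 0; f < dataFiles.size(); f++) {
                List<Map<String, Object>> rows = dataFiles.get(f);
                for (int pos = 0; pos < rows.size(); pos++) {
                    if (value.equals(rows.get(pos).get(column))) {
                        positions.add(new long[] {f, pos});
                    }
                }
            }
            return positions;
        }

        public static void main(String[] args) {
            List<List<Map<String, Object>>> dataFiles = List.of(
                List.of(Map.of("id", 1), Map.of("id", 42)),
                List.of(Map.of("id", 7)));
            for (long[] p : findPositions(dataFiles, "id", 42)) {
                System.out.println("file " + p[0] + ", position " + p[1]);
            }
        }
    }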

So, to summarize:
1. Agree to deprecate equality deletes, but -1 on committing to any
target for removal before we have a clear path for streaming
platforms (Flink, Beam, ...).
2. In the meantime (during the deprecation period), I propose we
explore possible improvements for streaming platforms (maybe finding
a way to avoid full data file scans, ...).

Thanks!
Regards
JB

On Wed, Oct 30, 2024 at 10:06 PM Russell Spitzer
<russell.spit...@gmail.com> wrote:
>
> Background:
>
> 1) Position Deletes
>
>
> Writers determine which rows are deleted and mark them in a 1-for-1
> representation. With delete vectors, this means every data file has at most
> one delete vector, which is read in conjunction with the data file to excise
> deleted rows. Reader overhead is more or less constant and very predictable.
>
>
> The main cost of this mode is that deletes must be determined at write time,
> which is expensive and can make conflict resolution more difficult.
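>
> To make the read path concrete, here is a minimal sketch (illustrative
> names, not the actual Iceberg API) of a reader applying a delete vector
> to a single data file:
>
>     import java.util.ArrayList;
>     import java.util.BitSet;
>     import java.util.List;
>
>     class DeleteVectorSketch {
>         // Hypothetical: one bitmap of deleted row positions per data file
>         // (the real format is based on roaring bitmaps).
>         static List<String> readDataFile(List<String> rows, BitSet deleteVector) {
>             List<String> live = new ArrayList<>();
>             for (int pos = 0; pos < rows.size(); pos++) {
>                 if (!deleteVector.get(pos)) { // skip positions marked deleted
>                     live.add(rows.get(pos));
>                 }
>             }
>             return live;
>         }
>
>         public static void main(String[] args) {
>             BitSet deleteVector = new BitSet();
>             deleteVector.set(1); // the writer resolved this position at write time
>             System.out.println(readDataFile(List.of("a", "b", "c"), deleteVector)); // [a, c]
>         }
>     }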
>
> 2) Equality Deletes
>
> Writers write out references to the values that are deleted (in a partition
> or globally). There can be an unlimited number of equality deletes, and all of
> them must be checked against every data file that is read. The cost of
> determining deleted rows is essentially handed to the reader.
>
> Conflicts almost never happen since data files are not actually changed and 
> there is almost no cost to the writer to generate these. Almost all costs 
> related to equality deletes are passed on to the reader.
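>
> By contrast, a rough sketch (again, made-up names) of the equality delete
> read path, where every row must be checked against every applicable delete:
>
>     import java.util.List;
>     import java.util.Map;
>
>     class EqualityDeleteSketch {
>         // Hypothetical: each equality delete is a set of column=value terms,
>         // and a row is dropped if it matches any delete, so read cost grows
>         // as O(rows x deletes) unless the reader builds extra structures.
>         static boolean isDeleted(Map<String, Object> row,
>                                  List<Map<String, Object>> equalityDeletes) {
>             for (Map<String, Object> delete : equalityDeletes) {
>                 boolean matches = delete.entrySet().stream()
>                     .allMatch(term -> term.getValue().equals(row.get(term.getKey())));
>                 if (matches) {
>                     return true;
>                 }
>             }
>             return false;
>         }
>
>         public static void main(String[] args) {
>             Map<String, Object> row = Map.of("id", 42, "region", "eu");
>             List<Map<String, Object>> deletes = List.of(Map.of("id", 42));
>             System.out.println(isDeleted(row, deletes)); // true
>         }
>     }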
>
> Proposal:
>
> Equality deletes are, in my opinion, unsustainable, and we should work on
> deprecating and removing them from the specification. At this time, I know of
> only one engine (Apache Flink) that produces these deletes, but almost all
> engines have implementations to read them. The cost of implementing equality
> deletes on the read path is high and unpredictable in terms of memory usage
> and compute complexity. We've had suggestions of implementing RocksDB in
> order to handle ever-growing sets of equality deletes, which in my opinion
> shows that we are going down the wrong path.
>
> Outside of performance, equality deletes are also difficult to use in
> conjunction with many other features. For example, any feature requiring CDC
> or row lineage is basically impossible when equality deletes are in use.
> When equality deletes are present, the state of the table can only be
> determined with a full scan, making it difficult to update differential
> structures. This means materialized views or indexes essentially need to be
> fully rebuilt whenever an equality delete is added to the table.
>
> Equality deletes essentially remove complexity from the write side but then 
> add what I believe is an unacceptable level of complexity to the read side.
>
> Because of this, I suggest we deprecate equality deletes in V3 and slate
> them for full removal from the Iceberg spec in V4.
>
> I know this is a big change and a compatibility breakage, so I would like to
> introduce this idea to the community and solicit feedback from all
> stakeholders. I am very flexible on this issue and would like to hear the
> best arguments both for and against removal of equality deletes.
>
> Thanks everyone for your time,
>
> Russ Spitzer
>
>
