Hi Russell,

Thanks for the nice writeup and the proposal.
I agree with your analysis, and I have the same feeling. However, I think there are more engines than Flink that write equality delete files. So, I agree to deprecate in V3, but maybe we should be more "flexible" about removal in V4 in order to give engines time to update.

I think that by deprecating equality deletes, we are clearly focusing on read performance and "consistency" (more than write). That's not necessarily a bad thing, but streaming and data ingestion platforms will probably be concerned: by using positional deletes, they will have to scan/read all data files to find the positions, which is painful.

So, to summarize:
1. I agree to deprecate equality deletes, but -1 on committing to any target for removal before we have a clear path for streaming platforms (Flink, Beam, ...).
2. In the meantime (during the deprecation period), I propose we explore possible improvements for streaming platforms (maybe finding a way to avoid full data file scans, ...).

Thanks !
Regards
JB

On Wed, Oct 30, 2024 at 10:06 PM Russell Spitzer <russell.spit...@gmail.com> wrote:
>
> Background:
>
> 1) Position Deletes
>
> Writers determine what rows are deleted and mark them in a 1-for-1
> representation. With delete vectors, this means every data file has at most
> one delete vector that is read in conjunction with it to excise deleted rows.
> Reader overhead is more or less constant and very predictable.
>
> The main cost of this mode is that deletes must be determined at write time,
> which is expensive and can make conflict resolution more difficult.
>
> 2) Equality Deletes
>
> Writers write out references to what values are deleted (in a partition or
> globally). There can be an unlimited number of equality deletes, and they all
> must be checked for every data file that is read. The cost of determining
> deleted rows is essentially given to the reader.
> Conflicts almost never happen, since data files are not actually changed, and
> there is almost no cost to the writer to generate these. Almost all costs
> related to equality deletes are passed on to the reader.
>
> Proposal:
>
> Equality deletes are, in my opinion, unsustainable, and we should work on
> deprecating and removing them from the specification. At this time, I know of
> only one engine (Apache Flink) that produces these deletes, but almost all
> engines have implementations to read them. The cost of implementing equality
> deletes on the read path is difficult and unpredictable in terms of memory
> usage and compute complexity. We've had suggestions of implementing RocksDB
> in order to handle ever-growing sets of equality deletes, which in my opinion
> shows that we are going down the wrong path.
>
> Outside of performance, equality deletes are also difficult to use in
> conjunction with many other features. For example, any feature requiring CDC
> or row lineage is basically impossible when equality deletes are in use.
> When equality deletes are present, the state of the table can only be
> determined with a full scan, making it difficult to update differential
> structures. This means materialized views or indexes need to be essentially
> fully rebuilt whenever an equality delete is added to the table.
>
> Equality deletes essentially remove complexity from the write side but then
> add what I believe is an unacceptable level of complexity to the read side.
>
> Because of this, I suggest we deprecate equality deletes in V3 and slate them
> for full removal from the Iceberg spec in V4.
>
> I know this is a big change and compatibility breakage, so I would like to
> introduce this idea to the community and solicit feedback from all
> stakeholders. I am very flexible on this issue and would like to hear the
> best arguments both for and against removal of equality deletes.
>
> Thanks everyone for your time,
>
> Russ Spitzer
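[Editor's note: to make the read-side asymmetry discussed above concrete, here is a minimal Python sketch. It is not the Iceberg API; all names are hypothetical. It shows that a position delete is a per-file position lookup resolved at write time, while an equality delete forces every reader to evaluate every delete predicate against every row of every data file.]

```python
# Illustrative sketch only -- not Iceberg code. Contrasts the read-path
# cost of position deletes vs. equality deletes.

def read_with_position_deletes(data_file, delete_vector):
    """delete_vector: set of row positions already resolved by the writer
    for this one data file. Cost per file is a simple membership check."""
    return [row for pos, row in enumerate(data_file)
            if pos not in delete_vector]

def read_with_equality_deletes(data_file, equality_deletes):
    """equality_deletes: list of {column: value} predicates. The reader
    must test every row against every predicate, for every data file."""
    def is_deleted(row):
        return any(all(row.get(col) == val for col, val in pred.items())
                   for pred in equality_deletes)
    return [row for row in data_file if not is_deleted(row)]

data_file = [{"id": 1, "name": "a"},
             {"id": 2, "name": "b"},
             {"id": 3, "name": "c"}]

# Position delete: the writer already resolved "id == 2" to position 1.
print(read_with_position_deletes(data_file, {1}))

# Equality delete: the reader itself must evaluate the predicate.
print(read_with_equality_deletes(data_file, [{"id": 2}]))
```

Both calls return the same surviving rows; the difference is who pays to find them. With position deletes the matching work happened once, at write time; with equality deletes it is repeated by every reader, which is the cost shift Russell's proposal objects to.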