Equality deletes aren't written only by Flink; the Iceberg Kafka Connect sink (Tabular's version) also writes equality deletes for upserts.
> Writers write out a reference to which values are deleted (in a partition
> or globally). There can be an unlimited number of equality deletes and they
> all must be checked for every data file that is read. The cost of
> determining deleted rows is essentially given to the reader.

Should we focus on optimizing these by compacting them into a single file to
reduce read overhead?

What are the plans for supporting streaming writes in Iceberg if we move away
from equality deletes? Can we achieve real-time writing with position deletes
instead, or would this impact write performance?

- Ajantha

On Thu, Oct 31, 2024 at 7:18 AM Gang Wu <ust...@gmail.com> wrote:

> Thanks Russell for bringing this up!
>
> +1 on deprecating equality deletes.
>
> IMHO, this is something that should reside only in the ingestion engine.
>
> Best,
> Gang
>
> On Thu, Oct 31, 2024 at 5:07 AM Russell Spitzer <russell.spit...@gmail.com>
> wrote:
>
>> Background:
>>
>> 1) Position Deletes
>>
>> Writers determine which rows are deleted and mark them in a 1-for-1
>> representation. With delete vectors, this means every data file has at
>> most one delete vector that is read in conjunction with it to excise
>> deleted rows. Reader overhead is more or less constant and very
>> predictable.
>>
>> The main cost of this mode is that deletes must be determined at write
>> time, which is expensive and can make conflict resolution more difficult.
>>
>> 2) Equality Deletes
>>
>> Writers write out a reference to which values are deleted (in a partition
>> or globally). There can be an unlimited number of equality deletes, and
>> they all must be checked for every data file that is read. The cost of
>> determining deleted rows is essentially given to the reader.
>>
>> Conflicts almost never happen since data files are not actually changed,
>> and there is almost no cost to the writer to generate these. Almost all
>> costs related to equality deletes are passed on to the reader.
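To make the reader-side contrast concrete, here is a minimal Python sketch of the two merge-on-read modes described above. This is a hypothetical, simplified model: the function names and data layout are illustrative only and do not come from the Iceberg codebase.

```python
# Simplified model of the two delete modes (illustrative, not Iceberg's API).

def apply_position_deletes(rows, deleted_positions):
    """Position deletes: at most one delete structure per data file,
    so the per-row check is a constant-time membership test."""
    return [row for pos, row in enumerate(rows) if pos not in deleted_positions]

def apply_equality_deletes(rows, equality_deletes):
    """Equality deletes: every accumulated delete predicate must be
    checked against every row of every data file that is read."""
    def is_deleted(row):
        return any(all(row.get(col) == val for col, val in delete.items())
                   for delete in equality_deletes)
    return [row for row in rows if not is_deleted(row)]

rows = [{"id": 1, "v": "a"}, {"id": 2, "v": "b"}, {"id": 3, "v": "c"}]

# Reader cost is bounded and predictable: one bitmap/set per data file.
print(apply_position_deletes(rows, {1}))

# Reader cost grows with the number of outstanding equality deletes.
print(apply_equality_deletes(rows, [{"id": 2}]))
```

Both calls drop the row with id 2, but the equality-delete path has to evaluate every delete predicate per row, which is the unbounded reader cost the thread is discussing.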
>>
>> Proposal:
>>
>> Equality deletes are, in my opinion, unsustainable, and we should work on
>> deprecating and removing them from the specification. At this time, I know
>> of only one engine (Apache Flink) which produces these deletes, but almost
>> all engines have implementations to read them. Implementing equality
>> deletes on the read path is difficult and unpredictable in terms of memory
>> usage and compute complexity. We've had suggestions of using RocksDB in
>> order to handle ever-growing sets of equality deletes, which in my opinion
>> shows that we are going down the wrong path.
>>
>> Outside of performance, equality deletes are also difficult to use in
>> conjunction with many other features. For example, any feature requiring
>> CDC or row lineage is basically impossible when equality deletes are in
>> use. When equality deletes are present, the state of the table can only be
>> determined with a full scan, making it difficult to update differential
>> structures. This means materialized views or indexes essentially need to
>> be fully rebuilt whenever an equality delete is added to the table.
>>
>> Equality deletes remove complexity from the write side but add what I
>> believe is an unacceptable level of complexity to the read side.
>>
>> Because of this, I suggest we deprecate equality deletes in V3 and slate
>> them for full removal from the Iceberg spec in V4.
>>
>> I know this is a big change and a compatibility breakage, so I would like
>> to introduce this idea to the community and solicit feedback from all
>> stakeholders. I am very flexible on this issue and would like to hear the
>> best arguments both for and against removal of equality deletes.
>>
>> Thanks everyone for your time,
>>
>> Russ Spitzer
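The "full scan" point about differential structures can be illustrated with a small sketch. A position delete names a location, so the affected rows are known from the delete itself; an equality delete names only a value, so every data file must be scanned to learn which rows it actually removed. The data layout and function names below are hypothetical, purely for illustration.

```python
# Illustrative (hypothetical layout): why equality deletes make incremental
# maintenance of materialized views or indexes hard.

data_files = {
    "file-a.parquet": [{"id": 1}, {"id": 2}],
    "file-b.parquet": [{"id": 3}, {"id": 2}],
}

def rows_removed_by_position_delete(file_name, positions):
    # The affected (file, row) pairs are known directly from the delete,
    # so a dependent structure can be updated without reading any data.
    return [(file_name, pos) for pos in positions]

def rows_removed_by_equality_delete(predicate):
    # The delete names a value, not a location: every data file must be
    # scanned before a dependent structure knows what changed.
    hits = []
    for name, rows in data_files.items():
        for pos, row in enumerate(rows):
            if all(row.get(col) == val for col, val in predicate.items()):
                hits.append((name, pos))
    return hits

print(rows_removed_by_position_delete("file-a.parquet", [1]))
print(rows_removed_by_equality_delete({"id": 2}))
```

In this toy model the equality delete on `id == 2` can only be resolved by visiting both files, which is the same reason a materialized view or index would need a full rebuild rather than a differential update.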