Well, it seems like I'm a little late, so most of the arguments have already been voiced.
I agree that we should not deprecate equality deletes until we have a replacement feature. I think one of the big advantages of Iceberg is that it supports both batch processing and streaming ingestion. For streaming ingestion we need a way to update existing data in a performant way, and restricting deletes to the primary keys seems like enough from the streaming perspective. Equality deletes allow a very wide range of applications, which we might be able to narrow down a bit but still keep useful. So if we want to go down this road, we need to start collecting the requirements.

Thanks,
Peter

On Fri, Nov 1, 2024 at 19:22 Shani Elharrar <sh...@upsolver.com.invalid> wrote:

> I understand how it makes sense for batch jobs, but it damages stream
> jobs. Using equality deletes works much better for streaming (which has a
> strict SLA for delays), and in order to decrease the performance penalty,
> systems can rewrite the equality deletes to positional deletes.
>
> Shani.
>
> On 1 Nov 2024, at 20:06, Steven Wu <stevenz...@gmail.com> wrote:
>
> > Fundamentally, it is very difficult to write position deletes with
> concurrent writers and conflicts for batch jobs too, as the inverted index
> may become invalid/stale.
>
> The position deletes are created during the write phase, but conflicts are
> only detected at the commit stage. I assume the batch job should fail in
> this case.
>
> On Fri, Nov 1, 2024 at 10:57 AM Steven Wu <stevenz...@gmail.com> wrote:
>
>> Shani,
>>
>> That is a good point. It is certainly a limitation for the Flink job to
>> track the inverted index internally (which is what I had in mind). It can't
>> be shared/synchronized with other Flink jobs or other engines writing to
>> the same table.
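[Editor's note: the writer-local inverted index discussed above can be sketched abstractly. This is a toy model, not Iceberg or Flink API; all names are made up for illustration. It shows how a writer that remembers where each key was last written can turn an upsert into a precise position delete, and also why that state is writer-local, which is exactly the synchronization limitation Steven points out.]

```python
# Toy sketch (hypothetical names, not Iceberg/Flink API): a writer-local
# inverted index mapping key -> (data_file, row_position), used to emit
# position deletes instead of equality deletes on upsert.

class InvertedIndexWriter:
    def __init__(self):
        self.index = {}             # key -> (data_file, row_position)
        self.position_deletes = []  # (data_file, row_position) to commit
        self.appends = []           # (data_file, row_position, row)

    def upsert(self, key, row, data_file, position):
        previous = self.index.get(key)
        if previous is not None:
            # The key was written before: delete the old copy by exact
            # position. Without this index, the writer would have to emit
            # an equality delete and defer the lookup to readers.
            self.position_deletes.append(previous)
        self.index[key] = (data_file, position)
        self.appends.append((data_file, position, row))

writer = InvertedIndexWriter()
writer.upsert("user-1", {"name": "a"}, "file-0.parquet", 0)
writer.upsert("user-2", {"name": "b"}, "file-0.parquet", 1)
writer.upsert("user-1", {"name": "c"}, "file-1.parquet", 0)  # update

# The second write of user-1 produced a position delete for the first copy.
print(writer.position_deletes)  # [('file-0.parquet', 0)]
```

Because `index` lives only in this writer's state, a second writer (or a manual DML statement from another engine) would not see it, which is the duplicate-key hazard raised later in the thread.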
>>
>> Thanks,
>> Steven
>>
>> On Fri, Nov 1, 2024 at 10:50 AM Shani Elharrar <sh...@upsolver.com.invalid>
>> wrote:
>>
>>> Even if Flink can create this state, it would have to be maintained
>>> against the Iceberg table; we wouldn't want duplicates (keys) if other
>>> systems / users update the table (e.g. manual inserts / updates using DML).
>>>
>>> Shani.
>>>
>>> On 1 Nov 2024, at 18:32, Steven Wu <stevenz...@gmail.com> wrote:
>>>
>>> > Add support for inverted indexes to reduce the cost of position
>>> lookup. This is fairly tricky to implement for streaming use cases without
>>> an external system.
>>>
>>> Anton, that is also what I was saying earlier. In Flink, the inverted
>>> index of (key, committed data files) can be tracked in Flink state.
>>>
>>> On Fri, Nov 1, 2024 at 2:16 AM Anton Okolnychyi <aokolnyc...@gmail.com>
>>> wrote:
>>>
>>>> I was a bit skeptical when we were adding equality deletes, but nothing
>>>> beats their performance during writes. We have to find an alternative
>>>> before deprecating.
>>>>
>>>> We are doing a lot of work to improve streaming, like reducing the cost
>>>> of commits, enabling a large (potentially infinite) number of snapshots,
>>>> changelog reads, and so on. It is a project goal to excel in streaming.
>>>>
>>>> I was going to focus on equality deletes after completing the DV work.
>>>> I believe we have these options:
>>>>
>>>> - Revisit the existing design of equality deletes (e.g. add more
>>>> restrictions, improve compaction, offer new writers).
>>>> - Standardize on the view-based approach [1] to handle streaming
>>>> upserts and CDC use cases, potentially making this part of the spec.
>>>> - Add support for inverted indexes to reduce the cost of position
>>>> lookup. This is fairly tricky to implement for streaming use cases without
>>>> an external system. Our runtime filtering in Spark today is equivalent to
>>>> looking up positions in an inverted index represented by another Iceberg
>>>> table.
>>>> That may still not be enough for some streaming use cases.
>>>>
>>>> [1] - https://www.tabular.io/blog/hello-world-of-cdc/
>>>>
>>>> - Anton
>>>>
>>>> On Thu, Oct 31, 2024 at 21:31 Micah Kornfield <emkornfi...@gmail.com>
>>>> wrote:
>>>>
>>>>> I agree that equality deletes have their place in streaming. I think
>>>>> the ultimate decision here is how opinionated Iceberg wants to be on its
>>>>> use cases. If it really wants to stick to its origins of "slow moving
>>>>> data", then removing equality deletes would be in line with this. I think
>>>>> the other high-level question is how much we allow for partially compatible
>>>>> features (the row lineage feature was explicitly approved excluding
>>>>> equality deletes, and people seemed OK with it at the time; if all
>>>>> features need to work together, then maybe we need to rethink the design
>>>>> here so it can be forward compatible with equality deletes).
>>>>>
>>>>> I think one issue with equality deletes as stated in the spec is that
>>>>> they are overly broad. I'd be interested if people have any use cases that
>>>>> differ, but I think one way of narrowing the specification's scope on
>>>>> equality deletes (and probably a necessary building block for something
>>>>> better) is to focus on upsert/streaming deletes. Two proposals in this
>>>>> regard are:
>>>>>
>>>>> 1. Require that equality deletes can only correspond to unique
>>>>> identifiers for the table.
>>>>> 2. Consider requiring that, for equality deletes on partitioned
>>>>> tables, the primary key must contain a partition column (I believe
>>>>> Flink at least already does this). It is less clear to me that this would
>>>>> meet all existing use cases, but having this would allow for better
>>>>> incremental data structures, which could then be partition-based.
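[Editor's note: the two proposed restrictions above can be expressed as a small validation a writer might apply before emitting an equality delete. This is a sketch under stated assumptions, not Iceberg API; the function and field names are illustrative.]

```python
# Sketch of the two proposed restrictions (hypothetical helper, not
# Iceberg API): accept an equality delete only if it targets the table's
# unique identifier, and, for partitioned tables, only if the identifier
# includes a partition column so the delete can be scoped to one partition.

def validate_equality_delete(delete_columns, identifier_columns, partition_columns):
    if set(delete_columns) != set(identifier_columns):
        return False  # proposal 1: deletes only on the unique identifier
    if partition_columns and not set(partition_columns) & set(identifier_columns):
        return False  # proposal 2: identifier must contain a partition column
    return True

# Unpartitioned table keyed by user_id: allowed.
assert validate_equality_delete(["user_id"], ["user_id"], [])
# Delete on an arbitrary non-key column: rejected under proposal 1.
assert not validate_equality_delete(["email"], ["user_id"], [])
# Partitioned by region, key is (region, user_id): allowed.
assert validate_equality_delete(["region", "user_id"], ["region", "user_id"], ["region"])
# Partitioned by day, but the key has no partition column: rejected under proposal 2.
assert not validate_equality_delete(["user_id"], ["user_id"], ["day"])
```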
>>>>>
>>>>> Narrowing the scope to unique identifiers would allow for the further
>>>>> building blocks already mentioned, like a secondary index (possibly via an
>>>>> LSM tree), which would allow for better performance overall.
>>>>>
>>>>> I generally agree with the sentiment that we shouldn't deprecate them
>>>>> until there is a viable replacement. With all due respect to my employer,
>>>>> let's not fall into the Google trap [1] :)
>>>>>
>>>>> Cheers,
>>>>> Micah
>>>>>
>>>>> [1] https://goomics.net/50/
>>>>>
>>>>> On Thu, Oct 31, 2024 at 12:35 PM Alexander Jo <
>>>>> alex...@starburstdata.com> wrote:
>>>>>
>>>>>> Hey all,
>>>>>>
>>>>>> Just to throw my 2 cents in, I agree with Steven and others that we
>>>>>> do need some kind of replacement before deprecating equality deletes.
>>>>>> They certainly have their problems, and do significantly increase
>>>>>> complexity as they are now, but the writing of position deletes is too
>>>>>> expensive for certain pipelines.
>>>>>>
>>>>>> We've been investigating using equality deletes for some of our
>>>>>> workloads at Starburst; the key advantage we were hoping to leverage is
>>>>>> cheap, effectively random-access lookup deletes.
>>>>>> Say you have a UUID column that's unique in a table and want to
>>>>>> delete a row by UUID. With position deletes each delete is expensive
>>>>>> without an index on that UUID.
>>>>>> With equality deletes each delete is cheap; reads/compaction are
>>>>>> expensive, but when updates are frequent and reads are sporadic that's a
>>>>>> reasonable tradeoff.
>>>>>>
>>>>>> Pretty much what Jason and Steven have already said.
>>>>>>
>>>>>> Maybe there are some incremental improvements on equality deletes or
>>>>>> tips from similar systems that might alleviate some of their problems?
>>>>>>
>>>>>> - Alex Jo
>>>>>>
>>>>>> On Thu, Oct 31, 2024 at 10:58 AM Steven Wu <stevenz...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> We probably all agree on the downside of equality deletes: they
>>>>>>> postpone all the work to the read path.
>>>>>>>
>>>>>>> In theory, we could implement position deletes only in the Flink
>>>>>>> streaming writer. It would require tracking the last committed data
>>>>>>> files per key, which can be stored in Flink state (checkpointed). This is
>>>>>>> obviously quite expensive/challenging, but possible.
>>>>>>>
>>>>>>> I'd like to echo one benefit of equality deletes that Russell called
>>>>>>> out in the original email: equality deletes would never have conflicts.
>>>>>>> That is important for streaming writers (Flink, Kafka Connect, ...) that
>>>>>>> commit frequently (minutes or less). Assume Flink can write position
>>>>>>> deletes only and commits every 2 minutes. The long-running nature of
>>>>>>> streaming jobs can cause frequent commit conflicts with background delete
>>>>>>> compaction jobs.
>>>>>>>
>>>>>>> Overall, streaming upsert writes are not a well-solved problem in
>>>>>>> Iceberg. This probably affects all streaming engines (Flink, Kafka Connect,
>>>>>>> Spark streaming, ...). We need to come up with some better alternatives
>>>>>>> before we can deprecate equality deletes.
>>>>>>>
>>>>>>> On Thu, Oct 31, 2024 at 8:38 AM Russell Spitzer <
>>>>>>> russell.spit...@gmail.com> wrote:
>>>>>>>
>>>>>>>> For users of Equality Deletes, what are the key benefits to
>>>>>>>> Equality Deletes that you would like to preserve? Could you please share
>>>>>>>> some concrete examples of the queries you want to run (and the schemas and
>>>>>>>> data sizes you would like to run them against) and the latencies that would
>>>>>>>> be acceptable?
>>>>>>>>
>>>>>>>> On Thu, Oct 31, 2024 at 10:05 AM Jason Fine
>>>>>>>> <ja...@upsolver.com.invalid> wrote:
>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> Representing Upsolver here: we also make use of Equality Deletes
>>>>>>>>> to deliver high-frequency, low-latency updates to our clients. We have
>>>>>>>>> customers using them at scale, demonstrating the need and viability. We
>>>>>>>>> automate the process of converting them into positional deletes (or fully
>>>>>>>>> applying them) in the background for more efficient engine queries, giving
>>>>>>>>> our users both low latency and good query performance.
>>>>>>>>>
>>>>>>>>> Equality Deletes were added because there isn't a good way to solve
>>>>>>>>> frequent updates otherwise. It would require some sort of index keeping
>>>>>>>>> track of every record in the table (by a predetermined PK), and maintaining
>>>>>>>>> such an index is a huge task that every tool interested in this would need
>>>>>>>>> to re-implement. It also becomes a bottleneck limiting table sizes.
>>>>>>>>>
>>>>>>>>> I don't think they should be removed without providing an
>>>>>>>>> alternative. Positional Deletes inherently have a different performance
>>>>>>>>> profile, requiring more upfront work proportional to the table size.
>>>>>>>>>
>>>>>>>>> On Thu, Oct 31, 2024 at 2:45 PM Jean-Baptiste Onofré <
>>>>>>>>> j...@nanthrax.net> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Russell,
>>>>>>>>>>
>>>>>>>>>> Thanks for the nice writeup and the proposal.
>>>>>>>>>>
>>>>>>>>>> I agree with your analysis, and I have the same feeling. However, I
>>>>>>>>>> think there are more writers than Flink that produce equality delete files.
>>>>>>>>>> So, I agree to deprecate in V3, but maybe be more "flexible" about removal
>>>>>>>>>> in V4 in order to give engines time to update.
>>>>>>>>>> I think that by deprecating equality deletes, we are clearly focusing
>>>>>>>>>> on read performance and "consistency" (more than write). It's not
>>>>>>>>>> necessarily a bad thing, but streaming and data ingestion platforms will
>>>>>>>>>> probably be concerned about that (by using positional deletes, they will
>>>>>>>>>> have to scan/read all data files to find the position, which is painful).
>>>>>>>>>>
>>>>>>>>>> So, to summarize:
>>>>>>>>>> 1. Agree to deprecate equality deletes, but -1 to commit to any target
>>>>>>>>>> for deletion before having a clear path for streaming platforms
>>>>>>>>>> (Flink, Beam, ...)
>>>>>>>>>> 2. In the meantime (during the deprecation period), I propose to
>>>>>>>>>> explore possible improvements for streaming platforms (maybe finding a
>>>>>>>>>> way to avoid full data file scans, ...)
>>>>>>>>>>
>>>>>>>>>> Thanks!
>>>>>>>>>> Regards
>>>>>>>>>> JB
>>>>>>>>>>
>>>>>>>>>> On Wed, Oct 30, 2024 at 10:06 PM Russell Spitzer
>>>>>>>>>> <russell.spit...@gmail.com> wrote:
>>>>>>>>>> >
>>>>>>>>>> > Background:
>>>>>>>>>> >
>>>>>>>>>> > 1) Position Deletes
>>>>>>>>>> >
>>>>>>>>>> > Writers determine which rows are deleted and mark them in a 1-for-1
>>>>>>>>>> representation. With delete vectors this means every data file has at
>>>>>>>>>> most 1 delete vector that is read in conjunction with it to excise deleted
>>>>>>>>>> rows. Reader overhead is more or less constant and is very predictable.
>>>>>>>>>> >
>>>>>>>>>> > The main cost of this mode is that deletes must be determined
>>>>>>>>>> at write time, which is expensive and can be more difficult for conflict
>>>>>>>>>> resolution.
>>>>>>>>>> >
>>>>>>>>>> > 2) Equality Deletes
>>>>>>>>>> >
>>>>>>>>>> > Writers write out a reference to which values are deleted (in a
>>>>>>>>>> partition or globally).
>>>>>>>>>> There can be an unlimited number of equality
>>>>>>>>>> deletes, and they all must be checked for every data file that is read. The
>>>>>>>>>> cost of determining deleted rows is essentially given to the reader.
>>>>>>>>>> >
>>>>>>>>>> > Conflicts almost never happen since data files are not actually
>>>>>>>>>> changed, and there is almost no cost to the writer to generate these. Almost
>>>>>>>>>> all costs related to equality deletes are passed on to the reader.
>>>>>>>>>> >
>>>>>>>>>> > Proposal:
>>>>>>>>>> >
>>>>>>>>>> > Equality deletes are, in my opinion, unsustainable, and we
>>>>>>>>>> should work on deprecating and removing them from the specification. At
>>>>>>>>>> this time, I know of only one engine (Apache Flink) which produces these
>>>>>>>>>> deletes, but almost all engines have implementations to read them. The cost
>>>>>>>>>> of implementing equality deletes on the read path is difficult and
>>>>>>>>>> unpredictable in terms of memory usage and compute complexity. We've had
>>>>>>>>>> suggestions of implementing RocksDB in order to handle ever-growing sets of
>>>>>>>>>> equality deletes, which in my opinion shows that we are going down the wrong
>>>>>>>>>> path.
>>>>>>>>>> >
>>>>>>>>>> > Outside of performance, Equality deletes are also difficult to
>>>>>>>>>> use in conjunction with many other features. For example, any features
>>>>>>>>>> requiring CDC or row lineage are basically impossible when equality deletes
>>>>>>>>>> are in use. When Equality deletes are present, the state of the table can
>>>>>>>>>> only be determined with a full scan, making it difficult to update
>>>>>>>>>> differential structures. This means materialized views or indexes need to
>>>>>>>>>> essentially be fully rebuilt whenever an equality delete is added to the
>>>>>>>>>> table.
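[Editor's note: the read-path asymmetry described above can be illustrated with a toy reader. This is plain Python, not Iceberg code; it only models the shapes of the two delete kinds. A position delete is applied by row index against one file's delete vector, while every accumulated equality delete predicate must be evaluated against every row read.]

```python
# Toy illustration (not Iceberg code) of reader cost for the two delete
# kinds. Rows are plain dicts; a "data file" is a list of rows.

def read_with_position_deletes(rows, deleted_positions):
    # One set-membership test per row against this file's delete vector:
    # roughly constant overhead per row, regardless of table history.
    return [row for i, row in enumerate(rows) if i not in deleted_positions]

def read_with_equality_deletes(rows, equality_deletes):
    # Every row is compared against every applicable equality delete
    # predicate: cost grows with the number of accumulated deletes.
    return [
        row for row in rows
        if not any(all(row.get(col) == val for col, val in delete.items())
                   for delete in equality_deletes)
    ]

rows = [{"id": 1, "v": "a"}, {"id": 2, "v": "b"}, {"id": 3, "v": "c"}]

# Both approaches can express "delete the row with id 2", but the work
# lands in different places: the position was resolved at write time,
# while the equality predicate is resolved here, at read time.
assert read_with_position_deletes(rows, {1}) == \
       read_with_equality_deletes(rows, [{"id": 2}])
```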
>>>>>>>>>> >
>>>>>>>>>> > Equality deletes essentially remove complexity from the write
>>>>>>>>>> side but then add what I believe is an unacceptable level of complexity to
>>>>>>>>>> the read side.
>>>>>>>>>> >
>>>>>>>>>> > Because of this, I suggest we deprecate Equality Deletes in V3
>>>>>>>>>> and slate them for full removal from the Iceberg Spec in V4.
>>>>>>>>>> >
>>>>>>>>>> > I know this is a big change and a compatibility breakage, so I
>>>>>>>>>> would like to introduce this idea to the community and solicit feedback
>>>>>>>>>> from all stakeholders. I am very flexible on this issue and would like to
>>>>>>>>>> hear the best arguments both for and against removal of Equality Deletes.
>>>>>>>>>> >
>>>>>>>>>> > Thanks everyone for your time,
>>>>>>>>>> >
>>>>>>>>>> > Russ Spitzer
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> *Jason Fine*
>>>>>>>>> Chief Software Architect
>>>>>>>>> ja...@upsolver.com | www.upsolver.com