I also feel we should keep equality deletes until we have an alternative solution for streaming updates/deletes.
-Bryan

On Nov 4, 2024, at 8:33 AM, Péter Váry <peter.vary.apa...@gmail.com> wrote:

Well, it seems like I'm a little late, so most of the arguments have already been voiced.

I agree that we should not deprecate equality deletes until we have a replacement feature. I think one of the big advantages of Iceberg is that it supports both batch processing and streaming ingestion. For streaming ingestion we need a way to update existing data in a performant way, but restricting deletes to the primary keys seems like enough from the streaming perspective.

Equality deletes allow a very wide range of applications, which we might be able to narrow down a bit but still keep useful. So if we want to go down this road, we need to start collecting the requirements.

Thanks,
Peter

On Nov 1, 2024, at 19:22, Shani Elharrar <sh...@upsolver.com.invalid> wrote:

I understand how it makes sense for batch jobs, but it damages streaming jobs. Using equality deletes works much better for streaming (which has strict SLAs for delays), and to decrease the performance penalty, systems can rewrite the equality deletes to positional deletes.

Shani.

On 1 Nov 2024, at 20:06, Steven Wu <stevenz...@gmail.com> wrote:

Fundamentally, it is very difficult to write position deletes with concurrent writers and conflicts for batch jobs too, as the inverted index may become invalid/stale.

The position deletes are created during the write phase, but conflicts are only detected at the commit stage. I assume the batch job should fail in this case.

On Fri, Nov 1, 2024 at 10:57 AM Steven Wu <stevenz...@gmail.com> wrote:

Shani,

That is a good point. It is certainly a limitation for the Flink job to track the inverted index internally (which is what I had in mind). It can't be shared/synchronized with other Flink jobs or other engines writing to the same table.

Thanks,
Steven

On Fri, Nov 1, 2024 at 10:50 AM Shani Elharrar <sh...@upsolver.com.invalid> wrote:

Even if Flink can create this state, it would have to be maintained against the Iceberg table; we wouldn't want duplicate keys if other systems or users update the table (e.g. manual inserts/updates using DML).

Shani.

On 1 Nov 2024, at 18:32, Steven Wu <stevenz...@gmail.com> wrote:

> Add support for inverted indexes to reduce the cost of position lookup. This is fairly tricky to implement for streaming use cases without an external system.

Anton, that is also what I was saying earlier. In Flink, the inverted index of (key, committed data files) can be tracked in Flink state.

On Fri, Nov 1, 2024 at 2:16 AM Anton Okolnychyi <aokolnyc...@gmail.com> wrote:

I was a bit skeptical when we were adding equality deletes, but nothing beats their performance during writes. We have to find an alternative before deprecating.

We are doing a lot of work to improve streaming, like reducing the cost of commits, enabling a large (potentially infinite) number of snapshots, changelog reads, and so on. It is a project goal to excel in streaming.

I was going to focus on equality deletes after completing the DV work. I believe we have these options:

- Revisit the existing design of equality deletes (e.g. add more restrictions, improve compaction, offer new writers).
- Standardize on the view-based approach [1] to handle streaming upserts and CDC use cases, potentially making this part of the spec.
- Add support for inverted indexes to reduce the cost of position lookup. This is fairly tricky to implement for streaming use cases without an external system. Our runtime filtering in Spark today is equivalent to looking up positions in an inverted index represented by another Iceberg table. That may still not be enough for some streaming use cases.

[1] - https://www.tabular.io/blog/hello-world-of-cdc/

- Anton

On Thu, Oct 31, 2024 at 21:31 Micah Kornfield <emkornfi...@gmail.com> wrote:

I agree that equality deletes have their place in streaming. I think the ultimate decision here is how opinionated Iceberg wants to be about its use cases. If it really wants to stick to its origins of "slow-moving data", then removing equality deletes would be in line with this. I think the other high-level question is how much we allow for partially compatible features (the row lineage feature was explicitly approved excluding equality deletes, and people seemed OK with it at the time; if all features need to work together, then maybe we need to rethink the design here so it can be forward compatible with equality deletes).

I think one issue with equality deletes as stated in the spec is that they are overly broad. I'd be interested if people have use cases that differ, but I think one way of narrowing the specification scope on equality deletes (and probably a necessary building block for building something better) is to focus on upsert/streaming deletes. Two proposals in this regard are:

1. Require that equality deletes can only correspond to unique identifiers for the table.
2. Consider requiring that, for equality deletes on partitioned tables, the primary key must contain a partition column (I believe Flink at least already does this). It is less clear to me that this would meet all existing use cases, but having it would allow for better incremental data structures, which could then be partition based.

Narrowing the scope to unique identifiers would allow for the further building blocks already mentioned, like a secondary index (possibly via an LSM tree), that would enable better performance overall.

I generally agree with the sentiment that we shouldn't deprecate them until there is a viable replacement. With all due respect to my employer, let's not fall into the Google trap [1] :)

Cheers,
Micah

[1] https://goomics.net/50/

On Thu, Oct 31, 2024 at 12:35 PM Alexander Jo <alex...@starburstdata.com> wrote:

Hey all,

Just to throw my 2 cents in, I agree with Steven and others that we do need some kind of replacement before deprecating equality deletes. They certainly have their problems, and do significantly increase complexity as they are now, but writing position deletes is too expensive for certain pipelines.

We've been investigating using equality deletes for some of our workloads at Starburst; the key advantage we were hoping to leverage is cheap, effectively random-access lookup deletes. Say you have a UUID column that's unique in a table and want to delete a row by UUID. With position deletes each delete is expensive without an index on that UUID.
With equality deletes each delete is cheap while reads and compaction are expensive, but when updates are frequent and reads are sporadic, that's a reasonable tradeoff.

Pretty much what Jason and Steven have already said.

Maybe there are some incremental improvements on equality deletes, or tips from similar systems, that might alleviate some of their problems?

- Alex Jo

On Thu, Oct 31, 2024 at 10:58 AM Steven Wu <stevenz...@gmail.com> wrote:

We probably all agree on the downside of equality deletes: it postpones all the work to the read path.

In theory, we could implement position deletes only in the Flink streaming writer. It would require tracking the last committed data files per key, which can be stored in Flink state (checkpointed). This is obviously quite expensive/challenging, but possible.

I'd like to echo one benefit of equality deletes that Russell called out in the original email: equality deletes never have conflicts. That is important for streaming writers (Flink, Kafka Connect, ...) that commit frequently (minutes or less). Assume Flink could write only position deletes and commit every 2 minutes. The long-running nature of streaming jobs could cause frequent commit conflicts with background delete compaction jobs.

Overall, streaming upsert writes are not a well-solved problem in Iceberg. This probably affects all streaming engines (Flink, Kafka Connect, Spark streaming, ...). We need to come up with better alternatives before we can deprecate equality deletes.
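Steven's "in theory" approach can be sketched roughly as follows. This is an illustrative Python sketch, not Flink or Iceberg API code; the class and field names are hypothetical, and in a real Flink job the key-to-location map would live in checkpointed operator state:

```python
# Sketch: a writer-side inverted index mapping each key to the
# (data_file, row_position) where its current row lives. With this
# state, an upsert can emit a precise position delete for the stale
# row instead of an equality delete on the key.

class PositionDeleteUpsertWriter:
    def __init__(self):
        self.index = {}             # key -> (data_file, position)
        self.position_deletes = []  # (data_file, position) pairs to commit
        self.inserts = []           # new rows to append

    def upsert(self, key, row, data_file, position):
        # If the key was already written, mark the old copy deleted by position.
        if key in self.index:
            self.position_deletes.append(self.index[key])
        self.inserts.append(row)
        self.index[key] = (data_file, position)

writer = PositionDeleteUpsertWriter()
writer.upsert("k1", {"k": "k1", "v": 1}, "file-a.parquet", 0)
writer.upsert("k2", {"k": "k2", "v": 2}, "file-a.parquet", 1)
writer.upsert("k1", {"k": "k1", "v": 3}, "file-b.parquet", 0)  # update of k1

print(writer.position_deletes)  # [('file-a.parquet', 0)] - the stale copy of k1
```

This also makes the limitations discussed above concrete: the map grows with the key space, and it is invisible to other writers of the same table.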
On Thu, Oct 31, 2024 at 8:38 AM Russell Spitzer <russell.spit...@gmail.com> wrote:

For users of equality deletes: what are the key benefits of equality deletes that you would like to preserve? Could you please share some concrete examples of the queries you want to run (and the schemas and data sizes you would like to run them against) and the latencies that would be acceptable?

On Thu, Oct 31, 2024 at 10:05 AM Jason Fine <ja...@upsolver.com.invalid> wrote:

Hi,

Representing Upsolver here: we also make use of equality deletes to deliver high-frequency, low-latency updates to our clients at scale. We have customers using them at scale, demonstrating the need and viability. We automate the process of converting them into positional deletes (or fully applying them) in the background for more efficient engine queries, giving our users both low latency and good query performance.

Equality deletes were added because there isn't a good way to solve frequent updates otherwise. It would require some sort of index keeping track of every record in the table (by a predetermined PK), and maintaining such an index is a huge task that every tool interested in this would need to re-implement. It also becomes a bottleneck limiting table sizes.

I don't think they should be removed without providing an alternative. Positional deletes inherently have a different performance profile, requiring more upfront work proportional to the table size.
On Thu, Oct 31, 2024 at 2:45 PM Jean-Baptiste Onofré <j...@nanthrax.net> wrote:

Hi Russell,

Thanks for the nice writeup and the proposal.

I agree with your analysis, and I have the same feeling. However, I think there are more engines than Flink that write equality delete files. So I agree with deprecating in V3, but maybe we should be more "flexible" about removal in V4, in order to give engines time to update. I think that by deprecating equality deletes, we are clearly focusing on read performance and "consistency" (more than write). That's not necessarily a bad thing, but streaming and data ingestion platforms will probably be concerned about it (with positional deletes, they would have to scan/read all data files to find the positions, which is painful).

So, to summarize:
1. I agree to deprecate equality deletes, but -1 on committing to any removal target before having a clear path for streaming platforms (Flink, Beam, ...).
2. In the meantime (during the deprecation period), I propose exploring possible improvements for streaming platforms (maybe finding a way to avoid full data file scans, ...).

Thanks!
Regards,
JB

On Wed, Oct 30, 2024 at 10:06 PM Russell Spitzer <russell.spit...@gmail.com> wrote:

Background:

1) Position Deletes

Writers determine which rows are deleted and mark them in a one-for-one representation. With delete vectors, this means every data file has at most one delete vector that is read in conjunction with it to excise deleted rows. Reader overhead is more or less constant and very predictable.

The main cost of this mode is that deletes must be determined at write time, which is expensive and can make conflict resolution more difficult.

2) Equality Deletes

Writers write out references to which values are deleted (in a partition or globally). There can be an unlimited number of equality deletes, and all of them must be checked for every data file that is read. The cost of determining deleted rows is essentially handed to the reader.

Conflicts almost never happen, since data files are not actually changed, and there is almost no cost to the writer to generate these deletes. Almost all costs related to equality deletes are passed on to the reader.

Proposal:

Equality deletes are, in my opinion, unsustainable, and we should work on deprecating and removing them from the specification. At this time, I know of only one engine (Apache Flink) that produces these deletes, but almost all engines have implementations to read them. The cost of implementing equality deletes on the read path is difficult and unpredictable in terms of memory usage and compute complexity. We've had suggestions of implementing RocksDB in order to handle ever-growing sets of equality deletes, which in my opinion shows that we are going down the wrong path.

Outside of performance, equality deletes are also difficult to use in conjunction with many other features. For example, any feature requiring CDC or row lineage is basically impossible when equality deletes are in use. When equality deletes are present, the state of the table can only be determined with a full scan, making it difficult to update differential structures. This means materialized views or indexes essentially need to be fully rebuilt whenever an equality delete is added to the table.

Equality deletes essentially remove complexity from the write side, but then add what I believe is an unacceptable level of complexity to the read side.

Because of this, I suggest we deprecate equality deletes in V3 and slate them for full removal from the Iceberg spec in V4.

I know this is a big change and a compatibility break, so I would like to introduce this idea to the community and solicit feedback from all stakeholders. I am very flexible on this issue and would like to hear the best arguments both for and against removal of equality deletes.

Thanks everyone for your time,

Russ Spitzer

--
Jason Fine
Chief Software Architect
ja...@upsolver.com | www.upsolver.com
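The read-path asymmetry Russell describes in his background section can be made concrete with a small sketch. This is illustrative Python with hypothetical structures, not Iceberg reader code: applying a delete vector is a per-file, constant-cost-per-row filter, while equality deletes require checking every accumulated predicate against every row of every data file:

```python
# Sketch of the two read paths described in the proposal.

def read_with_position_deletes(rows, delete_vector):
    # delete_vector: set of row positions excised from this one data file.
    # Per-row cost is a constant-time set lookup, regardless of how many
    # deletes have been committed.
    return [row for pos, row in enumerate(rows) if pos not in delete_vector]

def read_with_equality_deletes(rows, equality_deletes):
    # equality_deletes: list of {column: value} predicates. Every predicate
    # applies to every data file read, so per-row cost grows with the
    # number of accumulated equality deletes.
    def is_deleted(row):
        return any(all(row.get(col) == val for col, val in pred.items())
                   for pred in equality_deletes)
    return [row for row in rows if not is_deleted(row)]

rows = [{"id": 1, "v": "a"}, {"id": 2, "v": "b"}, {"id": 3, "v": "c"}]

print(read_with_position_deletes(rows, {1}))        # rows at positions 0 and 2
print(read_with_equality_deletes(rows, [{"id": 3}]))  # rows with id 1 and 2
```

The contrast shows why reader overhead is "constant and very predictable" in the first case and unbounded in the second: the position-delete path never looks at row contents, while the equality-delete path re-evaluates predicates whose count can grow without limit until compaction.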