I also feel we should keep equality deletes until we have an alternative solution for streaming updates/deletes.
-Bryan

On Nov 4, 2024, at 8:33 AM, Péter Váry <peter.vary.apa...@gmail.com> wrote:

Well, it seems like I'm a little late, so most of the arguments have already been voiced.

I agree that we should not deprecate equality deletes until we have a replacement feature. I think one of the big advantages of Iceberg is that it supports both batch processing and streaming ingestion. For streaming ingestion we need a way to update existing data in a performant way, but restricting deletes to the primary keys seems like enough from the streaming perspective.

Equality deletes allow a very wide range of applications, which we might be able to narrow down a bit but still keep useful. So if we want to go down this road, we need to start collecting the requirements.

Thanks,
Peter

On Nov 1, 2024, at 19:22, Shani Elharrar <sh...@upsolver.com.invalid> wrote:

I understand how it makes sense for batch jobs, but it damages streaming jobs. Using equality deletes works much better for streaming (which has strict SLAs for delays), and to decrease the performance penalty, systems can rewrite the equality deletes to positional deletes.

Shani.

On 1 Nov 2024, at 20:06, Steven Wu <stevenz...@gmail.com> wrote:

Fundamentally, it is very difficult to write position deletes with concurrent writers and conflicts for batch jobs too, as the inverted index may become invalid/stale.

The position deletes are created during the write phase, but conflicts are only detected at the commit stage. I assume the batch job should fail in this case.

On Fri, Nov 1, 2024 at 10:57 AM Steven Wu <stevenz...@gmail.com> wrote:

Shani,

That is a good point. It is certainly a limitation for the Flink job to track the inverted index internally (which is what I had in mind). It can't be shared/synchronized with other Flink jobs or other engines writing to the same table.

Thanks,
Steven

On Fri, Nov 1, 2024 at 10:50 AM Shani Elharrar <sh...@upsolver.com.invalid> wrote:

Even if Flink can create this state, it would have to be maintained against the Iceberg table; we wouldn't want duplicate keys if other systems or users update the table (e.g. manual inserts/updates using DML).

Shani.

On 1 Nov 2024, at 18:32, Steven Wu <stevenz...@gmail.com> wrote:

> Add support for inverted indexes to reduce the cost of position lookup. This is fairly tricky to implement for streaming use cases without an external system.

Anton, that is also what I was saying earlier. In Flink, the inverted index of (key, committed data files) can be tracked in Flink state.

On Fri, Nov 1, 2024 at 2:16 AM Anton Okolnychyi <aokolnyc...@gmail.com> wrote:

I was a bit skeptical when we were adding equality deletes, but nothing beats their performance during writes. We have to find an alternative before deprecating.

We are doing a lot of work to improve streaming, like reducing the cost of commits, enabling a large (potentially infinite) number of snapshots, changelog reads, and so on. It is a project goal to excel in streaming.

I was going to focus on equality deletes after completing the DV work. I believe we have these options:

- Revisit the existing design of equality deletes (e.g. add more restrictions, improve compaction, offer new writers).
- Standardize on the view-based approach [1] to handle streaming upserts and CDC use cases, potentially making this part of the spec.
- Add support for inverted indexes to reduce the cost of position lookup. This is fairly tricky to implement for streaming use cases without an external system. Our runtime filtering in Spark today is equivalent to looking up positions in an inverted index represented by another Iceberg table. That may still not be enough for some streaming use cases.

[1] - https://www.tabular.io/blog/hello-world-of-cdc/

- Anton

On Thu, Oct 31, 2024 at 21:31 Micah Kornfield <emkornfi...@gmail.com> wrote:

I agree that equality deletes have their place in streaming. I think the ultimate decision here is how opinionated Iceberg wants to be about its use cases. If it really wants to stick to its origins of "slow-moving data", then removing equality deletes would be in line with this. I think the other high-level question is how much we allow for partially compatible features (the row lineage feature was explicitly approved excluding equality deletes, and people seemed OK with it at the time; if all features need to work together, then maybe we need to rethink the design here so it can be forward compatible with equality deletes).

I think one issue with equality deletes as stated in the spec is that they are overly broad. I'd be interested if people have use cases that differ, but I think one way of narrowing the specification scope on equality deletes (and probably a necessary building block for building something better) is to focus on upsert/streaming deletes. Two proposals in this regard are:

1. Require that equality deletes can only correspond to unique identifiers for the table.
2. Consider requiring that, for equality deletes on partitioned tables, the primary key must contain a partition column (I believe Flink at least already does this). It is less clear to me that this would meet all existing use cases, but having it would allow for better incremental data structures, which could then be partition based.

Narrowing the scope to unique identifiers would allow for the further building blocks already mentioned, like a secondary index (possibly via an LSM tree), that would enable better performance overall.

I generally agree with the sentiment that we shouldn't deprecate them until there is a viable replacement. With all due respect to my employer, let's not fall into the Google trap [1] :)

Cheers,
Micah

[1] https://goomics.net/50/

On Thu, Oct 31, 2024 at 12:35 PM Alexander Jo <alex...@starburstdata.com> wrote:

Hey all,

Just to throw my 2 cents in, I agree with Steven and others that we do need some kind of replacement before deprecating equality deletes. They certainly have their problems, and do significantly increase complexity as they are now, but writing position deletes is too expensive for certain pipelines.

We've been investigating using equality deletes for some of our workloads at Starburst; the key advantage we were hoping to leverage is cheap, effectively random-access lookup deletes. Say you have a UUID column that's unique in a table and want to delete a row by UUID. With position deletes each delete is expensive without an index on that UUID.
With equality deletes each delete is cheap while reads and compaction are expensive, but when updates are frequent and reads are sporadic, that's a reasonable tradeoff.

Pretty much what Jason and Steven have already said.

Maybe there are some incremental improvements on equality deletes, or tips from similar systems, that might alleviate some of their problems?

- Alex Jo

On Thu, Oct 31, 2024 at 10:58 AM Steven Wu <stevenz...@gmail.com> wrote:

We probably all agree on the downside of equality deletes: it postpones all the work to the read path.

In theory, we could implement position deletes only in the Flink streaming writer. It would require tracking the last committed data files per key, which can be stored in Flink state (checkpointed). This is obviously quite expensive/challenging, but possible.

I'd like to echo one benefit of equality deletes that Russell called out in the original email: equality deletes never have conflicts. That is important for streaming writers (Flink, Kafka Connect, ...) that commit frequently (minutes or less). Assume Flink could write only position deletes and commit every 2 minutes. The long-running nature of streaming jobs could cause frequent commit conflicts with background delete compaction jobs.

Overall, streaming upsert writes are not a well-solved problem in Iceberg. This probably affects all streaming engines (Flink, Kafka Connect, Spark streaming, ...). We need to come up with better alternatives before we can deprecate equality deletes.
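Steven's "in theory" approach can be sketched roughly as follows. This is an illustrative Python sketch, not Flink or Iceberg API code; the class and field names are hypothetical, and in a real Flink job the key-to-location map would live in checkpointed operator state:

```python
# Sketch: a writer-side inverted index mapping each key to the
# (data_file, row_position) where its current row lives. With this
# state, an upsert can emit a precise position delete for the stale
# row instead of an equality delete on the key.

class PositionDeleteUpsertWriter:
    def __init__(self):
        self.index = {}             # key -> (data_file, position)
        self.position_deletes = []  # (data_file, position) pairs to commit
        self.inserts = []           # new rows to append

    def upsert(self, key, row, data_file, position):
        # If the key was already written, mark the old copy deleted by position.
        if key in self.index:
            self.position_deletes.append(self.index[key])
        self.inserts.append(row)
        self.index[key] = (data_file, position)

writer = PositionDeleteUpsertWriter()
writer.upsert("k1", {"k": "k1", "v": 1}, "file-a.parquet", 0)
writer.upsert("k2", {"k": "k2", "v": 2}, "file-a.parquet", 1)
writer.upsert("k1", {"k": "k1", "v": 3}, "file-b.parquet", 0)  # update of k1

print(writer.position_deletes)  # [('file-a.parquet', 0)] - the stale copy of k1
```

This also makes the limitations discussed above concrete: the map grows with the key space, and it is invisible to other writers of the same table.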
On Thu, Oct 31, 2024 at 8:38 AM Russell Spitzer <russell.spit...@gmail.com> wrote:

For users of equality deletes: what are the key benefits of equality deletes that you would like to preserve? Could you please share some concrete examples of the queries you want to run (and the schemas and data sizes you would like to run them against) and the latencies that would be acceptable?

On Thu, Oct 31, 2024 at 10:05 AM Jason Fine <ja...@upsolver.com.invalid> wrote:

Hi,

Representing Upsolver here: we also make use of equality deletes to deliver high-frequency, low-latency updates to our clients at scale. We have customers using them at scale, demonstrating the need and viability. We automate the process of converting them into positional deletes (or fully applying them) in the background for more efficient engine queries, giving our users both low latency and good query performance.

Equality deletes were added because there isn't a good way to solve frequent updates otherwise. It would require some sort of index keeping track of every record in the table (by a predetermined PK), and maintaining such an index is a huge task that every tool interested in this would need to re-implement. It also becomes a bottleneck limiting table sizes.

I don't think they should be removed without providing an alternative. Positional deletes inherently have a different performance profile, requiring more upfront work proportional to the table size.
On Thu, Oct 31, 2024 at 2:45 PM Jean-Baptiste Onofré <j...@nanthrax.net> wrote:

Hi Russell,

Thanks for the nice writeup and the proposal.

I agree with your analysis, and I have the same feeling. However, I think there are more engines than Flink that write equality delete files. So I agree with deprecating in V3, but maybe we should be more "flexible" about removal in V4, in order to give engines time to update. I think that by deprecating equality deletes, we are clearly focusing on read performance and "consistency" (more than write). That's not necessarily a bad thing, but streaming and data ingestion platforms will probably be concerned about it (with positional deletes, they would have to scan/read all data files to find the positions, which is painful).

So, to summarize:
1. I agree to deprecate equality deletes, but -1 on committing to any removal target before having a clear path for streaming platforms (Flink, Beam, ...).
2. In the meantime (during the deprecation period), I propose exploring possible improvements for streaming platforms (maybe finding a way to avoid full data file scans, ...).

Thanks!
Regards,
JB

On Wed, Oct 30, 2024 at 10:06 PM Russell Spitzer <russell.spit...@gmail.com> wrote:

Background:

1) Position Deletes

Writers determine which rows are deleted and mark them in a one-for-one representation. With delete vectors, this means every data file has at most one delete vector that is read in conjunction with it to excise deleted rows. Reader overhead is more or less constant and very predictable.

The main cost of this mode is that deletes must be determined at write time, which is expensive and can make conflict resolution more difficult.

2) Equality Deletes

Writers write out references to which values are deleted (in a partition or globally). There can be an unlimited number of equality deletes, and all of them must be checked for every data file that is read. The cost of determining deleted rows is essentially handed to the reader.

Conflicts almost never happen, since data files are not actually changed, and there is almost no cost to the writer to generate these deletes. Almost all costs related to equality deletes are passed on to the reader.

Proposal:

Equality deletes are, in my opinion, unsustainable, and we should work on deprecating and removing them from the specification. At this time, I know of only one engine (Apache Flink) that produces these deletes, but almost all engines have implementations to read them. The cost of implementing equality deletes on the read path is difficult and unpredictable in terms of memory usage and compute complexity. We've had suggestions of implementing RocksDB in order to handle ever-growing sets of equality deletes, which in my opinion shows that we are going down the wrong path.

Outside of performance, equality deletes are also difficult to use in conjunction with many other features. For example, any feature requiring CDC or row lineage is basically impossible when equality deletes are in use. When equality deletes are present, the state of the table can only be determined with a full scan, making it difficult to update differential structures. This means materialized views or indexes essentially need to be fully rebuilt whenever an equality delete is added to the table.

Equality deletes essentially remove complexity from the write side, but then add what I believe is an unacceptable level of complexity to the read side.

Because of this, I suggest we deprecate equality deletes in V3 and slate them for full removal from the Iceberg spec in V4.

I know this is a big change and a compatibility break, so I would like to introduce this idea to the community and solicit feedback from all stakeholders. I am very flexible on this issue and would like to hear the best arguments both for and against removal of equality deletes.

Thanks everyone for your time,

Russ Spitzer

--
Jason Fine
Chief Software Architect
ja...@upsolver.com | www.upsolver.com
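The read-path asymmetry Russell describes in his background section can be made concrete with a small sketch. This is illustrative Python with hypothetical structures, not Iceberg reader code: applying a delete vector is a per-file, constant-cost-per-row filter, while equality deletes require checking every accumulated predicate against every row of every data file:

```python
# Sketch of the two read paths described in the proposal.

def read_with_position_deletes(rows, delete_vector):
    # delete_vector: set of row positions excised from this one data file.
    # Per-row cost is a constant-time set lookup, regardless of how many
    # deletes have been committed.
    return [row for pos, row in enumerate(rows) if pos not in delete_vector]

def read_with_equality_deletes(rows, equality_deletes):
    # equality_deletes: list of {column: value} predicates. Every predicate
    # applies to every data file read, so per-row cost grows with the
    # number of accumulated equality deletes.
    def is_deleted(row):
        return any(all(row.get(col) == val for col, val in pred.items())
                   for pred in equality_deletes)
    return [row for row in rows if not is_deleted(row)]

rows = [{"id": 1, "v": "a"}, {"id": 2, "v": "b"}, {"id": 3, "v": "c"}]

print(read_with_position_deletes(rows, {1}))        # rows at positions 0 and 2
print(read_with_equality_deletes(rows, [{"id": 3}]))  # rows with id 1 and 2
```

The contrast shows why reader overhead is "constant and very predictable" in the first case and unbounded in the second: the position-delete path never looks at row contents, while the equality-delete path re-evaluates predicates whose count can grow without limit until compaction.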