Steven, do you have any pointers? In particular, I am curious to learn where the state will be stored, whether it will be distributed, the lookup cost, how to incrementally maintain that index, etc.
- Anton

On Fri, Nov 1, 2024 at 5:32 PM Steven Wu <stevenz...@gmail.com> wrote:

> Add support for inverted indexes to reduce the cost of position lookup. This is fairly tricky to implement for streaming use cases without an external system.
>
> Anton, that is also what I was saying earlier. In Flink, the inverted index of (key, committed data files) can be tracked in Flink state.
>
> On Fri, Nov 1, 2024 at 2:16 AM Anton Okolnychyi <aokolnyc...@gmail.com> wrote:
>
>> I was a bit skeptical when we were adding equality deletes, but nothing beats their performance during writes. We have to find an alternative before deprecating.
>>
>> We are doing a lot of work to improve streaming, like reducing the cost of commits, enabling a large (potentially infinite) number of snapshots, changelog reads, and so on. It is a project goal to excel in streaming.
>>
>> I was going to focus on equality deletes after completing the DV work. I believe we have these options:
>>
>> - Revisit the existing design of equality deletes (e.g. add more restrictions, improve compaction, offer new writers).
>> - Standardize on the view-based approach [1] to handle streaming upserts and CDC use cases, potentially making this part of the spec.
>> - Add support for inverted indexes to reduce the cost of position lookup. This is fairly tricky to implement for streaming use cases without an external system. Our runtime filtering in Spark today is equivalent to looking up positions in an inverted index represented by another Iceberg table. That may still not be enough for some streaming use cases.
>>
>> [1] - https://www.tabular.io/blog/hello-world-of-cdc/
>>
>> - Anton
>>
>> On Thu, Oct 31, 2024 at 9:31 PM Micah Kornfield <emkornfi...@gmail.com> wrote:
>>
>>> I agree that equality deletes have their place in streaming. I think the ultimate decision here is how opinionated Iceberg wants to be on its use cases.
>>> If it really wants to stick to its origins of "slow moving data", then removing equality deletes would be in line with this. I think the other high-level question is how much we allow for partially compatible features (the row lineage feature was explicitly approved excluding equality deletes, and people seemed OK with it at the time; if all features need to work together, then maybe we need to rethink the design here so it can be forward compatible with equality deletes).
>>>
>>> I think one issue with equality deletes as stated in the spec is that they are overly broad. I'd be interested if people have use cases that differ, but I think one way of narrowing the specification scope on equality deletes (and probably a necessary building block for building something better) is to focus on upsert/streaming deletes. Two proposals in this regard are:
>>>
>>> 1. Require that equality deletes can only correspond to unique identifiers for the table.
>>> 2. Consider requiring that, for equality deletes on partitioned tables, the primary key must contain a partition column (I believe Flink at least already does this). It is less clear to me that this would meet all existing use cases, but having this would allow for better incremental data structures, which could then be partition based.
>>>
>>> Narrowing the scope to unique identifiers would allow for further building blocks already mentioned, like a secondary index (possibly via an LSM tree), that would allow for better performance overall.
>>>
>>> I generally agree with the sentiment that we shouldn't deprecate them until there is a viable replacement.
>>> With all due respect to my employer, let's not fall into the Google trap [1] :)
>>>
>>> Cheers,
>>> Micah
>>>
>>> [1] https://goomics.net/50/
>>>
>>> On Thu, Oct 31, 2024 at 12:35 PM Alexander Jo <alex...@starburstdata.com> wrote:
>>>
>>>> Hey all,
>>>>
>>>> Just to throw my 2 cents in, I agree with Steven and others that we do need some kind of replacement before deprecating equality deletes. They certainly have their problems, and do significantly increase complexity as they are now, but the writing of position deletes is too expensive for certain pipelines.
>>>>
>>>> We've been investigating using equality deletes for some of our workloads at Starburst; the key advantage we were hoping to leverage is cheap, effectively random-access lookup deletes. Say you have a UUID column that's unique in a table and want to delete a row by UUID. With position deletes, each delete is expensive without an index on that UUID. With equality deletes, each delete is cheap; reads and compaction are expensive, but when updates are frequent and reads are sporadic that's a reasonable tradeoff.
>>>>
>>>> Pretty much what Jason and Steven have already said.
>>>>
>>>> Maybe there are some incremental improvements on equality deletes or tips from similar systems that might alleviate some of their problems?
>>>>
>>>> - Alex Jo
>>>>
>>>> On Thu, Oct 31, 2024 at 10:58 AM Steven Wu <stevenz...@gmail.com> wrote:
>>>>
>>>>> We probably all agree on the downside of equality deletes: they postpone all the work to the read path.
>>>>>
>>>>> In theory, we could implement position deletes only in the Flink streaming writer. It would require tracking the last committed data files per key, which can be stored in Flink state (checkpointed). This is obviously quite expensive/challenging, but possible.
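[Editor's note: Steven's "in theory" approach — tracking the last committed (data file, position) per key so a streaming writer can emit position deletes on upsert — can be sketched as below. This is a toy simulation in Python: the dict stands in for checkpointed Flink keyed state, and all names and file paths are hypothetical, not Flink or Iceberg APIs.]

```python
# Toy sketch of a streaming upsert writer that remembers, per key, where the
# last committed copy of the row lives. With that index, an update can emit a
# position delete directly instead of an equality delete -- no table scan.
class PositionTrackingUpsertWriter:
    def __init__(self):
        # key -> (data_file, row_pos); in Flink this would be checkpointed keyed state
        self.index = {}
        # (data_file, row_pos) tuples emitted for the current commit
        self.position_deletes = []
        self.current_file = "data-001.parquet"  # hypothetical data file name
        self.next_pos = 0

    def upsert(self, key, row):
        if key in self.index:
            # Key seen before: delete the old copy by its known position.
            self.position_deletes.append(self.index[key])
        # Record where the new copy of the row is written.
        self.index[key] = (self.current_file, self.next_pos)
        self.next_pos += 1
        return row

writer = PositionTrackingUpsertWriter()
writer.upsert("user-1", {"name": "a"})
writer.upsert("user-2", {"name": "b"})
writer.upsert("user-1", {"name": "a2"})  # update -> position delete for the old row
print(writer.position_deletes)  # [('data-001.parquet', 0)]
```

The cost Steven mentions is visible here: the index must cover every live key in the table, which is exactly the state that makes this expensive for large tables.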
>>>>> I'd like to echo one benefit of equality deletes that Russell called out in the original email: equality deletes never have conflicts. That is important for streaming writers (Flink, Kafka Connect, ...) that commit frequently (every few minutes or less). Assume Flink could write only position deletes and committed every 2 minutes. The long-running nature of streaming jobs could cause frequent commit conflicts with background delete compaction jobs.
>>>>>
>>>>> Overall, streaming upsert writes are not a well-solved problem in Iceberg. This probably affects all streaming engines (Flink, Kafka Connect, Spark streaming, ...). We need to come up with better alternatives before we can deprecate equality deletes.
>>>>>
>>>>> On Thu, Oct 31, 2024 at 8:38 AM Russell Spitzer <russell.spit...@gmail.com> wrote:
>>>>>
>>>>>> For users of Equality Deletes, what are the key benefits of Equality Deletes that you would like to preserve? Could you please share some concrete examples of the queries you want to run (and the schemas and data sizes you would like to run them against) and the latencies that would be acceptable?
>>>>>>
>>>>>> On Thu, Oct 31, 2024 at 10:05 AM Jason Fine <ja...@upsolver.com.invalid> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> Representing Upsolver here, we also make use of Equality Deletes to deliver high-frequency, low-latency updates to our clients at scale. We have customers using them at scale, demonstrating the need and viability. We automate the process of converting them into positional deletes (or fully applying them) in the background for more efficient engine queries, giving our users both low latency and good query performance.
>>>>>>>
>>>>>>> Equality Deletes were added since there isn't a good way to solve frequent updates otherwise.
>>>>>>> It would require some sort of index keeping track of every record in the table (by a predetermined PK), and maintaining such an index is a huge task that every tool interested in this would need to re-implement. It also becomes a bottleneck limiting table sizes.
>>>>>>>
>>>>>>> I don't think they should be removed without providing an alternative. Positional Deletes inherently have a different performance profile, requiring more upfront work proportional to the table size.
>>>>>>>
>>>>>>> On Thu, Oct 31, 2024 at 2:45 PM Jean-Baptiste Onofré <j...@nanthrax.net> wrote:
>>>>>>>
>>>>>>>> Hi Russell,
>>>>>>>>
>>>>>>>> Thanks for the nice writeup and the proposal.
>>>>>>>>
>>>>>>>> I agree with your analysis, and I have the same feeling. However, I think there are more engines than Flink that write equality delete files. So I agree to deprecate in V3, but maybe be more "flexible" about removal in V4 in order to give engines time to update. I think that by deprecating equality deletes, we are clearly focusing on read performance and "consistency" (more than write). That's not necessarily a bad thing, but streaming and data ingestion platforms will probably be concerned about it (by using positional deletes, they would have to scan/read all data files to find the positions, which is painful).
>>>>>>>>
>>>>>>>> So, to summarize:
>>>>>>>> 1. Agree to deprecate equality deletes, but -1 on committing to any target for deletion before having a clear path for streaming platforms (Flink, Beam, ...).
>>>>>>>> 2. In the meantime (during the deprecation period), I propose we explore possible improvements for streaming platforms (maybe finding a way to avoid full data file scans, ...).
>>>>>>>>
>>>>>>>> Thanks!
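[Editor's note: the write-path asymmetry raised in the messages above — an equality delete just records a predicate, while a position delete must first locate the row — can be sketched as follows. This is a toy Python illustration with hypothetical file names and record shapes, not Iceberg's actual delete-file formats.]

```python
# Contrast the writer-side cost of the two delete kinds. The table is a
# hypothetical in-memory stand-in for a set of committed data files.
data_files = {
    "data-001.parquet": [{"id": 1}, {"id": 2}],
    "data-002.parquet": [{"id": 3}, {"id": 4}],
}

def write_equality_delete(key_col, value):
    # Writer work: O(1) -- just record the predicate; no data file is read.
    # The cost of finding matching rows is deferred to every future reader.
    return {"type": "equality", key_col: value}

def write_position_delete(key_col, value):
    # Writer work: O(rows in table) without an index -- scan every data file
    # to find the positions of the rows being deleted.
    deletes = []
    for path, rows in data_files.items():
        for pos, row in enumerate(rows):
            if row[key_col] == value:
                deletes.append({"type": "position", "file": path, "pos": pos})
    return deletes

print(write_equality_delete("id", 3))  # {'type': 'equality', 'id': 3}
print(write_position_delete("id", 3))  # [{'type': 'position', 'file': 'data-002.parquet', 'pos': 0}]
```

The per-key index discussed in the thread (Flink state, an LSM tree, or another Iceberg table) is precisely what would turn the scan in `write_position_delete` into a lookup.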
>>>>>>>> Regards,
>>>>>>>> JB
>>>>>>>>
>>>>>>>> On Wed, Oct 30, 2024 at 10:06 PM Russell Spitzer <russell.spit...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>> Background:
>>>>>>>>>
>>>>>>>>> 1) Position Deletes
>>>>>>>>>
>>>>>>>>> Writers determine which rows are deleted and mark them in a 1-for-1 representation. With delete vectors, this means every data file has at most one delete vector that is read in conjunction with it to excise deleted rows. Reader overhead is more or less constant and very predictable.
>>>>>>>>>
>>>>>>>>> The main cost of this mode is that deletes must be determined at write time, which is expensive and can make conflict resolution more difficult.
>>>>>>>>>
>>>>>>>>> 2) Equality Deletes
>>>>>>>>>
>>>>>>>>> Writers write out a reference to which values are deleted (in a partition or globally). There can be an unlimited number of equality deletes, and they all must be checked for every data file that is read. The cost of determining deleted rows is essentially given to the reader.
>>>>>>>>>
>>>>>>>>> Conflicts almost never happen since data files are not actually changed, and there is almost no cost to the writer to generate these. Almost all costs related to equality deletes are passed on to the reader.
>>>>>>>>>
>>>>>>>>> Proposal:
>>>>>>>>>
>>>>>>>>> Equality deletes are, in my opinion, unsustainable, and we should work on deprecating and removing them from the specification. At this time, I know of only one engine (Apache Flink) that produces these deletes, but almost all engines have implementations to read them. The cost of implementing equality deletes on the read path is difficult and unpredictable in terms of memory usage and compute complexity.
>>>>>>>>> We've had suggestions of implementing RocksDB in order to handle ever-growing sets of equality deletes, which in my opinion shows that we are going down the wrong path.
>>>>>>>>>
>>>>>>>>> Outside of performance, equality deletes are also difficult to use in conjunction with many other features. For example, any features requiring CDC or row lineage are basically impossible when equality deletes are in use. When equality deletes are present, the state of the table can only be determined with a full scan, making it difficult to update differential structures. This means materialized views or indexes essentially need to be fully rebuilt whenever an equality delete is added to the table.
>>>>>>>>>
>>>>>>>>> Equality deletes essentially remove complexity from the write side but then add what I believe is an unacceptable level of complexity to the read side.
>>>>>>>>>
>>>>>>>>> Because of this, I suggest we deprecate equality deletes in V3 and slate them for full removal from the Iceberg spec in V4.
>>>>>>>>>
>>>>>>>>> I know this is a big change and compatibility breakage, so I would like to introduce this idea to the community and solicit feedback from all stakeholders. I am very flexible on this issue and would like to hear the best arguments both for and against removal of equality deletes.
>>>>>>>>>
>>>>>>>>> Thanks everyone for your time,
>>>>>>>>>
>>>>>>>>> Russ Spitzer

-- 

*Jason Fine*
Chief Software Architect
ja...@upsolver.com | www.upsolver.com
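[Editor's note: the read-path asymmetry described in Russell's original proposal — a bounded, per-file delete vector versus an unbounded set of equality predicates evaluated against every row — can be sketched as below. This is a toy Python illustration; the structures are hypothetical, not Iceberg's actual delete-vector or delete-file formats.]

```python
# Reader-side cost of the two delete representations, for one data file.
rows = [{"id": 1, "v": "a"}, {"id": 2, "v": "b"}, {"id": 3, "v": "c"}]

# Position deletes / delete vectors: at most one set of deleted positions per
# data file; reader work per row is a constant-time membership check.
delete_vector = {1}  # row positions deleted in this file
dv_read = [r for pos, r in enumerate(rows) if pos not in delete_vector]

# Equality deletes: a list of predicates that grows with write activity, and
# every predicate must be evaluated against every row of every file scanned.
equality_deletes = [("id", 3), ("v", "zzz")]
eq_read = [r for r in rows
           if not any(r[col] == val for col, val in equality_deletes)]

print(dv_read)  # [{'id': 1, 'v': 'a'}, {'id': 3, 'v': 'c'}]
print(eq_read)  # [{'id': 1, 'v': 'a'}, {'id': 2, 'v': 'b'}]
```

The delete-vector path stays O(rows) per file regardless of delete history, while the equality path is O(rows × predicates) — the unpredictable reader cost the proposal objects to.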