Well, it seems like I'm a little late, so most of the arguments have already been voiced.
I agree that we should not deprecate equality deletes until we have a replacement feature. I think one of the big advantages of Iceberg is that it supports both batch processing and streaming ingestion. For streaming ingestion we need a way to update existing data in a performant way, and restricting deletes to the primary keys seems like enough from the streaming perspective. Equality deletes allow a very wide range of applications, which we might be able to narrow down a bit but still keep useful. So if we want to go down this road, we need to start collecting the requirements.

Thanks,
Peter

On Fri, Nov 1, 2024 at 19:22 Shani Elharrar <sh...@upsolver.com.invalid> wrote:

> I understand how it makes sense for batch jobs, but it damages stream
> jobs. Using equality deletes works much better for streaming (which has a
> strict SLA for delays), and in order to decrease the performance penalty,
> systems can rewrite the equality deletes to positional deletes.
>
> Shani.
>
> On 1 Nov 2024, at 20:06, Steven Wu <stevenz...@gmail.com> wrote:
>
> > Fundamentally, it is very difficult to write position deletes with
> concurrent writers and conflicts for batch jobs too, as the inverted index
> may become invalid/stale.
>
> The position deletes are created during the write phase, but conflicts are
> only detected at the commit stage. I assume the batch job should fail in
> this case.
>
> On Fri, Nov 1, 2024 at 10:57 AM Steven Wu <stevenz...@gmail.com> wrote:
>
>> Shani,
>>
>> That is a good point. It is certainly a limitation for the Flink job to
>> track the inverted index internally (which is what I had in mind). It can't
>> be shared/synchronized with other Flink jobs or other engines writing to
>> the same table.
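[Editor's note: the writer-local inverted index discussed above can be sketched abstractly. This is a toy model, not Iceberg or Flink API; all names are made up for illustration. It shows how a writer that remembers where each key was last written can turn an upsert into a precise position delete, and also why that state is writer-local, which is exactly the synchronization limitation Steven points out.]

```python
# Toy sketch (hypothetical names, not Iceberg/Flink API): a writer-local
# inverted index mapping key -> (data_file, row_position), used to emit
# position deletes instead of equality deletes on upsert.

class InvertedIndexWriter:
    def __init__(self):
        self.index = {}             # key -> (data_file, row_position)
        self.position_deletes = []  # (data_file, row_position) to commit
        self.appends = []           # (data_file, row_position, row)

    def upsert(self, key, row, data_file, position):
        previous = self.index.get(key)
        if previous is not None:
            # The key was written before: delete the old copy by exact
            # position. Without this index, the writer would have to emit
            # an equality delete and defer the lookup to readers.
            self.position_deletes.append(previous)
        self.index[key] = (data_file, position)
        self.appends.append((data_file, position, row))

writer = InvertedIndexWriter()
writer.upsert("user-1", {"name": "a"}, "file-0.parquet", 0)
writer.upsert("user-2", {"name": "b"}, "file-0.parquet", 1)
writer.upsert("user-1", {"name": "c"}, "file-1.parquet", 0)  # update

# The second write of user-1 produced a position delete for the first copy.
print(writer.position_deletes)  # [('file-0.parquet', 0)]
```

Because `index` lives only in this writer's state, a second writer (or a manual DML statement from another engine) would not see it, which is the duplicate-key hazard raised later in the thread.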
>>
>> Thanks,
>> Steven
>>
>> On Fri, Nov 1, 2024 at 10:50 AM Shani Elharrar <sh...@upsolver.com.invalid>
>> wrote:
>>
>>> Even if Flink can create this state, it would have to be maintained
>>> against the Iceberg table; we wouldn't want duplicates (keys) if other
>>> systems / users update the table (e.g. manual inserts / updates using DML).
>>>
>>> Shani.
>>>
>>> On 1 Nov 2024, at 18:32, Steven Wu <stevenz...@gmail.com> wrote:
>>>
>>> > Add support for inverted indexes to reduce the cost of position
>>> lookup. This is fairly tricky to implement for streaming use cases without
>>> an external system.
>>>
>>> Anton, that is also what I was saying earlier. In Flink, the inverted
>>> index of (key, committed data files) can be tracked in Flink state.
>>>
>>> On Fri, Nov 1, 2024 at 2:16 AM Anton Okolnychyi <aokolnyc...@gmail.com>
>>> wrote:
>>>
>>>> I was a bit skeptical when we were adding equality deletes, but nothing
>>>> beats their performance during writes. We have to find an alternative
>>>> before deprecating.
>>>>
>>>> We are doing a lot of work to improve streaming, like reducing the cost
>>>> of commits, enabling a large (potentially infinite) number of snapshots,
>>>> changelog reads, and so on. It is a project goal to excel in streaming.
>>>>
>>>> I was going to focus on equality deletes after completing the DV work.
>>>> I believe we have these options:
>>>>
>>>> - Revisit the existing design of equality deletes (e.g. add more
>>>> restrictions, improve compaction, offer new writers).
>>>> - Standardize on the view-based approach [1] to handle streaming
>>>> upserts and CDC use cases, potentially making this part of the spec.
>>>> - Add support for inverted indexes to reduce the cost of position
>>>> lookup. This is fairly tricky to implement for streaming use cases without
>>>> an external system. Our runtime filtering in Spark today is equivalent to
>>>> looking up positions in an inverted index represented by another Iceberg
>>>> table.
>>>> That may still not be enough for some streaming use cases.
>>>>
>>>> [1] - https://www.tabular.io/blog/hello-world-of-cdc/
>>>>
>>>> - Anton
>>>>
>>>> On Thu, Oct 31, 2024 at 21:31 Micah Kornfield <emkornfi...@gmail.com>
>>>> wrote:
>>>>
>>>>> I agree that equality deletes have their place in streaming. I think
>>>>> the ultimate decision here is how opinionated Iceberg wants to be on its
>>>>> use cases. If it really wants to stick to its origins of "slow moving
>>>>> data", then removing equality deletes would be in line with this. I think
>>>>> the other high-level question is how much we allow for partially compatible
>>>>> features (the row lineage feature was explicitly approved excluding
>>>>> equality deletes, and people seemed OK with it at the time; if all
>>>>> features need to work together, then maybe we need to rethink the design
>>>>> here so it can be forward compatible with equality deletes).
>>>>>
>>>>> I think one issue with equality deletes as stated in the spec is that
>>>>> they are overly broad. I'd be interested if people have any use cases that
>>>>> differ, but I think one way of narrowing the specification's scope on
>>>>> equality deletes (and probably a necessary building block for something
>>>>> better) is to focus on upsert/streaming deletes. Two proposals in this
>>>>> regard are:
>>>>>
>>>>> 1. Require that equality deletes can only correspond to unique
>>>>> identifiers for the table.
>>>>> 2. Consider requiring that, for equality deletes on partitioned
>>>>> tables, the primary key must contain a partition column (I believe
>>>>> Flink at least already does this). It is less clear to me that this would
>>>>> meet all existing use cases, but having this would allow for better
>>>>> incremental data structures, which could then be partition-based.
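[Editor's note: the two proposed restrictions above can be expressed as a small validation a writer might apply before emitting an equality delete. This is a sketch under stated assumptions, not Iceberg API; the function and field names are illustrative.]

```python
# Sketch of the two proposed restrictions (hypothetical helper, not
# Iceberg API): accept an equality delete only if it targets the table's
# unique identifier, and, for partitioned tables, only if the identifier
# includes a partition column so the delete can be scoped to one partition.

def validate_equality_delete(delete_columns, identifier_columns, partition_columns):
    if set(delete_columns) != set(identifier_columns):
        return False  # proposal 1: deletes only on the unique identifier
    if partition_columns and not set(partition_columns) & set(identifier_columns):
        return False  # proposal 2: identifier must contain a partition column
    return True

# Unpartitioned table keyed by user_id: allowed.
assert validate_equality_delete(["user_id"], ["user_id"], [])
# Delete on an arbitrary non-key column: rejected under proposal 1.
assert not validate_equality_delete(["email"], ["user_id"], [])
# Partitioned by region, key is (region, user_id): allowed.
assert validate_equality_delete(["region", "user_id"], ["region", "user_id"], ["region"])
# Partitioned by day, but the key has no partition column: rejected under proposal 2.
assert not validate_equality_delete(["user_id"], ["user_id"], ["day"])
```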
>>>>>
>>>>> Narrowing the scope to unique identifiers would allow for the further
>>>>> building blocks already mentioned, like a secondary index (possibly via an
>>>>> LSM tree), which would allow for better performance overall.
>>>>>
>>>>> I generally agree with the sentiment that we shouldn't deprecate them
>>>>> until there is a viable replacement. With all due respect to my employer,
>>>>> let's not fall into the Google trap [1] :)
>>>>>
>>>>> Cheers,
>>>>> Micah
>>>>>
>>>>> [1] https://goomics.net/50/
>>>>>
>>>>> On Thu, Oct 31, 2024 at 12:35 PM Alexander Jo <
>>>>> alex...@starburstdata.com> wrote:
>>>>>
>>>>>> Hey all,
>>>>>>
>>>>>> Just to throw my 2 cents in, I agree with Steven and others that we
>>>>>> do need some kind of replacement before deprecating equality deletes.
>>>>>> They certainly have their problems, and do significantly increase
>>>>>> complexity as they are now, but the writing of position deletes is too
>>>>>> expensive for certain pipelines.
>>>>>>
>>>>>> We've been investigating using equality deletes for some of our
>>>>>> workloads at Starburst; the key advantage we were hoping to leverage is
>>>>>> cheap, effectively random-access lookup deletes.
>>>>>> Say you have a UUID column that's unique in a table and want to
>>>>>> delete a row by UUID. With position deletes each delete is expensive
>>>>>> without an index on that UUID.
>>>>>> With equality deletes each delete is cheap; reads/compaction are
>>>>>> expensive, but when updates are frequent and reads are sporadic that's a
>>>>>> reasonable tradeoff.
>>>>>>
>>>>>> Pretty much what Jason and Steven have already said.
>>>>>>
>>>>>> Maybe there are some incremental improvements on equality deletes or
>>>>>> tips from similar systems that might alleviate some of their problems?
>>>>>>
>>>>>> - Alex Jo
>>>>>>
>>>>>> On Thu, Oct 31, 2024 at 10:58 AM Steven Wu <stevenz...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> We probably all agree on the downside of equality deletes: they
>>>>>>> postpone all the work to the read path.
>>>>>>>
>>>>>>> In theory, we could implement position deletes only in the Flink
>>>>>>> streaming writer. It would require tracking the last committed data
>>>>>>> files per key, which can be stored in Flink state (checkpointed). This is
>>>>>>> obviously quite expensive/challenging, but possible.
>>>>>>>
>>>>>>> I'd like to echo one benefit of equality deletes that Russell called
>>>>>>> out in the original email: equality deletes would never have conflicts.
>>>>>>> That is important for streaming writers (Flink, Kafka Connect, ...) that
>>>>>>> commit frequently (minutes or less). Assume Flink can write position
>>>>>>> deletes only and commits every 2 minutes. The long-running nature of
>>>>>>> streaming jobs can cause frequent commit conflicts with background delete
>>>>>>> compaction jobs.
>>>>>>>
>>>>>>> Overall, streaming upsert writes are not a well-solved problem in
>>>>>>> Iceberg. This probably affects all streaming engines (Flink, Kafka Connect,
>>>>>>> Spark streaming, ...). We need to come up with some better alternatives
>>>>>>> before we can deprecate equality deletes.
>>>>>>>
>>>>>>> On Thu, Oct 31, 2024 at 8:38 AM Russell Spitzer <
>>>>>>> russell.spit...@gmail.com> wrote:
>>>>>>>
>>>>>>>> For users of Equality Deletes, what are the key benefits to
>>>>>>>> Equality Deletes that you would like to preserve? Could you please share
>>>>>>>> some concrete examples of the queries you want to run (and the schemas and
>>>>>>>> data sizes you would like to run them against) and the latencies that would
>>>>>>>> be acceptable?
>>>>>>>>
>>>>>>>> On Thu, Oct 31, 2024 at 10:05 AM Jason Fine
>>>>>>>> <ja...@upsolver.com.invalid> wrote:
>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> Representing Upsolver here: we also make use of Equality Deletes
>>>>>>>>> to deliver high-frequency, low-latency updates to our clients. We have
>>>>>>>>> customers using them at scale, demonstrating the need and viability. We
>>>>>>>>> automate the process of converting them into positional deletes (or fully
>>>>>>>>> applying them) in the background for more efficient engine queries, giving
>>>>>>>>> our users both low latency and good query performance.
>>>>>>>>>
>>>>>>>>> Equality Deletes were added because there isn't a good way to solve
>>>>>>>>> frequent updates otherwise. It would require some sort of index keeping
>>>>>>>>> track of every record in the table (by a predetermined PK), and maintaining
>>>>>>>>> such an index is a huge task that every tool interested in this would need
>>>>>>>>> to re-implement. It also becomes a bottleneck limiting table sizes.
>>>>>>>>>
>>>>>>>>> I don't think they should be removed without providing an
>>>>>>>>> alternative. Positional Deletes inherently have a different performance
>>>>>>>>> profile, requiring more upfront work proportional to the table size.
>>>>>>>>>
>>>>>>>>> On Thu, Oct 31, 2024 at 2:45 PM Jean-Baptiste Onofré <
>>>>>>>>> j...@nanthrax.net> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Russell,
>>>>>>>>>>
>>>>>>>>>> Thanks for the nice writeup and the proposal.
>>>>>>>>>>
>>>>>>>>>> I agree with your analysis, and I have the same feeling. However, I
>>>>>>>>>> think there are more writers than Flink that produce equality delete files.
>>>>>>>>>> So, I agree to deprecate in V3, but maybe be more "flexible" about removal
>>>>>>>>>> in V4 in order to give engines time to update.
>>>>>>>>>> I think that by deprecating equality deletes, we are clearly focusing
>>>>>>>>>> on read performance and "consistency" (more than write). It's not
>>>>>>>>>> necessarily a bad thing, but streaming and data ingestion platforms will
>>>>>>>>>> probably be concerned about that (by using positional deletes, they will
>>>>>>>>>> have to scan/read all data files to find the position, which is painful).
>>>>>>>>>>
>>>>>>>>>> So, to summarize:
>>>>>>>>>> 1. Agree to deprecate equality deletes, but -1 to commit to any target
>>>>>>>>>> for deletion before having a clear path for streaming platforms
>>>>>>>>>> (Flink, Beam, ...)
>>>>>>>>>> 2. In the meantime (during the deprecation period), I propose to
>>>>>>>>>> explore possible improvements for streaming platforms (maybe finding a
>>>>>>>>>> way to avoid full data file scans, ...)
>>>>>>>>>>
>>>>>>>>>> Thanks!
>>>>>>>>>> Regards
>>>>>>>>>> JB
>>>>>>>>>>
>>>>>>>>>> On Wed, Oct 30, 2024 at 10:06 PM Russell Spitzer
>>>>>>>>>> <russell.spit...@gmail.com> wrote:
>>>>>>>>>> >
>>>>>>>>>> > Background:
>>>>>>>>>> >
>>>>>>>>>> > 1) Position Deletes
>>>>>>>>>> >
>>>>>>>>>> > Writers determine which rows are deleted and mark them in a 1-for-1
>>>>>>>>>> representation. With delete vectors this means every data file has at
>>>>>>>>>> most 1 delete vector that is read in conjunction with it to excise deleted
>>>>>>>>>> rows. Reader overhead is more or less constant and is very predictable.
>>>>>>>>>> >
>>>>>>>>>> > The main cost of this mode is that deletes must be determined
>>>>>>>>>> at write time, which is expensive and can be more difficult for conflict
>>>>>>>>>> resolution.
>>>>>>>>>> >
>>>>>>>>>> > 2) Equality Deletes
>>>>>>>>>> >
>>>>>>>>>> > Writers write out a reference to which values are deleted (in a
>>>>>>>>>> partition or globally).
>>>>>>>>>> There can be an unlimited number of equality
>>>>>>>>>> deletes, and they all must be checked for every data file that is read. The
>>>>>>>>>> cost of determining deleted rows is essentially given to the reader.
>>>>>>>>>> >
>>>>>>>>>> > Conflicts almost never happen since data files are not actually
>>>>>>>>>> changed, and there is almost no cost to the writer to generate these. Almost
>>>>>>>>>> all costs related to equality deletes are passed on to the reader.
>>>>>>>>>> >
>>>>>>>>>> > Proposal:
>>>>>>>>>> >
>>>>>>>>>> > Equality deletes are, in my opinion, unsustainable, and we
>>>>>>>>>> should work on deprecating and removing them from the specification. At
>>>>>>>>>> this time, I know of only one engine (Apache Flink) which produces these
>>>>>>>>>> deletes, but almost all engines have implementations to read them. The cost
>>>>>>>>>> of implementing equality deletes on the read path is difficult and
>>>>>>>>>> unpredictable in terms of memory usage and compute complexity. We've had
>>>>>>>>>> suggestions of implementing RocksDB in order to handle ever-growing sets of
>>>>>>>>>> equality deletes, which in my opinion shows that we are going down the wrong
>>>>>>>>>> path.
>>>>>>>>>> >
>>>>>>>>>> > Outside of performance, Equality deletes are also difficult to
>>>>>>>>>> use in conjunction with many other features. For example, any features
>>>>>>>>>> requiring CDC or row lineage are basically impossible when equality deletes
>>>>>>>>>> are in use. When Equality deletes are present, the state of the table can
>>>>>>>>>> only be determined with a full scan, making it difficult to update
>>>>>>>>>> differential structures. This means materialized views or indexes need to
>>>>>>>>>> essentially be fully rebuilt whenever an equality delete is added to the
>>>>>>>>>> table.
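[Editor's note: the read-path asymmetry described above can be illustrated with a toy reader. This is plain Python, not Iceberg code; it only models the shapes of the two delete kinds. A position delete is applied by row index against one file's delete vector, while every accumulated equality delete predicate must be evaluated against every row read.]

```python
# Toy illustration (not Iceberg code) of reader cost for the two delete
# kinds. Rows are plain dicts; a "data file" is a list of rows.

def read_with_position_deletes(rows, deleted_positions):
    # One set-membership test per row against this file's delete vector:
    # roughly constant overhead per row, regardless of table history.
    return [row for i, row in enumerate(rows) if i not in deleted_positions]

def read_with_equality_deletes(rows, equality_deletes):
    # Every row is compared against every applicable equality delete
    # predicate: cost grows with the number of accumulated deletes.
    return [
        row for row in rows
        if not any(all(row.get(col) == val for col, val in delete.items())
                   for delete in equality_deletes)
    ]

rows = [{"id": 1, "v": "a"}, {"id": 2, "v": "b"}, {"id": 3, "v": "c"}]

# Both approaches can express "delete the row with id 2", but the work
# lands in different places: the position was resolved at write time,
# while the equality predicate is resolved here, at read time.
assert read_with_position_deletes(rows, {1}) == \
       read_with_equality_deletes(rows, [{"id": 2}])
```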
>>>>>>>>>> >
>>>>>>>>>> > Equality deletes essentially remove complexity from the write
>>>>>>>>>> side but then add what I believe is an unacceptable level of complexity to
>>>>>>>>>> the read side.
>>>>>>>>>> >
>>>>>>>>>> > Because of this, I suggest we deprecate Equality Deletes in V3
>>>>>>>>>> and slate them for full removal from the Iceberg Spec in V4.
>>>>>>>>>> >
>>>>>>>>>> > I know this is a big change and a compatibility breakage, so I
>>>>>>>>>> would like to introduce this idea to the community and solicit feedback
>>>>>>>>>> from all stakeholders. I am very flexible on this issue and would like to
>>>>>>>>>> hear the best arguments both for and against removal of Equality Deletes.
>>>>>>>>>> >
>>>>>>>>>> > Thanks everyone for your time,
>>>>>>>>>> >
>>>>>>>>>> > Russ Spitzer
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> *Jason Fine*
>>>>>>>>> Chief Software Architect
>>>>>>>>> ja...@upsolver.com | www.upsolver.com