Fundamentally, it is very difficult to write position deletes with
concurrent writers, and this creates conflicts for batch jobs too, as the
inverted index may become invalid/stale.

Position deletes are created during the write phase, but conflicts are
only detected at the commit stage. I assume the batch job would have to
fail in this case.
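To make the staleness problem concrete, here is a minimal hypothetical sketch (not Iceberg's actual implementation; file names, keys, and the `position_delete` helper are invented for illustration) of how a writer-side inverted index of key -> (data file, row position) becomes invalid when a concurrent job rewrites a data file before the writer commits:

```python
# Hypothetical sketch: a writer-side inverted index maps each key to the
# (data_file, row_position) holding the current row for that key, and
# position deletes are derived from it.

# Index built when this writer started its write phase.
index = {
    "key-1": ("data-00001.parquet", 0),
    "key-2": ("data-00001.parquet", 1),
    "key-3": ("data-00002.parquet", 0),
}

# Meanwhile, a concurrent compaction commits first and rewrites
# data-00001.parquet, so only data-00002.parquet still exists.
committed_files = {"data-00002.parquet"}

def position_delete(key):
    """Turn an upsert/delete for `key` into a position delete via the index."""
    data_file, pos = index[key]
    if data_file not in committed_files:
        # The index is stale: the recorded position points at a file that
        # was removed. This only surfaces as a conflict at commit time;
        # the job must refresh its index or fail rather than write a
        # delete against a nonexistent position.
        raise RuntimeError(
            f"stale index entry for {key}: {data_file} was rewritten")
    return (data_file, pos)

print(position_delete("key-3"))   # still valid
try:
    position_delete("key-1")      # stale: conflicts with the compaction
except RuntimeError as e:
    print(e)
```

The sketch shows why the conflict cannot be detected during the write phase itself: the index is only known to be stale once the competing commit is visible.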

On Fri, Nov 1, 2024 at 10:57 AM Steven Wu <stevenz...@gmail.com> wrote:

> Shani,
>
> That is a good point. It is certainly a limitation for the Flink job to
> track the inverted index internally (which is what I had in mind). It can't
> be shared/synchronized with other Flink jobs or other engines writing to
> the same table.
>
> Thanks,
> Steven
>
> On Fri, Nov 1, 2024 at 10:50 AM Shani Elharrar <sh...@upsolver.com.invalid>
> wrote:
>
>> Even if Flink can create this state, it would have to be maintained
>> against the Iceberg table, we wouldn't like duplicates (keys) if other
>> systems / users update the table (e.g manual insert / updates using DML).
>>
>> Shani.
>>
>> On 1 Nov 2024, at 18:32, Steven Wu <stevenz...@gmail.com> wrote:
>>
>> 
>> > Add support for inverted indexes to reduce the cost of position lookup.
>> This is fairly tricky to implement for streaming use cases without an
>> external system.
>>
>> Anton, that is also what I was saying earlier. In Flink, the inverted
>> index of (key, committed data files) can be tracked in Flink state.
>>
>> On Fri, Nov 1, 2024 at 2:16 AM Anton Okolnychyi <aokolnyc...@gmail.com>
>> wrote:
>>
>>> I was a bit skeptical when we were adding equality deletes, but nothing
>>> beats their performance during writes. We have to find an alternative
>>> before deprecating them.
>>>
>>> We are doing a lot of work to improve streaming, like reducing the cost
>>> of commits, enabling a large (potentially infinite) number of snapshots,
>>> changelog reads, and so on. It is a project goal to excel in streaming.
>>>
>>> I was going to focus on equality deletes after completing the DV work. I
>>> believe we have these options:
>>>
>>> - Revisit the existing design of equality deletes (e.g. add more
>>> restrictions, improve compaction, offer new writers).
>>> - Standardize on the view-based approach [1] to handle streaming upserts
>>> and CDC use cases, potentially making this part of the spec.
>>> - Add support for inverted indexes to reduce the cost of position
>>> lookup. This is fairly tricky to implement for streaming use cases without
>>> an external system. Our runtime filtering in Spark today is equivalent to
>>> looking up positions in an inverted index represented by another Iceberg
>>> table. That may still not be enough for some streaming use cases.
>>>
>>> [1] - https://www.tabular.io/blog/hello-world-of-cdc/
>>>
>>> - Anton
>>>
>>> чт, 31 жовт. 2024 р. о 21:31 Micah Kornfield <emkornfi...@gmail.com>
>>> пише:
>>>
>>>> I agree that equality deletes have their place in streaming. I think
>>>> the ultimate decision here is how opinionated Iceberg wants to be about
>>>> its use cases. If it really wants to stick to its origins of "slow moving
>>>> data", then removing equality deletes would be in line with this. I think
>>>> the other high-level question is how much we allow for partially
>>>> compatible features (the row-lineage feature was explicitly approved
>>>> excluding equality deletes, and people seemed OK with that at the time).
>>>> If all features need to work together, then maybe we need to rethink the
>>>> design here so it can be forward compatible with equality deletes.
>>>>
>>>> I think one issue with equality deletes as stated in the spec is that
>>>> they are overly broad. I'd be interested if people have use cases that
>>>> differ, but I think one way of narrowing the specification's scope on
>>>> equality deletes (and probably a necessary building block for building
>>>> something better) is to focus on upsert/streaming deletes. Two proposals
>>>> in this regard are:
>>>>
>>>> 1.  Require that equality deletes can only correspond to unique
>>>> identifiers for the table.
>>>> 2.  Consider requiring that, for equality deletes on partitioned tables,
>>>> the primary key must contain a partition column (I believe Flink at least
>>>> already does this). It is less clear to me that this would meet all
>>>> existing use cases, but having this would allow for better incremental
>>>> data structures, which could then be partition-based.
>>>>
>>>> Narrowing the scope to unique identifiers would allow for the further
>>>> building blocks already mentioned, like a secondary index (possibly via
>>>> an LSM tree), which would allow for better performance overall.
>>>>
>>>> I generally agree with the sentiment that we shouldn't deprecate them
>>>> until there is a viable replacement.  With all due respect to my employer,
>>>> let's not fall into the Google trap [1] :)
>>>>
>>>> Cheers,
>>>> Micah
>>>>
>>>> [1] https://goomics.net/50/
>>>>
>>>>
>>>>
>>>> On Thu, Oct 31, 2024 at 12:35 PM Alexander Jo <
>>>> alex...@starburstdata.com> wrote:
>>>>
>>>>> Hey all,
>>>>>
>>>>> Just to throw my 2 cents in, I agree with Steven and others that we do
>>>>> need some kind of replacement before deprecating equality deletes.
>>>>> They certainly have their problems, and do significantly increase
>>>>> complexity as they are now, but the writing of position deletes is too
>>>>> expensive for certain pipelines.
>>>>>
>>>>> We've been investigating using equality deletes for some of our
>>>>> workloads at Starburst; the key advantage we were hoping to leverage is
>>>>> cheap, effectively random-access deletes.
>>>>> Say you have a UUID column that's unique in a table and you want to
>>>>> delete a row by UUID. With position deletes, each delete is expensive
>>>>> without an index on that UUID.
>>>>> With equality deletes, each delete is cheap while reads/compaction are
>>>>> expensive, but when updates are frequent and reads are sporadic that's
>>>>> a reasonable tradeoff.
>>>>>
>>>>> Pretty much what Jason and Steven have already said.
>>>>>
>>>>> Maybe there are some incremental improvements on equality deletes or
>>>>> tips from similar systems that might alleviate some of their problems?
>>>>>
>>>>> - Alex Jo
>>>>>
>>>>> On Thu, Oct 31, 2024 at 10:58 AM Steven Wu <stevenz...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> We probably all agree on the downside of equality deletes: they
>>>>>> postpone all the work to the read path.
>>>>>>
>>>>>> In theory, we could implement position-only deletes in the Flink
>>>>>> streaming writer. It would require tracking the last committed data
>>>>>> files per key, which could be stored in Flink state (checkpointed).
>>>>>> This is obviously quite expensive/challenging, but possible.
>>>>>>
>>>>>> I'd like to echo one benefit of equality deletes that Russell called
>>>>>> out in the original email: equality deletes never have conflicts. That
>>>>>> is important for streaming writers (Flink, Kafka Connect, ...) that
>>>>>> commit frequently (every few minutes or less). Assume Flink could write
>>>>>> only position deletes and commit every 2 minutes. The long-running
>>>>>> nature of streaming jobs could cause frequent commit conflicts with
>>>>>> background delete compaction jobs.
>>>>>>
>>>>>> Overall, streaming upsert writes are not a well-solved problem in
>>>>>> Iceberg. This probably affects all streaming engines (Flink, Kafka
>>>>>> Connect, Spark streaming, ...). We need to come up with better
>>>>>> alternatives before we can deprecate equality deletes.
>>>>>>
>>>>>>
>>>>>> On Thu, Oct 31, 2024 at 8:38 AM Russell Spitzer <
>>>>>> russell.spit...@gmail.com> wrote:
>>>>>>
>>>>>>> For users of Equality Deletes, what are the key benefits to Equality
>>>>>>> Deletes that you would like to preserve and could you please share some
>>>>>>> concrete examples of the queries you want to run (and the schemas and 
>>>>>>> data
>>>>>>> sizes you would like to run them against) and the latencies that would 
>>>>>>> be
>>>>>>> acceptable?
>>>>>>>
>>>>>>> On Thu, Oct 31, 2024 at 10:05 AM Jason Fine
>>>>>>> <ja...@upsolver.com.invalid> wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> Representing Upsolver here: we also make use of Equality Deletes to
>>>>>>>> deliver high-frequency, low-latency updates to our clients. We have
>>>>>>>> customers using them at scale, demonstrating both the need and the
>>>>>>>> viability. In the background, we automate the process of converting
>>>>>>>> them into positional deletes (or fully applying them) for more
>>>>>>>> efficient engine queries, giving our users both low latency and good
>>>>>>>> query performance.
>>>>>>>>
>>>>>>>> Equality Deletes were added because there isn't a good way to handle
>>>>>>>> frequent updates otherwise. The alternative would require some sort
>>>>>>>> of index keeping track of every record in the table (by a
>>>>>>>> predetermined PK), and maintaining such an index is a huge task that
>>>>>>>> every tool interested in this would need to re-implement. It also
>>>>>>>> becomes a bottleneck limiting table sizes.
>>>>>>>>
>>>>>>>> I don't think they should be removed without providing an
>>>>>>>> alternative. Positional Deletes have a different performance profile
>>>>>>>> inherently, requiring more upfront work proportional to the table size.
>>>>>>>>
>>>>>>>> On Thu, Oct 31, 2024 at 2:45 PM Jean-Baptiste Onofré <
>>>>>>>> j...@nanthrax.net> wrote:
>>>>>>>>
>>>>>>>>> Hi Russell
>>>>>>>>>
>>>>>>>>> Thanks for the nice writeup and the proposal.
>>>>>>>>>
>>>>>>>>> I agree with your analysis, and I have the same feeling. However, I
>>>>>>>>> think there are more engines than Flink that write equality delete
>>>>>>>>> files. So, I agree to deprecate in V3, but maybe we should be more
>>>>>>>>> "flexible" about removal in V4 in order to give engines time to
>>>>>>>>> update.
>>>>>>>>> I think that by deprecating equality deletes, we are clearly
>>>>>>>>> focusing on read performance and "consistency" (more than write).
>>>>>>>>> That's not necessarily a bad thing, but streaming and data ingestion
>>>>>>>>> platforms will probably be concerned about it (by using positional
>>>>>>>>> deletes, they will have to scan/read all data files to find the
>>>>>>>>> positions, which is painful).
>>>>>>>>>
>>>>>>>>> So, to summarize:
>>>>>>>>> 1. I agree to deprecate equality deletes, but -1 to committing to
>>>>>>>>> any target for removal before having a clear path for streaming
>>>>>>>>> platforms (Flink, Beam, ...).
>>>>>>>>> 2. In the meantime (during the deprecation period), I propose to
>>>>>>>>> explore possible improvements for streaming platforms (maybe finding
>>>>>>>>> a way to avoid full data file scans, ...).
>>>>>>>>>
>>>>>>>>> Thanks !
>>>>>>>>> Regards
>>>>>>>>> JB
>>>>>>>>>
>>>>>>>>> On Wed, Oct 30, 2024 at 10:06 PM Russell Spitzer
>>>>>>>>> <russell.spit...@gmail.com> wrote:
>>>>>>>>> >
>>>>>>>>> > Background:
>>>>>>>>> >
>>>>>>>>> > 1) Position Deletes
>>>>>>>>> >
>>>>>>>>> >
>>>>>>>>> > Writers determine which rows are deleted and mark them in a
>>>>>>>>> > 1-for-1 representation. With delete vectors, this means every data
>>>>>>>>> > file has at most one delete vector, which is read in conjunction
>>>>>>>>> > with the data file to excise deleted rows. Reader overhead is more
>>>>>>>>> > or less constant and very predictable.
>>>>>>>>> >
>>>>>>>>> >
>>>>>>>>> > The main cost of this mode is that deletes must be determined at
>>>>>>>>> > write time, which is expensive and can make conflict resolution
>>>>>>>>> > more difficult.
>>>>>>>>> >
>>>>>>>>> > 2) Equality Deletes
>>>>>>>>> >
>>>>>>>>> > Writers write out references to the values that are deleted (in a
>>>>>>>>> > partition or globally). There can be an unlimited number of
>>>>>>>>> > equality deletes, and all of them must be checked against every
>>>>>>>>> > data file that is read. The cost of determining deleted rows is
>>>>>>>>> > essentially passed on to the reader.
>>>>>>>>> >
>>>>>>>>> > Conflicts almost never happen, since data files are not actually
>>>>>>>>> > changed, and there is almost no cost to the writer to generate
>>>>>>>>> > these. Almost all costs related to equality deletes are passed on
>>>>>>>>> > to the reader.
>>>>>>>>> >
>>>>>>>>> > Proposal:
>>>>>>>>> >
>>>>>>>>> > Equality deletes are, in my opinion, unsustainable, and we should
>>>>>>>>> > work on deprecating and removing them from the specification. At
>>>>>>>>> > this time, I know of only one engine (Apache Flink) that produces
>>>>>>>>> > these deletes, but almost all engines have implementations to read
>>>>>>>>> > them. Implementing equality deletes on the read path is difficult
>>>>>>>>> > and unpredictable in terms of memory usage and compute complexity.
>>>>>>>>> > We've had suggestions of implementing RocksDB in order to handle
>>>>>>>>> > ever-growing sets of equality deletes, which in my opinion shows
>>>>>>>>> > that we are going down the wrong path.
>>>>>>>>> >
>>>>>>>>> > Outside of performance, equality deletes are also difficult to
>>>>>>>>> > use in conjunction with many other features. For example, any
>>>>>>>>> > feature requiring CDC or row lineage is basically impossible when
>>>>>>>>> > equality deletes are in use. When equality deletes are present,
>>>>>>>>> > the state of the table can only be determined with a full scan,
>>>>>>>>> > making it difficult to update differential structures. This means
>>>>>>>>> > materialized views or indexes need to be essentially fully rebuilt
>>>>>>>>> > whenever an equality delete is added to the table.
>>>>>>>>> >
>>>>>>>>> > Equality deletes essentially remove complexity from the write
>>>>>>>>> side but then add what I believe is an unacceptable level of 
>>>>>>>>> complexity to
>>>>>>>>> the read side.
>>>>>>>>> >
>>>>>>>>> > Because of this I suggest we deprecate Equality Deletes in V3
>>>>>>>>> and slate them for full removal from the Iceberg Spec in V4.
>>>>>>>>> >
>>>>>>>>> > I know this is a big change and a compatibility breakage, so I
>>>>>>>>> > would like to introduce this idea to the community and solicit
>>>>>>>>> > feedback from all stakeholders. I am very flexible on this issue
>>>>>>>>> > and would like to hear the best arguments both for and against
>>>>>>>>> > removing Equality Deletes.
>>>>>>>>> >
>>>>>>>>> > Thanks everyone for your time,
>>>>>>>>> >
>>>>>>>>> > Russ Spitzer
>>>>>>>>> >
>>>>>>>>> >
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>>
>>>>>>>> *Jason Fine*
>>>>>>>> Chief Software Architect
>>>>>>>> ja...@upsolver.com  | www.upsolver.com
>>>>>>>>
>>>>>>>
