>
> - ok to deprecate equality deletes
> - not ok to remove them

@JB: I don't think it is a good idea to use deprecated functionality in new
feature development.
Hence, my specific question was about the Kafka Connect upsert operation.

@Manu: I meant the delta writers for the Kafka Connect Iceberg sink (which in
turn are used for upserting the CDC records)
https://github.com/apache/iceberg/issues/10842

- Ajantha



On Tue, Nov 19, 2024 at 3:08 PM Manu Zhang <owenzhang1...@gmail.com> wrote:

> I second Anton's proposal to standardize on a view-based approach to
> handle CDC cases.
> Actually, it's already been explored in detail[1] by Jack before.
>
> [1] Improving Change Data Capture Use Case for Apache Iceberg
> <https://docs.google.com/document/d/1kyyJp4masbd1FrIKUHF1ED_z1hTARL8bNoKCgb7fhSQ/edit?tab=t.0#heading=h.94xnx4qg3bnt>
>
>
> On Tue, Nov 19, 2024 at 4:16 PM Jean-Baptiste Onofré <j...@nanthrax.net>
> wrote:
>
>> My proposal is the following (already expressed):
>> - ok to deprecate equality deletes
>> - not ok to remove them
>> - work on position delete improvements to address streaming use cases. I
>> think we should explore different approaches. Personally, I think a possible
>> approach would be to find a way to index data files, to avoid a full scan to
>> find row positions.
>>
>> My $0.01 :)
>>
>> Regards
>> JB
>>
>> On Tue, Nov 19, 2024 at 7:53 AM Ajantha Bhat <ajanthab...@gmail.com>
>> wrote:
>>
>>> Hi, What's the conclusion on this thread?
>>>
>>> Users are looking for upsert (CDC) support for the OSS Iceberg Kafka Connect
>>> sink.
>>> We only support appends at the moment. Can we go ahead and implement the
>>> upserts using equality deletes?
>>>
>>>
>>> - Ajantha
>>>
>>> On Sun, Nov 10, 2024 at 11:56 AM Vignesh <vignesh.v...@gmail.com> wrote:
>>>
>>>> Hi,
>>>> I am reading about iceberg and am quite new to this.
>>>> This Puffin blob would be an index from key to data file. Other use cases of
>>>> Puffin, such as statistics, are at a per-file level if I understand
>>>> correctly.
>>>>
>>>> Where would the puffin about key->data file be stored? It is a property
>>>> of the entire table.
>>>>
>>>> Thanks,
>>>> Vignesh.
>>>>
>>>>
>>>> On Sat, Nov 9, 2024 at 2:17 AM Shani Elharrar
>>>> <sh...@upsolver.com.invalid> wrote:
>>>>
>>>>> JB, this is what we do, we write Equality Deletes and periodically
>>>>> convert them to Positional Deletes.
>>>>>
>>>>> We could probably index the keys, maybe partially index using bloom
>>>>> filters, the best would be to put those bloom filters inside puffin.
>>>>>
>>>>> Shani.
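For illustration, the per-file Bloom filter idea above might look like the
following minimal Python sketch. All names here (BloomFilter,
files_possibly_containing, the file paths) are hypothetical, not Iceberg or
Puffin APIs: a writer builds one filter per data file over the key column, and
a delete by key then only probes files whose filter reports a possible hit.

```python
import hashlib


class BloomFilter:
    """A tiny Bloom filter: no false negatives, rare false positives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # bitmask stored as a Python int

    def _positions(self, key):
        # Derive num_hashes deterministic bit positions from the key.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key):
        return all((self.bits >> pos) & 1 for pos in self._positions(key))


def files_possibly_containing(filters, key):
    """Return data files whose Bloom filter reports a possible hit for key."""
    return [path for path, bf in filters.items() if bf.might_contain(key)]


# One filter per data file, built at write time and stashed (e.g.) as a
# Puffin blob alongside the table metadata.
filters = {}
for path, keys in {"data-1.parquet": ["a", "b"],
                   "data-2.parquet": ["c", "d"]}.items():
    bf = BloomFilter()
    for k in keys:
        bf.add(k)
    filters[path] = bf
```

A delete of key "c" would then only need to open the files returned by
files_possibly_containing(filters, "c") to resolve row positions, rather than
scanning every data file.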
>>>>>
>>>>> On 9 Nov 2024, at 11:11, Jean-Baptiste Onofré <j...@nanthrax.net> wrote:
>>>>>
>>>>>
>>>>> Hi,
>>>>>
>>>>> I agree with Peter here, and I would say that it would be an issue for
>>>>> multi-engine support.
>>>>>
>>>>> I think, as I already mentioned with others, we should explore an
>>>>> alternative.
>>>>> As the main issue is the data file scan in a streaming context, maybe we
>>>>> could find a way to "index"/correlate positional deletes with limited
>>>>> scanning.
>>>>> I will think again about that :)
>>>>>
>>>>> Regards
>>>>> JB
>>>>>
>>>>> On Sat, Nov 9, 2024 at 6:48 AM Péter Váry <peter.vary.apa...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hi Imran,
>>>>>>
>>>>>> I don't think it's a good idea to start creating multiple types of
>>>>>> Iceberg tables. Iceberg's main selling point is compatibility between
>>>>>> engines. If we don't have readers and writers for all types of tables,
>>>>>> then we remove compatibility from the equation and engine-specific formats
>>>>>> always win. OTOH, if we write readers and writers for all types of tables,
>>>>>> then we are back to square one.
>>>>>>
>>>>>> Identifier fields are a table schema concept and are used in many cases
>>>>>> during query planning and execution. This is why they are defined as part
>>>>>> of the SQL spec, and this is why Iceberg defines them as well. One use case
>>>>>> is merging deletes (independently of how they are manifested) and
>>>>>> subsequent inserts into updates.
>>>>>>
>>>>>> Flink SQL doesn't allow creating tables with partition transforms, so
>>>>>> no new table could be created by Flink SQL using transforms, but tables
>>>>>> created by other engines could still be used (both read and write). Also
>>>>>> you can create such tables in Flink using the Java API.
>>>>>>
>>>>>> Requiring partition columns to be part of the identifier fields comes
>>>>>> from the practical consideration that you want to limit the scope of the
>>>>>> equality deletes as much as possible. Otherwise all of the equality
>>>>>> deletes would be table-global, and they would have to be read by every
>>>>>> reader. We could write those; we just decided that we don't want to allow
>>>>>> the user to do this, as in most cases it is a bad idea.
>>>>>>
>>>>>> I hope this helps,
>>>>>> Peter
>>>>>>
>>>>>> On Fri, Nov 8, 2024, 22:01 Imran Rashid <iras...@cloudera.com.invalid>
>>>>>> wrote:
>>>>>>
>>>>>>> I'm not down in the weeds myself on the implementation, so
>>>>>>> forgive me if I'm wrong about the details here.
>>>>>>>
>>>>>>> I can see all the viewpoints -- both that equality deletes enable
>>>>>>> some use cases, but also make others far more difficult.  What 
>>>>>>> surprised me
>>>>>>> the most is that Iceberg does not provide a way to distinguish these two
>>>>>>> table "types".
>>>>>>>
>>>>>>> At first, I thought the presence of an identifier-field (
>>>>>>> https://iceberg.apache.org/spec/#identifier-field-ids) indicated
>>>>>>> that the table was a target for equality deletes.  But, then it turns 
>>>>>>> out
>>>>>>> identifier-fields are also useful for changelog views even without 
>>>>>>> equality
>>>>>>> deletes -- IIUC, they show that a delete + insert should actually be
>>>>>>> interpreted as an update in changelog view.
>>>>>>>
>>>>>>> To be perfectly honest, I'm confused about all of these details --
>>>>>>> from my read, the spec does not indicate this relationship between
>>>>>>> identifier-fields and equality_ids in equality delete files (
>>>>>>> https://iceberg.apache.org/spec/#equality-delete-files), but I
>>>>>>> think that is the way Flink works.  Flink itself seems to have even more
>>>>>>> limitations -- no partition transforms are allowed, and all partition
>>>>>>> columns must be a subset of the identifier fields.  Is that just a Flink
>>>>>>> limitation, or is that the intended behavior in the spec?  (Or maybe
>>>>>>> user-error on my part?)  Those seem like very reasonable limitations, 
>>>>>>> from
>>>>>>> an implementation point-of-view.  But OTOH, as a user, this seems to be
>>>>>>> directly contrary to some of the promises of Iceberg.
>>>>>>>
>>>>>>> It's easy to see if a table already has equality deletes in it, by
>>>>>>> looking at the metadata.  But is there any way to indicate that a table 
>>>>>>> (or
>>>>>>> branch of a table) _must not_ have equality deletes added to it?
>>>>>>>
>>>>>>> If that were possible, it seems like we could support both use
>>>>>>> cases.  We could continue to optimize for the streaming ingestion use 
>>>>>>> cases
>>>>>>> using equality deletes.  But we could also build more optimizations into
>>>>>>> the "non-streaming-ingestion" branches.  And we could document the 
>>>>>>> tradeoff
>>>>>>> so it is much clearer to end users.
>>>>>>>
>>>>>>> To maintain compatibility, I suppose that the change would be that
>>>>>>> equality deletes continue to be allowed by default, but we'd add a new
>>>>>>> field to indicate that for some tables (or branches of a table), 
>>>>>>> equality
>>>>>>> deletes would not be allowed.  And it would be an error for an engine to
>>>>>>> make an update which added an equality delete to such a table.
>>>>>>>
>>>>>>> Maybe that change would even be possible in V3.
>>>>>>>
>>>>>>> And if all the performance improvements to equality deletes make
>>>>>>> this a moot point, we could drop the field in v4.  But it seems like a
>>>>>>> mistake to both limit the non-streaming use-case AND have confusing
>>>>>>> limitations for the end-user in the meantime.
>>>>>>>
>>>>>>> I would happily be corrected about my understanding of all of the
>>>>>>> above.
>>>>>>>
>>>>>>> thanks!
>>>>>>> Imran
>>>>>>>
>>>>>>> On Tue, Nov 5, 2024 at 9:16 AM Bryan Keller <brya...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> I also feel we should keep equality deletes until we have an
>>>>>>>> alternative solution for streaming updates/deletes.
>>>>>>>>
>>>>>>>> -Bryan
>>>>>>>>
>>>>>>>> On Nov 4, 2024, at 8:33 AM, Péter Váry <peter.vary.apa...@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>> Well, it seems like I'm a little late, so most of the arguments are
>>>>>>>> voiced.
>>>>>>>>
>>>>>>>> I agree that we should not deprecate the equality deletes until we
>>>>>>>> have a replacement feature.
>>>>>>>> I think one of the big advantages of Iceberg is that it supports
>>>>>>>> batch processing and streaming ingestion too.
>>>>>>>> For streaming ingestion we need a way to update existing data in a
>>>>>>>> performant way, but restricting deletes for the primary keys seems like
>>>>>>>> enough from the streaming perspective.
>>>>>>>>
>>>>>>>> Equality deletes allow a very wide range of applications, which we
>>>>>>>> might be able to narrow down a bit, but still keep useful. So if we 
>>>>>>>> want to
>>>>>>>> go down this road, we need to start collecting the requirements.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Peter
>>>>>>>>
>>>>>>>> On Fri, Nov 1, 2024 at 7:22 PM Shani Elharrar
>>>>>>>> <sh...@upsolver.com.invalid> wrote:
>>>>>>>>
>>>>>>>>> I understand how it makes sense for batch jobs, but it hurts
>>>>>>>>> streaming jobs. Using equality deletes works much better for streaming
>>>>>>>>> (which has strict SLAs on delay), and to reduce the performance penalty,
>>>>>>>>> systems can rewrite the equality deletes to positional deletes.
>>>>>>>>>
>>>>>>>>> Shani.
>>>>>>>>>
>>>>>>>>> On 1 Nov 2024, at 20:06, Steven Wu <stevenz...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Fundamentally, it is very difficult to write position deletes with
>>>>>>>>> concurrent writers and conflicts, for batch jobs too, as the inverted index
>>>>>>>>> may become invalid/stale.
>>>>>>>>>
>>>>>>>>> The position deletes are created during the write phase. But
>>>>>>>>> conflicts are only detected at the commit stage. I assume the batch 
>>>>>>>>> job
>>>>>>>>> should fail in this case.
>>>>>>>>>
>>>>>>>>> On Fri, Nov 1, 2024 at 10:57 AM Steven Wu <stevenz...@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Shani,
>>>>>>>>>>
>>>>>>>>>> That is a good point. It is certainly a limitation for the Flink
>>>>>>>>>> job to track the inverted index internally (which is what I had in 
>>>>>>>>>> mind).
>>>>>>>>>> It can't be shared/synchronized with other Flink jobs or other 
>>>>>>>>>> engines
>>>>>>>>>> writing to the same table.
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Steven
>>>>>>>>>>
>>>>>>>>>> On Fri, Nov 1, 2024 at 10:50 AM Shani Elharrar
>>>>>>>>>> <sh...@upsolver.com.invalid> wrote:
>>>>>>>>>>
>>>>>>>>>>> Even if Flink can create this state, it would have to be
>>>>>>>>>>> maintained against the Iceberg table; we wouldn't want duplicate keys if
>>>>>>>>>>> other systems/users update the table (e.g. manual inserts/updates using
>>>>>>>>>>> DML).
>>>>>>>>>>>
>>>>>>>>>>> Shani.
>>>>>>>>>>>
>>>>>>>>>>> On 1 Nov 2024, at 18:32, Steven Wu <stevenz...@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> > Add support for inverted indexes to reduce the cost of
>>>>>>>>>>> position lookup. This is fairly tricky to implement for streaming 
>>>>>>>>>>> use cases
>>>>>>>>>>> without an external system.
>>>>>>>>>>>
>>>>>>>>>>> Anton, that is also what I was saying earlier. In Flink, the
>>>>>>>>>>> inverted index of (key, committed data files) can be tracked in 
>>>>>>>>>>> Flink state.
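For illustration, the inverted-index idea above could be sketched in a few
lines of Python. This is a hypothetical model, not the Flink or Iceberg API:
UpsertWriter and its fields stand in for Flink keyed state and the delta
writer, and file paths/positions are invented for the example.

```python
class UpsertWriter:
    """Toy model: turn upserts into position deletes using writer-held state."""

    def __init__(self):
        self.index = {}             # key -> (data file, row position): the "state"
        self.position_deletes = []  # (file, position) pairs to commit as deletes
        self.inserts = []           # (file, position, key) rows written

    def upsert(self, key, file, position):
        previous = self.index.get(key)
        if previous is not None:
            # The key was already written: delete the old row by position,
            # no data file scan needed because the state remembers where it is.
            self.position_deletes.append(previous)
        self.index[key] = (file, position)
        self.inserts.append((file, position, key))


writer = UpsertWriter()
writer.upsert("user-1", "data-1.parquet", 0)
writer.upsert("user-2", "data-1.parquet", 1)
writer.upsert("user-1", "data-2.parquet", 0)  # update of an existing key
```

The limitation discussed above shows up immediately: if another engine writes
"user-1" to the table, this writer's index never learns about it, so the
duplicate survives.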
>>>>>>>>>>>
>>>>>>>>>>> On Fri, Nov 1, 2024 at 2:16 AM Anton Okolnychyi <
>>>>>>>>>>> aokolnyc...@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> I was a bit skeptical when we were adding equality deletes, but
>>>>>>>>>>>> nothing beats their performance during writes. We have to find an
>>>>>>>>>>>> alternative before deprecating.
>>>>>>>>>>>>
>>>>>>>>>>>> We are doing a lot of work to improve streaming, like reducing
>>>>>>>>>>>> the cost of commits, enabling a large (potentially infinite) 
>>>>>>>>>>>> number of
>>>>>>>>>>>> snapshots, changelog reads, and so on. It is a project goal to 
>>>>>>>>>>>> excel in
>>>>>>>>>>>> streaming.
>>>>>>>>>>>>
>>>>>>>>>>>> I was going to focus on equality deletes after completing the
>>>>>>>>>>>> DV work. I believe we have these options:
>>>>>>>>>>>>
>>>>>>>>>>>> - Revisit the existing design of equality deletes (e.g. add
>>>>>>>>>>>> more restrictions, improve compaction, offer new writers).
>>>>>>>>>>>> - Standardize on the view-based approach [1] to handle
>>>>>>>>>>>> streaming upserts and CDC use cases, potentially making this part 
>>>>>>>>>>>> of the
>>>>>>>>>>>> spec.
>>>>>>>>>>>> - Add support for inverted indexes to reduce the cost of
>>>>>>>>>>>> position lookup. This is fairly tricky to implement for streaming 
>>>>>>>>>>>> use cases
>>>>>>>>>>>> without an external system. Our runtime filtering in Spark today is
>>>>>>>>>>>> equivalent to looking up positions in an inverted index 
>>>>>>>>>>>> represented by
>>>>>>>>>>>> another Iceberg table. That may still not be enough for some 
>>>>>>>>>>>> streaming use
>>>>>>>>>>>> cases.
>>>>>>>>>>>>
>>>>>>>>>>>> [1] - https://www.tabular.io/blog/hello-world-of-cdc/
>>>>>>>>>>>>
>>>>>>>>>>>> - Anton
>>>>>>>>>>>>
>>>>>>>>>>>> On Thu, Oct 31, 2024 at 9:31 PM Micah Kornfield <
>>>>>>>>>>>> emkornfi...@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> I agree that equality deletes have their place in streaming.
>>>>>>>>>>>>> I think the ultimate decision here is how opinionated Iceberg 
>>>>>>>>>>>>> wants to be
>>>>>>>>>>>>> on its use-cases.  If it really wants to stick to its origins of 
>>>>>>>>>>>>> "slow
>>>>>>>>>>>>> moving data", then removing equality deletes would be inline with 
>>>>>>>>>>>>> this.  I
>>>>>>>>>>>>> think the other high level question is how much we allow for 
>>>>>>>>>>>>> partially
>>>>>>>>>>>>> compatible features (the row lineage use-case feature was 
>>>>>>>>>>>>> explicitly
>>>>>>>>>>>>> approved excluding equality deletes, and people seemed OK with it 
>>>>>>>>>>>>> at the
>>>>>>>>>>>>> time.  If all features need to work together, then maybe we need 
>>>>>>>>>>>>> to rethink
>>>>>>>>>>>>> the design here so it can be forward compatible with equality 
>>>>>>>>>>>>> deletes).
>>>>>>>>>>>>>
>>>>>>>>>>>>> I think one issue with equality deletes as stated in the spec
>>>>>>>>>>>>> is that they are overly broad.  I'd be interested if people have any use
>>>>>>>>>>>>> cases that differ, but I think one way of narrowing the specification's
>>>>>>>>>>>>> scope on equality deletes (and probably a necessary building block for
>>>>>>>>>>>>> building something better) is to focus on upsert/streaming deletes.  Two
>>>>>>>>>>>>> proposals in this regard are:
>>>>>>>>>>>>>
>>>>>>>>>>>>> 1.  Require that equality deletes can only correspond to
>>>>>>>>>>>>> unique identifiers for the table.
>>>>>>>>>>>>> 2.  Consider requiring that, for equality deletes on
>>>>>>>>>>>>> partitioned tables, the primary key must contain a partition column (I
>>>>>>>>>>>>> believe Flink at least already does this).  It is less clear to me that
>>>>>>>>>>>>> this would meet all existing use-cases.  But having this would allow for
>>>>>>>>>>>>> better incremental data structures, which could then be partition based.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Narrowing the scope to unique identifiers would allow for further
>>>>>>>>>>>>> building blocks already mentioned, like a secondary index 
>>>>>>>>>>>>> (possible via LSM
>>>>>>>>>>>>> tree), that would allow for better performance overall.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I generally agree with the sentiment that we shouldn't
>>>>>>>>>>>>> deprecate them until there is a viable replacement.  With all due 
>>>>>>>>>>>>> respect
>>>>>>>>>>>>> to my employer, let's not fall into the Google trap [1] :)
>>>>>>>>>>>>>
>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>> Micah
>>>>>>>>>>>>>
>>>>>>>>>>>>> [1] https://goomics.net/50/
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Thu, Oct 31, 2024 at 12:35 PM Alexander Jo <
>>>>>>>>>>>>> alex...@starburstdata.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hey all,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Just to throw my 2 cents in, I agree with Steven and others
>>>>>>>>>>>>>> that we do need some kind of replacement before deprecating 
>>>>>>>>>>>>>> equality
>>>>>>>>>>>>>> deletes.
>>>>>>>>>>>>>> They certainly have their problems, and do significantly
>>>>>>>>>>>>>> increase complexity as they are now, but the writing of position 
>>>>>>>>>>>>>> deletes is
>>>>>>>>>>>>>> too expensive for certain pipelines.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> We've been investigating using equality deletes for some of
>>>>>>>>>>>>>> our workloads at Starburst, the key advantage we were hoping to 
>>>>>>>>>>>>>> leverage is
>>>>>>>>>>>>>> cheap, effectively random access lookup deletes.
>>>>>>>>>>>>>> Say you have a UUID column that's unique in a table and want
>>>>>>>>>>>>>> to delete a row by UUID. With position deletes each delete is 
>>>>>>>>>>>>>> expensive
>>>>>>>>>>>>>> without an index on that UUID.
>>>>>>>>>>>>>> With equality deletes each delete is cheap and reads/compaction
>>>>>>>>>>>>>> are expensive, but when updates are frequent and reads are
>>>>>>>>>>>>>> sporadic that's a reasonable tradeoff.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Pretty much what Jason and Steven have already said.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Maybe there are some incremental improvements on equality
>>>>>>>>>>>>>> deletes or tips from similar systems that might alleviate some 
>>>>>>>>>>>>>> of their
>>>>>>>>>>>>>> problems?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> - Alex Jo
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Thu, Oct 31, 2024 at 10:58 AM Steven Wu <
>>>>>>>>>>>>>> stevenz...@gmail.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> We probably all agree on the downside of equality deletes:
>>>>>>>>>>>>>>> they postpone all the work to the read path.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> In theory, we can implement position deletes only in the
>>>>>>>>>>>>>>> Flink streaming writer. It would require the tracking of last 
>>>>>>>>>>>>>>> committed
>>>>>>>>>>>>>>> data files per key, which can be stored in Flink state 
>>>>>>>>>>>>>>> (checkpointed). This
>>>>>>>>>>>>>>> is obviously quite expensive/challenging, but possible.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I'd like to echo one benefit of equality deletes that Russell
>>>>>>>>>>>>>>> called out in the original email: equality deletes never
>>>>>>>>>>>>>>> have conflicts. That is important for streaming writers (Flink, Kafka
>>>>>>>>>>>>>>> Connect, ...) that commit frequently (minutes or less). Assume Flink
>>>>>>>>>>>>>>> can write position deletes only and commits every 2 minutes. The
>>>>>>>>>>>>>>> long-running nature of streaming jobs can cause frequent commit
>>>>>>>>>>>>>>> conflicts with background delete compaction jobs.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Overall, streaming upsert writes are not a well-solved
>>>>>>>>>>>>>>> problem in Iceberg. This probably affects all streaming engines
>>>>>>>>>>>>>>> (Flink,
>>>>>>>>>>>>>>> Kafka connect, Spark streaming, ...). We need to come up with 
>>>>>>>>>>>>>>> some better
>>>>>>>>>>>>>>> alternatives before we can deprecate equality deletes.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Thu, Oct 31, 2024 at 8:38 AM Russell Spitzer <
>>>>>>>>>>>>>>> russell.spit...@gmail.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> For users of Equality Deletes, what are the key benefits to
>>>>>>>>>>>>>>>> Equality Deletes that you would like to preserve and could you 
>>>>>>>>>>>>>>>> please share
>>>>>>>>>>>>>>>> some concrete examples of the queries you want to run (and the 
>>>>>>>>>>>>>>>> schemas and
>>>>>>>>>>>>>>>> data sizes you would like to run them against) and the 
>>>>>>>>>>>>>>>> latencies that would
>>>>>>>>>>>>>>>> be acceptable?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Thu, Oct 31, 2024 at 10:05 AM Jason Fine
>>>>>>>>>>>>>>>> <ja...@upsolver.com.invalid> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Representing Upsolver here, we also make use of Equality
>>>>>>>>>>>>>>>>> Deletes to deliver high frequency low latency updates to our 
>>>>>>>>>>>>>>>>> clients at
>>>>>>>>>>>>>>>>> scale. We have customers using them at scale and 
>>>>>>>>>>>>>>>>> demonstrating the need and
>>>>>>>>>>>>>>>>> viability. We automate the process of converting them into 
>>>>>>>>>>>>>>>>> positional
>>>>>>>>>>>>>>>>> deletes (or fully applying them) for more efficient engine 
>>>>>>>>>>>>>>>>> queries in the
>>>>>>>>>>>>>>>>> background giving our users both low latency and good query 
>>>>>>>>>>>>>>>>> performance.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Equality Deletes were added since there isn't a good way
>>>>>>>>>>>>>>>>> to solve frequent updates otherwise. It would require some 
>>>>>>>>>>>>>>>>> sort of index
>>>>>>>>>>>>>>>>> keeping track of every record in the table (by a 
>>>>>>>>>>>>>>>>> predetermined PK) and
>>>>>>>>>>>>>>>>> maintaining such an index is a huge task that every tool 
>>>>>>>>>>>>>>>>> interested in this
>>>>>>>>>>>>>>>>> would need to re-implement. It also becomes a bottleneck 
>>>>>>>>>>>>>>>>> limiting table
>>>>>>>>>>>>>>>>> sizes.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I don't think they should be removed without providing an
>>>>>>>>>>>>>>>>> alternative. Positional Deletes have a different performance 
>>>>>>>>>>>>>>>>> profile
>>>>>>>>>>>>>>>>> inherently, requiring more upfront work proportional to the 
>>>>>>>>>>>>>>>>> table size.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Thu, Oct 31, 2024 at 2:45 PM Jean-Baptiste Onofré <
>>>>>>>>>>>>>>>>> j...@nanthrax.net> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Hi Russell
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Thanks for the nice writeup and the proposal.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I agree with your analysis, and I have the same feeling.
>>>>>>>>>>>>>>>>>> However, I
>>>>>>>>>>>>>>>>>> think there are more than Flink that write equality
>>>>>>>>>>>>>>>>>> delete files. So,
>>>>>>>>>>>>>>>>>> I agree to deprecate in V3, but maybe be more "flexible"
>>>>>>>>>>>>>>>>>> about removal
>>>>>>>>>>>>>>>>>> in V4 in order to give time to engines to update.
>>>>>>>>>>>>>>>>>> I think that by deprecating equality deletes, we are
>>>>>>>>>>>>>>>>>> clearly focusing
>>>>>>>>>>>>>>>>>> on read performance and "consistency" (more than write).
>>>>>>>>>>>>>>>>>> It's not
>>>>>>>>>>>>>>>>>> necessarily a bad thing but the streaming platform and
>>>>>>>>>>>>>>>>>> data ingestion
>>>>>>>>>>>>>>>>>> platforms will be probably concerned about that (by using
>>>>>>>>>>>>>>>>>> positional
>>>>>>>>>>>>>>>>>> deletes, they will have to scan/read all data files to
>>>>>>>>>>>>>>>>>> find the
>>>>>>>>>>>>>>>>>> position, which is painful).
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> So, to summarize:
>>>>>>>>>>>>>>>>>> 1. Agree to deprecate equality deletes, but -1 to commit
>>>>>>>>>>>>>>>>>> any target
>>>>>>>>>>>>>>>>>> for deletion before having a clear path for streaming
>>>>>>>>>>>>>>>>>> platforms
>>>>>>>>>>>>>>>>>> (Flink, Beam, ...)
>>>>>>>>>>>>>>>>>> 2. In the meantime (during the deprecation period), I
>>>>>>>>>>>>>>>>>> propose to
>>>>>>>>>>>>>>>>>> explore possible improvements for streaming platforms
>>>>>>>>>>>>>>>>>> (maybe finding a
>>>>>>>>>>>>>>>>>> way to avoid full data files scan, ...)
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Thanks !
>>>>>>>>>>>>>>>>>> Regards
>>>>>>>>>>>>>>>>>> JB
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Wed, Oct 30, 2024 at 10:06 PM Russell Spitzer
>>>>>>>>>>>>>>>>>> <russell.spit...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>> > Background:
>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>> > 1) Position Deletes
>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>> > Writers determine what rows are deleted and mark them
>>>>>>>>>>>>>>>>>> in a 1-for-1 representation. With delete vectors this means
>>>>>>>>>>>>>>>>>> every data file
>>>>>>>>>>>>>>>>>> has at most 1 delete vector that it is read in conjunction 
>>>>>>>>>>>>>>>>>> with to excise
>>>>>>>>>>>>>>>>>> deleted rows. Reader overhead is more or less constant and 
>>>>>>>>>>>>>>>>>> is very
>>>>>>>>>>>>>>>>>> predictable.
>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>> > The main cost of this mode is that deletes must be
>>>>>>>>>>>>>>>>>> determined at write time which is expensive and can be more 
>>>>>>>>>>>>>>>>>> difficult for
>>>>>>>>>>>>>>>>>> conflict resolution
>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>> > 2) Equality Deletes
>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>> > Writers write out references to the values that are deleted
>>>>>>>>>>>>>>>>>> (in a partition or globally). There can be an unlimited 
>>>>>>>>>>>>>>>>>> number of equality
>>>>>>>>>>>>>>>>>> deletes and they all must be checked for every data file 
>>>>>>>>>>>>>>>>>> that is read. The
>>>>>>>>>>>>>>>>>> cost of determining deleted rows is essentially given to the 
>>>>>>>>>>>>>>>>>> reader.
>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>> > Conflicts almost never happen since data files are not
>>>>>>>>>>>>>>>>>> actually changed and there is almost no cost to the writer 
>>>>>>>>>>>>>>>>>> to generate
>>>>>>>>>>>>>>>>>> these. Almost all costs related to equality deletes are 
>>>>>>>>>>>>>>>>>> passed on to the
>>>>>>>>>>>>>>>>>> reader.
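The read-path asymmetry described in the two modes above can be modeled in a
few lines of Python (a toy sketch, not Iceberg code; the function names, file
names, and row layout are all illustrative):

```python
def read_with_position_deletes(data_files, deletes_by_file):
    """deletes_by_file: file -> set of deleted row ordinals (a 'delete vector').

    Cost per file is bounded: one membership check per row against one set.
    """
    out = []
    for path, rows in data_files.items():
        deleted = deletes_by_file.get(path, set())
        out.extend(row for pos, row in enumerate(rows) if pos not in deleted)
    return out


def read_with_equality_deletes(data_files, equality_deletes):
    """equality_deletes: list of {column: value} predicates.

    Every predicate must be checked against every row of every data file,
    so read cost grows with the number of accumulated delete predicates.
    """
    out = []
    for rows in data_files.values():
        for row in rows:
            if not any(all(row.get(col) == val for col, val in pred.items())
                       for pred in equality_deletes):
                out.append(row)
    return out


data = {
    "data-1.parquet": [{"id": 1, "v": "a"}, {"id": 2, "v": "b"}],
    "data-2.parquet": [{"id": 3, "v": "c"}],
}
```

In this model the writer-side/reader-side tradeoff is visible directly: the
position-delete writer had to discover the ordinal 0 up front, while the
equality-delete writer only recorded {"id": 2} and left the matching to every
subsequent reader.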
>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>> > Proposal:
>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>> > Equality deletes are, in my opinion, unsustainable and
>>>>>>>>>>>>>>>>>> we should work on deprecating and removing them from the 
>>>>>>>>>>>>>>>>>> specification. At
>>>>>>>>>>>>>>>>>> this time, I know of only one engine (Apache Flink) which 
>>>>>>>>>>>>>>>>>> produces these
>>>>>>>>>>>>>>>>>> deletes but almost all engines have implementations to read 
>>>>>>>>>>>>>>>>>> them. The cost
>>>>>>>>>>>>>>>>>> of implementing equality deletes on the read path is 
>>>>>>>>>>>>>>>>>> difficult and
>>>>>>>>>>>>>>>>>> unpredictable in terms of memory usage and compute 
>>>>>>>>>>>>>>>>>> complexity. We’ve had
>>>>>>>>>>>>>>>>>> suggestions of implementing RocksDB in order to handle
>>>>>>>>>>>>>>>>>> ever-growing sets of
>>>>>>>>>>>>>>>>>> equality deletes, which in my opinion shows that we are going
>>>>>>>>>>>>>>>>>> down the wrong
>>>>>>>>>>>>>>>>>> path.
>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>> > Outside of performance, Equality deletes are also
>>>>>>>>>>>>>>>>>> difficult to use in conjunction with many other features. 
>>>>>>>>>>>>>>>>>> For example, any
>>>>>>>>>>>>>>>>>> features requiring CDC or Row lineage are basically 
>>>>>>>>>>>>>>>>>> impossible when
>>>>>>>>>>>>>>>>>> equality deletes are in use. When Equality deletes are 
>>>>>>>>>>>>>>>>>> present, the state
>>>>>>>>>>>>>>>>>> of the table can only be determined with a full scan making 
>>>>>>>>>>>>>>>>>> it difficult to
>>>>>>>>>>>>>>>>>> update differential structures. This means materialized 
>>>>>>>>>>>>>>>>>> views or indexes
>>>>>>>>>>>>>>>>>> need to essentially be fully rebuilt whenever an equality 
>>>>>>>>>>>>>>>>>> delete is added
>>>>>>>>>>>>>>>>>> to the table.
>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>> > Equality deletes essentially remove complexity from the
>>>>>>>>>>>>>>>>>> write side but then add what I believe is an unacceptable 
>>>>>>>>>>>>>>>>>> level of
>>>>>>>>>>>>>>>>>> complexity to the read side.
>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>> > Because of this I suggest we deprecate Equality Deletes
>>>>>>>>>>>>>>>>>> in V3 and slate them for full removal from the Iceberg Spec 
>>>>>>>>>>>>>>>>>> in V4.
>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>> > I know this is a big change and compatibility breakage
>>>>>>>>>>>>>>>>>> so I would like to introduce this idea to the community and 
>>>>>>>>>>>>>>>>>> solicit
>>>>>>>>>>>>>>>>>> feedback from all stakeholders. I am very flexible on this 
>>>>>>>>>>>>>>>>>> issue and would
>>>>>>>>>>>>>>>>>> like to hear the best issues both for and against removal of 
>>>>>>>>>>>>>>>>>> Equality
>>>>>>>>>>>>>>>>>> Deletes.
>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>> > Thanks everyone for your time,
>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>> > Russ Spitzer
>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> *Jason Fine*
>>>>>>>>>>>>>>>>> Chief Software Architect
>>>>>>>>>>>>>>>>> ja...@upsolver.com  | www.upsolver.com
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>