Hi Ajantha,

I'm proposing that we explore a view-based approach, similar to the
changelog-mirror table pattern [1], rather than supporting delta writers for
the Kafka Connect Iceberg sink.

1.
https://www.tabular.io/apache-iceberg-cookbook/data-engineering-cdc-table-mirroring/
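
For concreteness, here is a minimal sketch of that pattern (PySpark with
Spark SQL; the table, view, and column names are hypothetical, and it assumes
a Spark session already configured with an Iceberg catalog). The sink only
ever appends raw change records, and a view derives the latest row per key:

```python
# Sketch of the changelog-mirror pattern; names are made up.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# The Kafka Connect sink keeps appending raw CDC records here, so the hot
# path stays append-only and no delete files are ever written.
spark.sql("""
    CREATE TABLE IF NOT EXISTS db.orders_changelog (
        order_id BIGINT,
        amount   DECIMAL(10, 2),
        _op      STRING,     -- 'I', 'U', or 'D'
        _ts      TIMESTAMP
    ) USING iceberg
""")

# The "mirror" view exposes current state: latest change per key, deletes
# dropped. A temporary view is used here; a persistent view would work the
# same way where the catalog supports views.
spark.sql("""
    CREATE OR REPLACE TEMPORARY VIEW orders_current AS
    SELECT order_id, amount
    FROM (
        SELECT *,
               ROW_NUMBER() OVER (PARTITION BY order_id ORDER BY _ts DESC) AS rn
        FROM db.orders_changelog
    ) AS latest
    WHERE rn = 1 AND _op <> 'D'
""")
```

The tradeoff is that writes stay cheap and conflict-free, while the dedup
cost moves to the read side (the view) and to periodic compaction of the
changelog table.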

On Tue, Nov 19, 2024 at 7:38 PM Jean-Baptiste Onofré <j...@nanthrax.net>
wrote:

> I don’t think it’s a problem while an alternative is explored (the JDK
> itself does that pretty often).
> So it's up to the community: of course I'm against removing it without a
> solid alternative, but deprecation is fine imho.
>
> Regards
> JB
>
> Le mar. 19 nov. 2024 à 12:19, Ajantha Bhat <ajanthab...@gmail.com> a
> écrit :
>
>> - ok for deprecate equality deletes
>>> - not ok to remove it
>>
>>
>> @JB: I don't think it is a good idea to use deprecated functionality in
>> new feature development.
>> Hence, my specific question was about the Kafka Connect upsert operation.
>>
>> @Manu: I meant the delta writers for the Kafka Connect Iceberg sink (which
>> in turn are used for upserting the CDC records)
>> https://github.com/apache/iceberg/issues/10842
>>
>>
>> - Ajantha
>>
>>
>>
>> On Tue, Nov 19, 2024 at 3:08 PM Manu Zhang <owenzhang1...@gmail.com>
>> wrote:
>>
>>> I second Anton's proposal to standardize on a view-based approach to
>>> handle CDC cases.
>>> Actually, it's already been explored in detail[1] by Jack before.
>>>
>>> [1] Improving Change Data Capture Use Case for Apache Iceberg
>>> <https://docs.google.com/document/d/1kyyJp4masbd1FrIKUHF1ED_z1hTARL8bNoKCgb7fhSQ/edit?tab=t.0#heading=h.94xnx4qg3bnt>
>>>
>>>
>>> On Tue, Nov 19, 2024 at 4:16 PM Jean-Baptiste Onofré <j...@nanthrax.net>
>>> wrote:
>>>
>>>> My proposal is the following (already expressed):
>>>> - ok for deprecate equality deletes
>>>> - not ok to remove it
>>>> - work on position deletes improvements to address streaming use cases.
>>>> I think we should explore different approaches. Personally, I think a
>>>> possible approach would be to find a way to index data files so that we
>>>> can avoid a full scan to find row positions.
>>>>
>>>> My $0.01 :)
>>>>
>>>> Regards
>>>> JB
>>>>
>>>> Le mar. 19 nov. 2024 à 07:53, Ajantha Bhat <ajanthab...@gmail.com> a
>>>> écrit :
>>>>
>>>>> Hi, What's the conclusion on this thread?
>>>>>
>>>>> Users are looking for upsert (CDC) support for the OSS Iceberg
>>>>> Kafka Connect sink.
>>>>> We only support appends at the moment. Can we go ahead and implement
>>>>> upserts using equality deletes?
>>>>>
>>>>>
>>>>> - Ajantha
>>>>>
>>>>> On Sun, Nov 10, 2024 at 11:56 AM Vignesh <vignesh.v...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hi,
>>>>>> I am reading about Iceberg and am quite new to this.
>>>>>> This Puffin file would be an index from key to data file. Other use cases
>>>>>> of Puffin, such as statistics, are at a per-file level if I understand
>>>>>> correctly.
>>>>>>
>>>>>> Where would the Puffin index from key to data file be stored? It is a
>>>>>> property of the entire table.
>>>>>>
>>>>>> Thanks,
>>>>>> Vignesh.
>>>>>>
>>>>>>
>>>>>> On Sat, Nov 9, 2024 at 2:17 AM Shani Elharrar
>>>>>> <sh...@upsolver.com.invalid> wrote:
>>>>>>
>>>>>>> JB, this is what we do: we write Equality Deletes and periodically
>>>>>>> convert them to Positional Deletes.
>>>>>>>
>>>>>>> We could probably index the keys, or maybe partially index them using
>>>>>>> bloom filters; the best would be to put those bloom filters inside Puffin.
>>>>>>>
>>>>>>> Shani.
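
A rough sketch of that key-index idea in plain Python (the bloom-filter
parameters and file names are illustrative, and persisting the filters as
Puffin blobs is the open part of the idea, not an existing API): per-data-file
filters over the key column let a writer or compactor prune which files need
a positional lookup for a given deleted key.

```python
import hashlib

class KeyBloomFilter:
    """Toy bloom filter over the key column of a single data file."""

    def __init__(self, size_bits=8192, num_hashes=3):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits // 8)

    def _positions(self, key):
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key))

# One filter per data file, built when the file is written; the idea under
# discussion is to store these next to the table metadata (e.g. in Puffin).
file_filters = {
    "data-00001.parquet": KeyBloomFilter(),
    "data-00002.parquet": KeyBloomFilter(),
}
file_filters["data-00001.parquet"].add("order-123")

def candidate_files(delete_key):
    """Files that may contain the key and therefore need a positional lookup."""
    return [path for path, bf in file_filters.items() if bf.might_contain(delete_key)]

print(candidate_files("order-123"))  # ['data-00001.parquet'] plus any false positives
```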
>>>>>>>
>>>>>>> On 9 Nov 2024, at 11:11, Jean-Baptiste Onofré <j...@nanthrax.net>
>>>>>>> wrote:
>>>>>>>
>>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I agree with Peter here, and I would say that it would be an issue
>>>>>>> for multi-engine support.
>>>>>>>
>>>>>>> I think, as I already mentioned with others, we should explore an
>>>>>>> alternative.
>>>>>>> As the main issue is the data file scan in a streaming context, maybe
>>>>>>> we could find a way to "index"/correlate positional deletes with
>>>>>>> limited scanning.
>>>>>>> I will think again about that :)
>>>>>>>
>>>>>>> Regards
>>>>>>> JB
>>>>>>>
>>>>>>> On Sat, Nov 9, 2024 at 6:48 AM Péter Váry <
>>>>>>> peter.vary.apa...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hi Imran,
>>>>>>>>
>>>>>>>> I don't think it's a good idea to start creating multiple types of
>>>>>>>> Iceberg tables. Iceberg's main selling point is compatibility between
>>>>>>>> engines. If we don't have readers and writers for all types of tables, 
>>>>>>>> then
>>>>>>>> we remove compatibility from the equation and engine specific formats
>>>>>>>> always win. OTOH, if we write readers and writers for all types of 
>>>>>>>> tables
>>>>>>>> then we are back on square one.
>>>>>>>>
>>>>>>>> Identifier fields are a table schema concept and are used in many
>>>>>>>> cases during query planning and execution. This is why they are defined
>>>>>>>> as part of the SQL spec, and this is why Iceberg defines them as well.
>>>>>>>> One use case is where they can be used to merge deletes (independently
>>>>>>>> of how they are manifested) and subsequent inserts into updates.
>>>>>>>>
>>>>>>>> Flink SQL doesn't allow creating tables with partition transforms,
>>>>>>>> so no new table could be created by Flink SQL using transforms, but
>>>>>>>> tables created by other engines could still be used (both read and
>>>>>>>> write). Also, you can create such tables in Flink using the Java API.
>>>>>>>>
>>>>>>>> Requiring partition columns to be part of the identifier fields comes
>>>>>>>> from the practical consideration that you want to limit the scope of
>>>>>>>> the equality deletes as much as possible. Otherwise all of the equality
>>>>>>>> deletes would be table global, and they would have to be read by every
>>>>>>>> reader. We could write those; we just decided that we don't want to
>>>>>>>> allow the user to do this, as it is in most cases a bad idea.
>>>>>>>>
>>>>>>>> I hope this helps,
>>>>>>>> Peter
>>>>>>>>
>>>>>>>> On Fri, Nov 8, 2024, 22:01 Imran Rashid
>>>>>>>> <iras...@cloudera.com.invalid> wrote:
>>>>>>>>
>>>>>>>>> I'm not down in the weeds at all myself on implementation details,
>>>>>>>>> so forgive me if I'm wrong about the details here.
>>>>>>>>>
>>>>>>>>> I can see all the viewpoints -- both that equality deletes enable
>>>>>>>>> some use cases, but also make others far more difficult.  What 
>>>>>>>>> surprised me
>>>>>>>>> the most is that Iceberg does not provide a way to distinguish these 
>>>>>>>>> two
>>>>>>>>> table "types".
>>>>>>>>>
>>>>>>>>> At first, I thought the presence of an identifier-field (
>>>>>>>>> https://iceberg.apache.org/spec/#identifier-field-ids) indicated
>>>>>>>>> that the table was a target for equality deletes. But then it turns out
>>>>>>>>> identifier-fields are also useful for changelog views even without
>>>>>>>>> equality deletes -- IIUC, they show that a delete + insert should
>>>>>>>>> actually be interpreted as an update in a changelog view.
>>>>>>>>>
>>>>>>>>> To be perfectly honest, I'm confused about all of these details --
>>>>>>>>> from my read, the spec does not indicate this relationship between
>>>>>>>>> identifier-fields and equality_ids in equality delete files (
>>>>>>>>> https://iceberg.apache.org/spec/#equality-delete-files), but I
>>>>>>>>> think that is the way Flink works.  Flink itself seems to have even 
>>>>>>>>> more
>>>>>>>>> limitations -- no partition transforms are allowed, and all partition
>>>>>>>>> columns must be a subset of the identifier fields.  Is that just a 
>>>>>>>>> Flink
>>>>>>>>> limitation, or is that the intended behavior in the spec?  (Or maybe
>>>>>>>>> user-error on my part?)  Those seem like very reasonable limitations, 
>>>>>>>>> from
>>>>>>>>> an implementation point-of-view.  But OTOH, as a user, this seems to 
>>>>>>>>> be
>>>>>>>>> directly contrary to some of the promises of Iceberg.
>>>>>>>>>
>>>>>>>>> It's easy to see if a table already has equality deletes in it by
>>>>>>>>> looking at the metadata. But is there any way to indicate that a table
>>>>>>>>> (or branch of a table) _must not_ have equality deletes added to it?
>>>>>>>>>
>>>>>>>>> If that were possible, it seems like we could support both use
>>>>>>>>> cases.  We could continue to optimize for the streaming ingestion use 
>>>>>>>>> cases
>>>>>>>>> using equality deletes.  But we could also build more optimizations 
>>>>>>>>> into
>>>>>>>>> the "non-streaming-ingestion" branches.  And we could document the 
>>>>>>>>> tradeoff
>>>>>>>>> so it is much clearer to end users.
>>>>>>>>>
>>>>>>>>> To maintain compatibility, I suppose that the change would be that
>>>>>>>>> equality deletes continue to be allowed by default, but we'd add a new
>>>>>>>>> field to indicate that for some tables (or branches of a table), 
>>>>>>>>> equality
>>>>>>>>> deletes would not be allowed.  And it would be an error for an engine 
>>>>>>>>> to
>>>>>>>>> make an update which added an equality delete to such a table.
>>>>>>>>>
>>>>>>>>> Maybe that change would even be possible in V3.
>>>>>>>>>
>>>>>>>>> And if all the performance improvements to equality deletes make
>>>>>>>>> this a moot point, we could drop the field in v4.  But it seems like a
>>>>>>>>> mistake to both limit the non-streaming use-case AND have confusing
>>>>>>>>> limitations for the end-user in the meantime.
>>>>>>>>>
>>>>>>>>> I would happily be corrected about my understanding of all of the
>>>>>>>>> above.
>>>>>>>>>
>>>>>>>>> thanks!
>>>>>>>>> Imran
>>>>>>>>>
>>>>>>>>> On Tue, Nov 5, 2024 at 9:16 AM Bryan Keller <brya...@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> I also feel we should keep equality deletes until we have an
>>>>>>>>>> alternative solution for streaming updates/deletes.
>>>>>>>>>>
>>>>>>>>>> -Bryan
>>>>>>>>>>
>>>>>>>>>> On Nov 4, 2024, at 8:33 AM, Péter Váry <
>>>>>>>>>> peter.vary.apa...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>> Well, it seems like I'm a little late, so most of the arguments
>>>>>>>>>> are voiced.
>>>>>>>>>>
>>>>>>>>>> I agree that we should not deprecate the equality deletes until
>>>>>>>>>> we have a replacement feature.
>>>>>>>>>> I think one of the big advantages of Iceberg is that it supports
>>>>>>>>>> batch processing and streaming ingestion too.
>>>>>>>>>> For streaming ingestion we need a way to update existing data in
>>>>>>>>>> a performant way, but restricting deletes to the primary keys seems
>>>>>>>>>> like enough from the streaming perspective.
>>>>>>>>>>
>>>>>>>>>> Equality deletes allow a very wide range of applications, which
>>>>>>>>>> we might be able to narrow down a bit, but still keep useful. So if 
>>>>>>>>>> we want
>>>>>>>>>> to go down this road, we need to start collecting the requirements.
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Peter
>>>>>>>>>>
>>>>>>>>>> Shani Elharrar <sh...@upsolver.com.invalid> ezt írta (időpont:
>>>>>>>>>> 2024. nov. 1., P, 19:22):
>>>>>>>>>>
>>>>>>>>>>> I understand how it makes sense for batch jobs, but it hurts
>>>>>>>>>>> streaming jobs. Using equality deletes works much better for
>>>>>>>>>>> streaming (which has strict SLAs for delays), and in order to
>>>>>>>>>>> decrease the performance penalty, systems can rewrite the equality
>>>>>>>>>>> deletes to positional deletes.
>>>>>>>>>>>
>>>>>>>>>>> Shani.
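
A rough sketch of that rewrite step in plain Python (row layout and file
names are invented; a real implementation would work per partition and use
the table's delete-file machinery): scan the affected data files once, turn
every row whose key is equality-deleted into a (file, position) pair, then
drop the equality deletes.

```python
# Illustrative only: convert equality deletes (keys) into position deletes
# (file, row position) by scanning the affected data files once.
data_files = {
    "data-00001.parquet": [{"order_id": 1}, {"order_id": 2}, {"order_id": 3}],
    "data-00002.parquet": [{"order_id": 2}, {"order_id": 4}],
}
equality_deletes = {2, 4}  # deleted values of the identifier field `order_id`

def rewrite_to_position_deletes(files, deleted_keys, key_column="order_id"):
    """Return (file, position) pairs for every row whose key is deleted."""
    position_deletes = []
    for path, rows in files.items():
        for pos, row in enumerate(rows):
            if row[key_column] in deleted_keys:
                position_deletes.append((path, pos))
    return position_deletes

print(rewrite_to_position_deletes(data_files, equality_deletes))
# [('data-00001.parquet', 1), ('data-00002.parquet', 0), ('data-00002.parquet', 1)]
```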
>>>>>>>>>>>
>>>>>>>>>>> On 1 Nov 2024, at 20:06, Steven Wu <stevenz...@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Fundamentally, it is very difficult to write position deletes
>>>>>>>>>>> with concurrent writers and conflicts for batch jobs too, as the 
>>>>>>>>>>> inverted
>>>>>>>>>>> index may become invalid/stale.
>>>>>>>>>>>
>>>>>>>>>>> The position deletes are created during the write phase. But
>>>>>>>>>>> conflicts are only detected at the commit stage. I assume the batch 
>>>>>>>>>>> job
>>>>>>>>>>> should fail in this case.
>>>>>>>>>>>
>>>>>>>>>>> On Fri, Nov 1, 2024 at 10:57 AM Steven Wu <stevenz...@gmail.com>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Shani,
>>>>>>>>>>>>
>>>>>>>>>>>> That is a good point. It is certainly a limitation for the
>>>>>>>>>>>> Flink job to track the inverted index internally (which is what I 
>>>>>>>>>>>> had in
>>>>>>>>>>>> mind). It can't be shared/synchronized with other Flink jobs or 
>>>>>>>>>>>> other
>>>>>>>>>>>> engines writing to the same table.
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> Steven
>>>>>>>>>>>>
>>>>>>>>>>>> On Fri, Nov 1, 2024 at 10:50 AM Shani Elharrar
>>>>>>>>>>>> <sh...@upsolver.com.invalid> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Even if Flink can create this state, it would have to be
>>>>>>>>>>>>> maintained against the Iceberg table; we wouldn't want duplicate
>>>>>>>>>>>>> keys if other systems / users update the table (e.g. manual
>>>>>>>>>>>>> inserts / updates using DML).
>>>>>>>>>>>>>
>>>>>>>>>>>>> Shani.
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 1 Nov 2024, at 18:32, Steven Wu <stevenz...@gmail.com>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> > Add support for inverted indexes to reduce the cost of
>>>>>>>>>>>>> position lookup. This is fairly tricky to implement for streaming 
>>>>>>>>>>>>> use cases
>>>>>>>>>>>>> without an external system.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Anton, that is also what I was saying earlier. In Flink, the
>>>>>>>>>>>>> inverted index of (key, committed data files) can be tracked in 
>>>>>>>>>>>>> Flink state.
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Fri, Nov 1, 2024 at 2:16 AM Anton Okolnychyi <
>>>>>>>>>>>>> aokolnyc...@gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> I was a bit skeptical when we were adding equality deletes,
>>>>>>>>>>>>>> but nothing beats their performance during writes. We have to 
>>>>>>>>>>>>>> find an
>>>>>>>>>>>>>> alternative before deprecating.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> We are doing a lot of work to improve streaming, like
>>>>>>>>>>>>>> reducing the cost of commits, enabling a large (potentially 
>>>>>>>>>>>>>> infinite)
>>>>>>>>>>>>>> number of snapshots, changelog reads, and so on. It is a project 
>>>>>>>>>>>>>> goal to
>>>>>>>>>>>>>> excel in streaming.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I was going to focus on equality deletes after completing the
>>>>>>>>>>>>>> DV work. I believe we have these options:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> - Revisit the existing design of equality deletes (e.g. add
>>>>>>>>>>>>>> more restrictions, improve compaction, offer new writers).
>>>>>>>>>>>>>> - Standardize on the view-based approach [1] to handle
>>>>>>>>>>>>>> streaming upserts and CDC use cases, potentially making this 
>>>>>>>>>>>>>> part of the
>>>>>>>>>>>>>> spec.
>>>>>>>>>>>>>> - Add support for inverted indexes to reduce the cost of
>>>>>>>>>>>>>> position lookup. This is fairly tricky to implement for 
>>>>>>>>>>>>>> streaming use cases
>>>>>>>>>>>>>> without an external system. Our runtime filtering in Spark today 
>>>>>>>>>>>>>> is
>>>>>>>>>>>>>> equivalent to looking up positions in an inverted index 
>>>>>>>>>>>>>> represented by
>>>>>>>>>>>>>> another Iceberg table. That may still not be enough for some 
>>>>>>>>>>>>>> streaming use
>>>>>>>>>>>>>> cases.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> [1] - https://www.tabular.io/blog/hello-world-of-cdc/
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> - Anton
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> чт, 31 жовт. 2024 р. о 21:31 Micah Kornfield <
>>>>>>>>>>>>>> emkornfi...@gmail.com> пише:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I agree that equality deletes have their place in
>>>>>>>>>>>>>>> streaming.  I think the ultimate decision here is how 
>>>>>>>>>>>>>>> opinionated
>>>>>>>>>>>>>>> Iceberg wants to be on its use-cases.  If it really wants to 
>>>>>>>>>>>>>>> stick to its
>>>>>>>>>>>>>>> origins of "slow moving data", then removing equality deletes 
>>>>>>>>>>>>>>> would be
>>>>>>>>>>>>>>> inline with this.  I think the other high level question is how 
>>>>>>>>>>>>>>> much we
>>>>>>>>>>>>>>> allow for partially compatible features (the row lineage 
>>>>>>>>>>>>>>> use-case feature
>>>>>>>>>>>>>>> was explicitly approved excluding equality deletes, and people 
>>>>>>>>>>>>>>> seemed OK
>>>>>>>>>>>>>>> with it at the time.  If all features need to work together, 
>>>>>>>>>>>>>>> then maybe we
>>>>>>>>>>>>>>> need to rethink the design here so it can be forward compatible 
>>>>>>>>>>>>>>> with
>>>>>>>>>>>>>>> equality deletes).
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I think one issue with equality deletes as stated in the
>>>>>>>>>>>>>>> spec is that they are overly broad. I'd be interested if people
>>>>>>>>>>>>>>> have any use cases that differ, but I think one way of narrowing
>>>>>>>>>>>>>>> the specification's scope on equality deletes (and probably a
>>>>>>>>>>>>>>> necessary building block for building something better) is to
>>>>>>>>>>>>>>> focus on upsert/streaming deletes. Two proposals in this regard
>>>>>>>>>>>>>>> are:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 1.  Require that equality deletes can only correspond to
>>>>>>>>>>>>>>> unique identifiers for the table.
>>>>>>>>>>>>>>> 2.  Consider requiring that, for equality deletes on
>>>>>>>>>>>>>>> partitioned tables, the primary key must contain a partition
>>>>>>>>>>>>>>> column (I believe Flink at least already does this). It is less
>>>>>>>>>>>>>>> clear to me that this would meet all existing use-cases. But
>>>>>>>>>>>>>>> having this would allow for better incremental data structures,
>>>>>>>>>>>>>>> which could then be partition based.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Narrowing the scope to unique identifiers would allow for the
>>>>>>>>>>>>>>> further building blocks already mentioned, like a secondary
>>>>>>>>>>>>>>> index (possibly via an LSM tree), that would allow for better
>>>>>>>>>>>>>>> performance overall.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I generally agree with the sentiment that we shouldn't
>>>>>>>>>>>>>>> deprecate them until there is a viable replacement.  With all 
>>>>>>>>>>>>>>> due respect
>>>>>>>>>>>>>>> to my employer, let's not fall into the Google trap [1] :)
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>>> Micah
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> [1] https://goomics.net/50/
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Thu, Oct 31, 2024 at 12:35 PM Alexander Jo <
>>>>>>>>>>>>>>> alex...@starburstdata.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hey all,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Just to throw my 2 cents in, I agree with Steven and others
>>>>>>>>>>>>>>>> that we do need some kind of replacement before deprecating 
>>>>>>>>>>>>>>>> equality
>>>>>>>>>>>>>>>> deletes.
>>>>>>>>>>>>>>>> They certainly have their problems, and do significantly
>>>>>>>>>>>>>>>> increase complexity as they are now, but the writing of 
>>>>>>>>>>>>>>>> position deletes is
>>>>>>>>>>>>>>>> too expensive for certain pipelines.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> We've been investigating using equality deletes for some of
>>>>>>>>>>>>>>>> our workloads at Starburst; the key advantage we were hoping to
>>>>>>>>>>>>>>>> leverage is cheap, effectively random-access lookup deletes.
>>>>>>>>>>>>>>>> Say you have a UUID column that's unique in a table and you
>>>>>>>>>>>>>>>> want to delete a row by UUID. With position deletes each delete
>>>>>>>>>>>>>>>> is expensive without an index on that UUID.
>>>>>>>>>>>>>>>> With equality deletes each delete is cheap while reads and
>>>>>>>>>>>>>>>> compaction are expensive, but when updates are frequent and
>>>>>>>>>>>>>>>> reads are sporadic that's a reasonable tradeoff.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Pretty much what Jason and Steven have already said.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Maybe there are some incremental improvements on equality
>>>>>>>>>>>>>>>> deletes or tips from similar systems that might alleviate some 
>>>>>>>>>>>>>>>> of their
>>>>>>>>>>>>>>>> problems?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> - Alex Jo
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Thu, Oct 31, 2024 at 10:58 AM Steven Wu <
>>>>>>>>>>>>>>>> stevenz...@gmail.com> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> We probably all agree with the downside of equality
>>>>>>>>>>>>>>>>> deletes: it postpones all the work on the read path.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> In theory, we could implement position-delete-only writes in
>>>>>>>>>>>>>>>>> the Flink streaming writer. It would require tracking the last
>>>>>>>>>>>>>>>>> committed data file per key, which can be stored in Flink state
>>>>>>>>>>>>>>>>> (checkpointed). This is obviously quite expensive/challenging,
>>>>>>>>>>>>>>>>> but possible.
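
A conceptual sketch of that bookkeeping in plain Python (not the Flink API;
file names and the row layout are invented): the writer remembers, per key,
where the last committed copy of the row lives, so an upsert can emit a
position delete without scanning the table.

```python
# Conceptual sketch only: the per-key state a streaming writer would need in
# order to emit position deletes directly. In Flink this mapping would live
# in checkpointed keyed state; here it is just a dict.
last_committed = {}     # key -> (data_file_path, row_position)
position_deletes = []   # (data_file_path, row_position) entries to write out
appended_rows = []      # new row copies destined for a fresh data file

def upsert(key, row, new_file, new_pos):
    """Append the new row and position-delete the previously committed copy."""
    if key in last_committed:
        # The stale copy is addressed by (file, position); no table scan needed.
        position_deletes.append(last_committed[key])
    appended_rows.append(row)
    last_committed[key] = (new_file, new_pos)

upsert("order-123", {"order_id": 123, "amount": 10}, "data-00007.parquet", 0)
upsert("order-123", {"order_id": 123, "amount": 12}, "data-00008.parquet", 0)
print(position_deletes)  # [('data-00007.parquet', 0)]
```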
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I'd like to echo one benefit of equality deletes that
>>>>>>>>>>>>>>>>> Russell called out in the original email. Equality deletes
>>>>>>>>>>>>>>>>> would never have conflicts. That is important for streaming
>>>>>>>>>>>>>>>>> writers (Flink, Kafka Connect, ...) that commit frequently
>>>>>>>>>>>>>>>>> (minutes or less). Assume Flink could write only position
>>>>>>>>>>>>>>>>> deletes and commit every 2 minutes. The long-running nature of
>>>>>>>>>>>>>>>>> streaming jobs can cause frequent commit conflicts with
>>>>>>>>>>>>>>>>> background delete compaction jobs.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Overall, the streaming upsert write is not a well solved
>>>>>>>>>>>>>>>>> problem in Iceberg. This probably affects all streaming 
>>>>>>>>>>>>>>>>> engines (Flink,
>>>>>>>>>>>>>>>>> Kafka connect, Spark streaming, ...). We need to come up with 
>>>>>>>>>>>>>>>>> some better
>>>>>>>>>>>>>>>>> alternatives before we can deprecate equality deletes.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Thu, Oct 31, 2024 at 8:38 AM Russell Spitzer <
>>>>>>>>>>>>>>>>> russell.spit...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> For users of Equality Deletes, what are the key
>>>>>>>>>>>>>>>>>> benefits to Equality Deletes that you would like to preserve 
>>>>>>>>>>>>>>>>>> and could you
>>>>>>>>>>>>>>>>>> please share some concrete examples of the queries you want 
>>>>>>>>>>>>>>>>>> to run (and the
>>>>>>>>>>>>>>>>>> schemas and data sizes you would like to run them against) 
>>>>>>>>>>>>>>>>>> and the
>>>>>>>>>>>>>>>>>> latencies that would be acceptable?
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Thu, Oct 31, 2024 at 10:05 AM Jason Fine
>>>>>>>>>>>>>>>>>> <ja...@upsolver.com.invalid> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Representing Upsolver here, we also make use of Equality
>>>>>>>>>>>>>>>>>>> Deletes to deliver high frequency low latency updates to 
>>>>>>>>>>>>>>>>>>> our clients at
>>>>>>>>>>>>>>>>>>> scale. We have customers using them at scale and 
>>>>>>>>>>>>>>>>>>> demonstrating the need and
>>>>>>>>>>>>>>>>>>> viability. We automate the process of converting them into 
>>>>>>>>>>>>>>>>>>> positional
>>>>>>>>>>>>>>>>>>> deletes (or fully applying them) for more efficient engine 
>>>>>>>>>>>>>>>>>>> queries in the
>>>>>>>>>>>>>>>>>>> background giving our users both low latency and good query 
>>>>>>>>>>>>>>>>>>> performance.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Equality Deletes were added since there isn't a good way
>>>>>>>>>>>>>>>>>>> to solve frequent updates otherwise. It would require some 
>>>>>>>>>>>>>>>>>>> sort of index
>>>>>>>>>>>>>>>>>>> keeping track of every record in the table (by a 
>>>>>>>>>>>>>>>>>>> predetermined PK) and
>>>>>>>>>>>>>>>>>>> maintaining such an index is a huge task that every tool 
>>>>>>>>>>>>>>>>>>> interested in this
>>>>>>>>>>>>>>>>>>> would need to re-implement. It also becomes a bottleneck 
>>>>>>>>>>>>>>>>>>> limiting table
>>>>>>>>>>>>>>>>>>> sizes.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> I don't think they should be removed without providing
>>>>>>>>>>>>>>>>>>> an alternative. Positional Deletes have a different 
>>>>>>>>>>>>>>>>>>> performance profile
>>>>>>>>>>>>>>>>>>> inherently, requiring more upfront work proportional to the 
>>>>>>>>>>>>>>>>>>> table size.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Thu, Oct 31, 2024 at 2:45 PM Jean-Baptiste Onofré <
>>>>>>>>>>>>>>>>>>> j...@nanthrax.net> wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Hi Russell
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Thanks for the nice writeup and the proposal.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> I agree with your analysis, and I have the same feeling.
>>>>>>>>>>>>>>>>>>>> However, I think there are more engines than Flink that
>>>>>>>>>>>>>>>>>>>> write equality delete files. So, I agree to deprecate in V3,
>>>>>>>>>>>>>>>>>>>> but maybe be more "flexible" about removal in V4 in order to
>>>>>>>>>>>>>>>>>>>> give time to engines to update.
>>>>>>>>>>>>>>>>>>>> I think that by deprecating equality deletes, we are
>>>>>>>>>>>>>>>>>>>> clearly focusing on read performance and "consistency" (more
>>>>>>>>>>>>>>>>>>>> than write). It's not necessarily a bad thing, but streaming
>>>>>>>>>>>>>>>>>>>> and data ingestion platforms will probably be concerned
>>>>>>>>>>>>>>>>>>>> about that (by using positional deletes, they will have to
>>>>>>>>>>>>>>>>>>>> scan/read all data files to find the position, which is
>>>>>>>>>>>>>>>>>>>> painful).
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> So, to summarize:
>>>>>>>>>>>>>>>>>>>> 1. Agree to deprecate equality deletes, but -1 to
>>>>>>>>>>>>>>>>>>>> commit any target
>>>>>>>>>>>>>>>>>>>> for deletion before having a clear path for streaming
>>>>>>>>>>>>>>>>>>>> platforms
>>>>>>>>>>>>>>>>>>>> (Flink, Beam, ...)
>>>>>>>>>>>>>>>>>>>> 2. In the meantime (during the deprecation period), I
>>>>>>>>>>>>>>>>>>>> propose to
>>>>>>>>>>>>>>>>>>>> explore possible improvements for streaming platforms
>>>>>>>>>>>>>>>>>>>> (maybe finding a
>>>>>>>>>>>>>>>>>>>> way to avoid full data files scan, ...)
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Thanks !
>>>>>>>>>>>>>>>>>>>> Regards
>>>>>>>>>>>>>>>>>>>> JB
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> On Wed, Oct 30, 2024 at 10:06 PM Russell Spitzer
>>>>>>>>>>>>>>>>>>>> <russell.spit...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>>> > Background:
>>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>>> > 1) Position Deletes
>>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>>> > Writers determine what rows are deleted and mark them
>>>>>>>>>>>>>>>>>>>> in a 1 for 1 representation. With delete vectors this 
>>>>>>>>>>>>>>>>>>>> means every data file
>>>>>>>>>>>>>>>>>>>> has at most 1 delete vector that it is read in conjunction 
>>>>>>>>>>>>>>>>>>>> with to excise
>>>>>>>>>>>>>>>>>>>> deleted rows. Reader overhead is more or less constant and 
>>>>>>>>>>>>>>>>>>>> is very
>>>>>>>>>>>>>>>>>>>> predictable.
>>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>>> > The main cost of this mode is that deletes must be
>>>>>>>>>>>>>>>>>>>> determined at write time, which is expensive and can make
>>>>>>>>>>>>>>>>>>>> conflict resolution more difficult.
>>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>>> > 2) Equality Deletes
>>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>>> > Writers write out references to what values are
>>>>>>>>>>>>>>>>>>>> deleted (in a partition or globally). There can be an 
>>>>>>>>>>>>>>>>>>>> unlimited number of
>>>>>>>>>>>>>>>>>>>> equality deletes and they all must be checked for every 
>>>>>>>>>>>>>>>>>>>> data file that is
>>>>>>>>>>>>>>>>>>>> read. The cost of determining deleted rows is essentially 
>>>>>>>>>>>>>>>>>>>> given to the
>>>>>>>>>>>>>>>>>>>> reader.
>>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>>> > Conflicts almost never happen since data files are
>>>>>>>>>>>>>>>>>>>> not actually changed and there is almost no cost to the 
>>>>>>>>>>>>>>>>>>>> writer to generate
>>>>>>>>>>>>>>>>>>>> these. Almost all costs related to equality deletes are 
>>>>>>>>>>>>>>>>>>>> passed on to the
>>>>>>>>>>>>>>>>>>>> reader.
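
To make the read-path asymmetry concrete, here is a toy sketch in plain
Python (rows and delete contents are invented): a position delete names exact
rows in a specific file, while every equality delete predicate has to be
evaluated against every row of every data file it applies to.

```python
# Toy illustration of the two read paths; rows and deletes are made up.
rows = [
    {"_pos": 0, "order_id": 1, "amount": 10},
    {"_pos": 1, "order_id": 2, "amount": 20},
    {"_pos": 2, "order_id": 3, "amount": 30},
]

# Position deletes: a per-file set of row positions, applied with one lookup per row.
deleted_positions = {1}
live_after_position_deletes = [r for r in rows if r["_pos"] not in deleted_positions]

# Equality deletes: predicates on column values; every outstanding delete must
# be checked against every row of every data file in scope, so the read cost
# grows with the number of accumulated delete files.
equality_deletes = [{"order_id": 3}]

def matches_any(row, deletes):
    return any(all(row[col] == val for col, val in d.items()) for d in deletes)

live_after_equality_deletes = [r for r in rows if not matches_any(r, equality_deletes)]

print(live_after_position_deletes)  # rows at positions 0 and 2 survive
print(live_after_equality_deletes)  # rows with order_id 1 and 2 survive
```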
>>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>>> > Proposal:
>>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>>> > Equality deletes are, in my opinion, unsustainable
>>>>>>>>>>>>>>>>>>>> and we should work on deprecating and removing them from 
>>>>>>>>>>>>>>>>>>>> the specification.
>>>>>>>>>>>>>>>>>>>> At this time, I know of only one engine (Apache Flink) 
>>>>>>>>>>>>>>>>>>>> which produces these
>>>>>>>>>>>>>>>>>>>> deletes but almost all engines have implementations to 
>>>>>>>>>>>>>>>>>>>> read them. The cost
>>>>>>>>>>>>>>>>>>>> of implementing equality deletes on the read path is 
>>>>>>>>>>>>>>>>>>>> difficult and
>>>>>>>>>>>>>>>>>>>> unpredictable in terms of memory usage and compute 
>>>>>>>>>>>>>>>>>>>> complexity. We’ve had
>>>>>>>>>>>>>>>>>>>> suggestions of implementing RocksDB in order to handle
>>>>>>>>>>>>>>>>>>>> ever-growing sets of
>>>>>>>>>>>>>>>>>>>> equality deletes which in my opinion shows that we are 
>>>>>>>>>>>>>>>>>>>> going down the wrong
>>>>>>>>>>>>>>>>>>>> path.
>>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>>> > Outside of performance, Equality deletes are also
>>>>>>>>>>>>>>>>>>>> difficult to use in conjunction with many other features. 
>>>>>>>>>>>>>>>>>>>> For example, any
>>>>>>>>>>>>>>>>>>>> features requiring CDC or Row lineage are basically 
>>>>>>>>>>>>>>>>>>>> impossible when
>>>>>>>>>>>>>>>>>>>> equality deletes are in use. When Equality deletes are 
>>>>>>>>>>>>>>>>>>>> present, the state
>>>>>>>>>>>>>>>>>>>> of the table can only be determined with a full scan 
>>>>>>>>>>>>>>>>>>>> making it difficult to
>>>>>>>>>>>>>>>>>>>> update differential structures. This means materialized 
>>>>>>>>>>>>>>>>>>>> views or indexes
>>>>>>>>>>>>>>>>>>>> need to essentially be fully rebuilt whenever an equality 
>>>>>>>>>>>>>>>>>>>> delete is added
>>>>>>>>>>>>>>>>>>>> to the table.
>>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>>> > Equality deletes essentially remove complexity from
>>>>>>>>>>>>>>>>>>>> the write side but then add what I believe is an 
>>>>>>>>>>>>>>>>>>>> unacceptable level of
>>>>>>>>>>>>>>>>>>>> complexity to the read side.
>>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>>> > Because of this I suggest we deprecate Equality
>>>>>>>>>>>>>>>>>>>> Deletes in V3 and slate them for full removal from the 
>>>>>>>>>>>>>>>>>>>> Iceberg Spec in V4.
>>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>>> > I know this is a big change and compatibility
>>>>>>>>>>>>>>>>>>>> breakage so I would like to introduce this idea to the 
>>>>>>>>>>>>>>>>>>>> community and
>>>>>>>>>>>>>>>>>>>> solicit feedback from all stakeholders. I am very flexible 
>>>>>>>>>>>>>>>>>>>> on this issue
>>>>>>>>>>>>>>>>>>>> and would like to hear the best issues both for and 
>>>>>>>>>>>>>>>>>>>> against removal of
>>>>>>>>>>>>>>>>>>>> Equality Deletes.
>>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>>> > Thanks everyone for your time,
>>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>>> > Russ Spitzer
>>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> *Jason Fine*
>>>>>>>>>>>>>>>>>>> Chief Software Architect
>>>>>>>>>>>>>>>>>>> ja...@upsolver.com  | www.upsolver.com
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>
