The proposal sounds similar to the Delta Lake CDC feature with CDC file
type [1] and CDC action [2].

There was also the proposal I wrote a long time ago [3] to use a "cdc"
branch rather than two private tables, which was inspired by the Delta Lake
approach. The feedback was mixed at that time: on one side, the user does
not need a two-table setup and it is still considered doing CDC against a
single Iceberg table; on the other side, the branching construct was not
widely supported by enterprise-level ETL and governance features, and
having two tables might just be cleaner after all. But we have seen
customers implement the cdc branch approach from that proposal
successfully.

Either way, in a Delta table based CDC approach, for the reader, we could
choose the view approach Russell described above, or develop a reader that
does essentially a broadcast join at scan level.
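A minimal sketch of that scan-level join idea (illustrative Python, not an Iceberg or Delta API; the `merge_scan` name and row shapes are hypothetical):

```python
# Sketch of a merge-on-read scan: the small delta table is "broadcast" as an
# in-memory hash map keyed by primary key, and delta rows shadow base rows.

def merge_scan(base_rows, delta_rows, pk="id"):
    """Yield the merged view: delta overrides base on the primary key."""
    delta_by_pk = {row[pk]: row for row in delta_rows}  # broadcast side
    for row in base_rows:
        # A delta entry for this key replaces the base record.
        yield delta_by_pk.pop(row[pk], row)
    # Delta rows with no base counterpart are plain inserts.
    yield from delta_by_pk.values()

base = [{"id": 1, "v": "a"}, {"id": 2, "v": "b"}]
delta = [{"id": 2, "v": "b2"}, {"id": 3, "v": "c"}]
merged = sorted(merge_scan(base, delta), key=lambda r: r["id"])
print(merged)  # [{'id': 1, 'v': 'a'}, {'id': 2, 'v': 'b2'}, {'id': 3, 'v': 'c'}]
```

A real reader would also need delete tombstones in the delta; this toy only covers upserts and inserts.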

Overall, I think we have a lot of options on the table to solve CDC in
Iceberg for both read and write.

Given that the row lineage feature is fundamentally in conflict with
equality deletes, I would +1 for dropping equality delete support.

Best,
Jack Ye

[1]
https://github.com/delta-io/delta/blob/master/PROTOCOL.md#change-data-files
[2] https://github.com/delta-io/delta/blob/master/PROTOCOL.md#add-cdc-file
[3]
https://docs.google.com/document/d/1kyyJp4masbd1FrIKUHF1ED_z1hTARL8bNoKCgb7fhSQ/edit?tab=t.0


On Tue, Nov 19, 2024 at 7:56 AM Russell Spitzer <russell.spit...@gmail.com>
wrote:

> I'm strongly in favor of moving to the Delta + Base table approach
> discussed in the cookbook above. I wonder if we should codify that into
> something more standardized but it seems to me to be a much better path
> forward. I'm not sure we need to support this at the spec level but it would
> be nice if we could provide a table that automatically was broken into sub
> tables and had well defined operations on it.
>
> For example:
>
> FastUpdateTable:
>    Requires:
>      Primary Key Columns
>      Long Max Delta Size
>    Contains:
>        Private Iceberg Table: Delta
>        Private Iceberg Table: Base
>
>    On All Scans -
>        Return a view which joins delta and base on primary key; if Delta
> has a record for a given primary key, discard the base record
>
>   On All Writes -
>        Perform all writes against the delta table; only MERGE is allowed.
> Append is forbidden (no PK guarantees). Only position deletes are allowed.
>
>    On Delta Table Size Exceeding Max Delta Size -
>        Upsert DELTA into BASE
>        Clear upserted records from Delta
>
>
> If the Delta Table size is kept small I think this would be almost as
> performant as Equality deletes but still be compatible with row-lineage and
> other indexing features.
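The maintenance step Russell sketches ("Upsert DELTA into BASE, then clear the upserted records") could look roughly like this (illustrative Python under assumed names and thresholds, not an actual Iceberg API):

```python
# Sketch of the FastUpdateTable maintenance step: once the delta table grows
# past max_delta_size, fold it into the base table by primary key and clear it.

def maybe_compact(base, delta, pk="id", max_delta_size=2):
    """Upsert delta into base when delta exceeds the size threshold."""
    if len(delta) <= max_delta_size:
        return base, delta  # below threshold: leave both tables untouched
    merged = {row[pk]: row for row in base}
    merged.update({row[pk]: row for row in delta})  # upsert by primary key
    return list(merged.values()), []  # delta is cleared after the upsert

base = [{"id": 1, "v": "a"}]
delta = [{"id": 1, "v": "a2"}, {"id": 2, "v": "b"}, {"id": 3, "v": "c"}]
base, delta = maybe_compact(base, delta)
print(len(base), len(delta))  # 3 0
```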
>
>
> On Tue, Nov 19, 2024 at 7:12 AM Manu Zhang <owenzhang1...@gmail.com>
> wrote:
>
>> Hi Ajantha,
>>
>> I'm proposing exploring a view-based approach similar to the
>> changelog-mirror table pattern[1] rather than supporting delta writers for
>> Kafka connect Iceberg sink.
>>
>> 1.
>> https://www.tabular.io/apache-iceberg-cookbook/data-engineering-cdc-table-mirroring/
>>
>> On Tue, Nov 19, 2024 at 7:38 PM Jean-Baptiste Onofré <j...@nanthrax.net>
>> wrote:
>>
>>> I don’t think it’s a problem while an alternative is explored (the JDK
>>> itself does that pretty often).
>>> So it’s up to the community: of course I’m against removing it without
>>> solid alternative, but deprecation is fine imho.
>>>
>>> Regards
>>> JB
>>>
>>> Le mar. 19 nov. 2024 à 12:19, Ajantha Bhat <ajanthab...@gmail.com> a
>>> écrit :
>>>
>>>> - ok for deprecate equality deletes
>>>>> - not ok to remove it
>>>>
>>>>
>>>> @JB: I don't think it is a good idea to use deprecated functionality in
>>>> the new feature development.
>>>> Hence, my specific question was about kafka connect upsert operation.
>>>>
>>>> @Manu: I meant the delta writers for kafka connect Iceberg sink (which
>>>> in turn are used for upserting the CDC records)
>>>> https://github.com/apache/iceberg/issues/10842
>>>>
>>>>
>>>> - Ajantha
>>>>
>>>>
>>>>
>>>> On Tue, Nov 19, 2024 at 3:08 PM Manu Zhang <owenzhang1...@gmail.com>
>>>> wrote:
>>>>
>>>>> I second Anton's proposal to standardize on a view-based approach to
>>>>> handle CDC cases.
>>>>> Actually, it's already been explored in detail[1] by Jack before.
>>>>>
>>>>> [1] Improving Change Data Capture Use Case for Apache Iceberg
>>>>> <https://docs.google.com/document/d/1kyyJp4masbd1FrIKUHF1ED_z1hTARL8bNoKCgb7fhSQ/edit?tab=t.0#heading=h.94xnx4qg3bnt>
>>>>>
>>>>>
>>>>> On Tue, Nov 19, 2024 at 4:16 PM Jean-Baptiste Onofré <j...@nanthrax.net>
>>>>> wrote:
>>>>>
>>>>>> My proposal is the following (already expressed):
>>>>>> - ok for deprecate equality deletes
>>>>>> - not ok to remove it
>>>>>> - work on position delete improvements to address streaming use
>>>>>> cases. I think we should explore different approaches. Personally I
>>>>>> think a possible approach would be to find a way to index data files
>>>>>> to avoid a full scan to find the row position.
>>>>>>
>>>>>> My $0.01 :)
>>>>>>
>>>>>> Regards
>>>>>> JB
>>>>>>
>>>>>> Le mar. 19 nov. 2024 à 07:53, Ajantha Bhat <ajanthab...@gmail.com> a
>>>>>> écrit :
>>>>>>
>>>>>>> Hi, What's the conclusion on this thread?
>>>>>>>
>>>>>>> Users are looking for Upsert (CDC) support for OSS Iceberg
>>>>>>> kafka connect sink.
>>>>>>> We only support appends at the moment. Can we go ahead and implement
>>>>>>> the upserts using equality deletes?
>>>>>>>
>>>>>>>
>>>>>>> - Ajantha
>>>>>>>
>>>>>>> On Sun, Nov 10, 2024 at 11:56 AM Vignesh <vignesh.v...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>> I am reading about iceberg and am quite new to this.
>>>>>>>> This puffin would be an index from key to data file. Other use
>>>>>>>> cases of Puffin, such as statistics, are at a per-file level if I
>>>>>>>> understand correctly.
>>>>>>>>
>>>>>>>> Where would the puffin about key->data file be stored? It is a
>>>>>>>> property of the entire table.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Vignesh.
>>>>>>>>
>>>>>>>>
>>>>>>>> On Sat, Nov 9, 2024 at 2:17 AM Shani Elharrar
>>>>>>>> <sh...@upsolver.com.invalid> wrote:
>>>>>>>>
>>>>>>>>> JB, this is what we do, we write Equality Deletes and periodically
>>>>>>>>> convert them to Positional Deletes.
>>>>>>>>>
>>>>>>>>> We could probably index the keys, maybe partially index using
>>>>>>>>> bloom filters, the best would be to put those bloom filters inside 
>>>>>>>>> puffin.
>>>>>>>>>
>>>>>>>>> Shani.
>>>>>>>>>
>>>>>>>>> On 9 Nov 2024, at 11:11, Jean-Baptiste Onofré <j...@nanthrax.net>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> I agree with Peter here, and I would say that it would be an issue
>>>>>>>>> for multi-engine support.
>>>>>>>>>
>>>>>>>>> I think, as I already mentioned with others, we should explore an
>>>>>>>>> alternative.
>>>>>>>>> As the main issue is the datafile scan in streaming context, maybe
>>>>>>>>> we could find a way to "index"/correlate for positional deletes with
>>>>>>>>> limited scanning.
>>>>>>>>> I will think again about that :)
>>>>>>>>>
>>>>>>>>> Regards
>>>>>>>>> JB
>>>>>>>>>
>>>>>>>>> On Sat, Nov 9, 2024 at 6:48 AM Péter Váry <
>>>>>>>>> peter.vary.apa...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Imran,
>>>>>>>>>>
>>>>>>>>>> I don't think it's a good idea to start creating multiple types
>>>>>>>>>> of Iceberg tables. Iceberg's main selling point is compatibility 
>>>>>>>>>> between
>>>>>>>>>> engines. If we don't have readers and writers for all types of 
>>>>>>>>>> tables, then
>>>>>>>>>> we remove compatibility from the equation and engine specific formats
>>>>>>>>>> always win. OTOH, if we write readers and writers for all types of 
>>>>>>>>>> tables
>>>>>>>>>> then we are back on square one.
>>>>>>>>>>
>>>>>>>>>> Identifier fields are a table schema concept and used in many
>>>>>>>>>> cases during query planning and execution. This is why they are 
>>>>>>>>>> defined as
>>>>>>>>>> part of the SQL spec, and this is why Iceberg defines them as well. 
>>>>>>>>>> One use
>>>>>>>>>> case is where they can be used to merge deletes (independently of 
>>>>>>>>>> how they
>>>>>>>>>> are manifested) and subsequent inserts, into updates.
>>>>>>>>>>
>>>>>>>>>> Flink SQL doesn't allow creating tables with partition
>>>>>>>>>> transforms, so no new table could be created by Flink SQL using 
>>>>>>>>>> transforms,
>>>>>>>>>> but tables created by other engines could still be used (both read and
>>>>>>>>>> write). Also you can create such tables in Flink using the Java API.
>>>>>>>>>>
>>>>>>>>>> Requiring partition columns be part of the identifier fields is
>>>>>>>>>> coming from the practical consideration, that you want to limit the 
>>>>>>>>>> scope
>>>>>>>>>> of the equality deletes as much as possible. Otherwise all of the 
>>>>>>>>>> equality
>>>>>>>>>> deletes should be table global, and they should be read by every 
>>>>>>>>>> reader. We
>>>>>>>>>> could write those, we just decided that we don't want to allow the 
>>>>>>>>>> user to
>>>>>>>>>> do this, as it is in most cases a bad idea.
>>>>>>>>>>
>>>>>>>>>> I hope this helps,
>>>>>>>>>> Peter
>>>>>>>>>>
>>>>>>>>>> On Fri, Nov 8, 2024, 22:01 Imran Rashid
>>>>>>>>>> <iras...@cloudera.com.invalid> wrote:
>>>>>>>>>>
>>>>>>>>>>> I'm not down in the weeds at all myself on implementation
>>>>>>>>>>> details, so forgive me if I'm wrong about the details here.
>>>>>>>>>>>
>>>>>>>>>>> I can see all the viewpoints -- both that equality deletes
>>>>>>>>>>> enable some use cases, but also make others far more difficult.
>>>>>>>>>>> What surprised me the most is that Iceberg does not provide a way to
>>>>>>>>>>> distinguish these two table "types".
>>>>>>>>>>>
>>>>>>>>>>> At first, I thought the presence of an identifier-field (
>>>>>>>>>>> https://iceberg.apache.org/spec/#identifier-field-ids)
>>>>>>>>>>> indicated that the table was a target for equality deletes.  But, 
>>>>>>>>>>> then it
>>>>>>>>>>> turns out identifier-fields are also useful for changelog views even
>>>>>>>>>>> without equality deletes -- IIUC, they show that a delete + insert 
>>>>>>>>>>> should
>>>>>>>>>>> actually be interpreted as an update in changelog view.
>>>>>>>>>>>
>>>>>>>>>>> To be perfectly honest, I'm confused about all of these details
>>>>>>>>>>> -- from my read, the spec does not indicate this relationship 
>>>>>>>>>>> between
>>>>>>>>>>> identifier-fields and equality_ids in equality delete files (
>>>>>>>>>>> https://iceberg.apache.org/spec/#equality-delete-files), but I
>>>>>>>>>>> think that is the way Flink works.  Flink itself seems to have even 
>>>>>>>>>>> more
>>>>>>>>>>> limitations -- no partition transforms are allowed, and all 
>>>>>>>>>>> partition
>>>>>>>>>>> columns must be a subset of the identifier fields.  Is that just a 
>>>>>>>>>>> Flink
>>>>>>>>>>> limitation, or is that the intended behavior in the spec?  (Or maybe
>>>>>>>>>>> user-error on my part?)  Those seem like very reasonable 
>>>>>>>>>>> limitations, from
>>>>>>>>>>> an implementation point-of-view.  But OTOH, as a user, this seems 
>>>>>>>>>>> to be
>>>>>>>>>>> directly contrary to some of the promises of Iceberg.
>>>>>>>>>>>
>>>>>>>>>>> It's easy to see if a table already has equality deletes in it,
>>>>>>>>>>> by looking at the metadata.  But is there any way to indicate that 
>>>>>>>>>>> a table
>>>>>>>>>>> (or branch of a table) _must not_ have equality deletes added to it?
>>>>>>>>>>>
>>>>>>>>>>> If that were possible, it seems like we could support both use
>>>>>>>>>>> cases.  We could continue to optimize for the streaming ingestion 
>>>>>>>>>>> use cases
>>>>>>>>>>> using equality deletes.  But we could also build more optimizations 
>>>>>>>>>>> into
>>>>>>>>>>> the "non-streaming-ingestion" branches.  And we could document the 
>>>>>>>>>>> tradeoff
>>>>>>>>>>> so it is much clearer to end users.
>>>>>>>>>>>
>>>>>>>>>>> To maintain compatibility, I suppose that the change would be
>>>>>>>>>>> that equality deletes continue to be allowed by default, but we'd 
>>>>>>>>>>> add a new
>>>>>>>>>>> field to indicate that for some tables (or branches of a table), 
>>>>>>>>>>> equality
>>>>>>>>>>> deletes would not be allowed.  And it would be an error for an 
>>>>>>>>>>> engine to
>>>>>>>>>>> make an update which added an equality delete to such a table.
>>>>>>>>>>>
>>>>>>>>>>> Maybe that change would even be possible in V3.
>>>>>>>>>>>
>>>>>>>>>>> And if all the performance improvements to equality deletes make
>>>>>>>>>>> this a moot point, we could drop the field in v4.  But it seems 
>>>>>>>>>>> like a
>>>>>>>>>>> mistake to both limit the non-streaming use-case AND have confusing
>>>>>>>>>>> limitations for the end-user in the meantime.
>>>>>>>>>>>
>>>>>>>>>>> I would happily be corrected about my understanding of all of
>>>>>>>>>>> the above.
>>>>>>>>>>>
>>>>>>>>>>> thanks!
>>>>>>>>>>> Imran
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Nov 5, 2024 at 9:16 AM Bryan Keller <brya...@gmail.com>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> I also feel we should keep equality deletes until we have an
>>>>>>>>>>>> alternative solution for streaming updates/deletes.
>>>>>>>>>>>>
>>>>>>>>>>>> -Bryan
>>>>>>>>>>>>
>>>>>>>>>>>> On Nov 4, 2024, at 8:33 AM, Péter Váry <
>>>>>>>>>>>> peter.vary.apa...@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> Well, it seems like I'm a little late, so most of the arguments
>>>>>>>>>>>> are voiced.
>>>>>>>>>>>>
>>>>>>>>>>>> I agree that we should not deprecate the equality deletes until
>>>>>>>>>>>> we have a replacement feature.
>>>>>>>>>>>> I think one of the big advantages of Iceberg is that it
>>>>>>>>>>>> supports batch processing and streaming ingestion too.
>>>>>>>>>>>> For streaming ingestion we need a way to update existing data
>>>>>>>>>>>> in a performant way, but restricting deletes for the primary keys 
>>>>>>>>>>>> seems
>>>>>>>>>>>> like enough from the streaming perspective.
>>>>>>>>>>>>
>>>>>>>>>>>> Equality deletes allow a very wide range of applications, which
>>>>>>>>>>>> we might be able to narrow down a bit, but still keep useful. So 
>>>>>>>>>>>> if we want
>>>>>>>>>>>> to go down this road, we need to start collecting the requirements.
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> Peter
>>>>>>>>>>>>
>>>>>>>>>>>> Shani Elharrar <sh...@upsolver.com.invalid> ezt írta (időpont:
>>>>>>>>>>>> 2024. nov. 1., P, 19:22):
>>>>>>>>>>>>
>>>>>>>>>>>>> I understand how it makes sense for batch jobs, but it damages
>>>>>>>>>>>>> streaming jobs. Using equality deletes works much better for streaming
>>>>>>>>>>>>> (which has strict SLAs for delays), and in order to decrease the
>>>>>>>>>>>>> performance penalty, systems can rewrite the equality deletes to
>>>>>>>>>>>>> positional deletes.
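A sketch of that background rewrite (illustrative Python with hypothetical structures; real equality deletes carry schemas, sequence numbers, and partition scoping that this toy ignores):

```python
# Sketch: scan data files once in the background and convert each equality
# delete (a key predicate) into position deletes ((file, row position) pairs).

def rewrite_equality_deletes(data_files, equality_deletes, pk="id"):
    """Return position deletes equivalent to the given equality deletes."""
    deleted_keys = {d[pk] for d in equality_deletes}
    position_deletes = []
    for path, rows in data_files.items():
        for pos, row in enumerate(rows):
            if row[pk] in deleted_keys:
                position_deletes.append((path, pos))
    return position_deletes

files = {"f1": [{"id": 1}, {"id": 2}], "f2": [{"id": 2}, {"id": 3}]}
print(rewrite_equality_deletes(files, [{"id": 2}]))  # [('f1', 1), ('f2', 0)]
```

This pays the full-scan cost once, in the background, instead of on every read.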
>>>>>>>>>>>>>
>>>>>>>>>>>>> Shani.
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 1 Nov 2024, at 20:06, Steven Wu <stevenz...@gmail.com>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Fundamentally, it is very difficult to write position deletes
>>>>>>>>>>>>> with concurrent writers and conflicts for batch jobs too, as the 
>>>>>>>>>>>>> inverted
>>>>>>>>>>>>> index may become invalid/stale.
>>>>>>>>>>>>>
>>>>>>>>>>>>> The position deletes are created during the write phase. But
>>>>>>>>>>>>> conflicts are only detected at the commit stage. I assume the 
>>>>>>>>>>>>> batch job
>>>>>>>>>>>>> should fail in this case.
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Fri, Nov 1, 2024 at 10:57 AM Steven Wu <
>>>>>>>>>>>>> stevenz...@gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Shani,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> That is a good point. It is certainly a limitation for the
>>>>>>>>>>>>>> Flink job to track the inverted index internally (which is what 
>>>>>>>>>>>>>> I had in
>>>>>>>>>>>>>> mind). It can't be shared/synchronized with other Flink jobs or 
>>>>>>>>>>>>>> other
>>>>>>>>>>>>>> engines writing to the same table.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>> Steven
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Fri, Nov 1, 2024 at 10:50 AM Shani Elharrar
>>>>>>>>>>>>>> <sh...@upsolver.com.invalid> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Even if Flink can create this state, it would have to be
>>>>>>>>>>>>>>> maintained against the Iceberg table; we wouldn't want duplicate keys if
>>>>>>>>>>>>>>> other systems / users update the table (e.g. manual inserts / updates
>>>>>>>>>>>>>>> using DML).
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Shani.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On 1 Nov 2024, at 18:32, Steven Wu <stevenz...@gmail.com>
>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> > Add support for inverted indexes to reduce the cost of
>>>>>>>>>>>>>>> position lookup. This is fairly tricky to implement for 
>>>>>>>>>>>>>>> streaming use cases
>>>>>>>>>>>>>>> without an external system.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Anton, that is also what I was saying earlier. In Flink, the
>>>>>>>>>>>>>>> inverted index of (key, committed data files) can be tracked in 
>>>>>>>>>>>>>>> Flink state.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Fri, Nov 1, 2024 at 2:16 AM Anton Okolnychyi <
>>>>>>>>>>>>>>> aokolnyc...@gmail.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I was a bit skeptical when we were adding equality deletes,
>>>>>>>>>>>>>>>> but nothing beats their performance during writes. We have to 
>>>>>>>>>>>>>>>> find an
>>>>>>>>>>>>>>>> alternative before deprecating.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> We are doing a lot of work to improve streaming, like
>>>>>>>>>>>>>>>> reducing the cost of commits, enabling a large (potentially 
>>>>>>>>>>>>>>>> infinite)
>>>>>>>>>>>>>>>> number of snapshots, changelog reads, and so on. It is a 
>>>>>>>>>>>>>>>> project goal to
>>>>>>>>>>>>>>>> excel in streaming.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I was going to focus on equality deletes after completing
>>>>>>>>>>>>>>>> the DV work. I believe we have these options:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> - Revisit the existing design of equality deletes (e.g. add
>>>>>>>>>>>>>>>> more restrictions, improve compaction, offer new writers).
>>>>>>>>>>>>>>>> - Standardize on the view-based approach [1] to handle
>>>>>>>>>>>>>>>> streaming upserts and CDC use cases, potentially making this 
>>>>>>>>>>>>>>>> part of the
>>>>>>>>>>>>>>>> spec.
>>>>>>>>>>>>>>>> - Add support for inverted indexes to reduce the cost of
>>>>>>>>>>>>>>>> position lookup. This is fairly tricky to implement for 
>>>>>>>>>>>>>>>> streaming use cases
>>>>>>>>>>>>>>>> without an external system. Our runtime filtering in Spark 
>>>>>>>>>>>>>>>> today is
>>>>>>>>>>>>>>>> equivalent to looking up positions in an inverted index 
>>>>>>>>>>>>>>>> represented by
>>>>>>>>>>>>>>>> another Iceberg table. That may still not be enough for some 
>>>>>>>>>>>>>>>> streaming use
>>>>>>>>>>>>>>>> cases.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> [1] - https://www.tabular.io/blog/hello-world-of-cdc/
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> - Anton
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> чт, 31 жовт. 2024 р. о 21:31 Micah Kornfield <
>>>>>>>>>>>>>>>> emkornfi...@gmail.com> пише:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I agree that equality deletes have their place in
>>>>>>>>>>>>>>>>> streaming.  I think the ultimate decision here is how 
>>>>>>>>>>>>>>>>> opinionated
>>>>>>>>>>>>>>>>> Iceberg wants to be on its use-cases.  If it really wants to 
>>>>>>>>>>>>>>>>> stick to its
>>>>>>>>>>>>>>>>> origins of "slow moving data", then removing equality deletes 
>>>>>>>>>>>>>>>>> would be
>>>>>>>>>>>>>>>>> in line with this.  I think the other high level question is
>>>>>>>>>>>>>>>>> how much we
>>>>>>>>>>>>>>>>> allow for partially compatible features (the row lineage 
>>>>>>>>>>>>>>>>> use-case feature
>>>>>>>>>>>>>>>>> was explicitly approved excluding equality deletes, and 
>>>>>>>>>>>>>>>>> people seemed OK
>>>>>>>>>>>>>>>>> with it at the time.  If all features need to work together, 
>>>>>>>>>>>>>>>>> then maybe we
>>>>>>>>>>>>>>>>> need to rethink the design here so it can be forward 
>>>>>>>>>>>>>>>>> compatible with
>>>>>>>>>>>>>>>>> equality deletes).
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I think one issue with equality deletes as stated in the
>>>>>>>>>>>>>>>>> spec is that they are overly broad.  I'd be interested if 
>>>>>>>>>>>>>>>>> people have any
>>>>>>>>>>>>>>>>> use cases that differ, but I think one way of narrowing (and 
>>>>>>>>>>>>>>>>> probably a
>>>>>>>>>>>>>>>>> necessary building block for building something better)  the 
>>>>>>>>>>>>>>>>> specification
>>>>>>>>>>>>>>>>> scope on equality deletes is to focus on upsert/Streaming 
>>>>>>>>>>>>>>>>> deletes.  Two
>>>>>>>>>>>>>>>>> proposals in this regard are:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> 1.  Require that equality deletes can only correspond to
>>>>>>>>>>>>>>>>> unique identifiers for the table.
>>>>>>>>>>>>>>>>> 2.  Consider requiring that for equality deletes on
>>>>>>>>>>>>>>>>> partitioned tables, that the primary key must contain a 
>>>>>>>>>>>>>>>>> partition column (I
>>>>>>>>>>>>>>>>> believe Flink at least already does this).  It is less clear 
>>>>>>>>>>>>>>>>> to me that
>>>>>>>>>>>>>>>>> this would meet all existing use-cases.  But having this 
>>>>>>>>>>>>>>>>> would allow for
>>>>>>>>>>>>>>>>> better incremental data-structures, which could then be 
>>>>>>>>>>>>>>>>> partition based.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Narrow scope to unique identifiers would allow for further
>>>>>>>>>>>>>>>>> building blocks already mentioned, like a secondary index 
>>>>>>>>>>>>>>>>> (possible via LSM
>>>>>>>>>>>>>>>>> tree), that would allow for better performance overall.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I generally agree with the sentiment that we shouldn't
>>>>>>>>>>>>>>>>> deprecate them until there is a viable replacement.  With all 
>>>>>>>>>>>>>>>>> due respect
>>>>>>>>>>>>>>>>> to my employer, let's not fall into the Google trap [1] :)
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>>>>> Micah
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> [1] https://goomics.net/50/
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Thu, Oct 31, 2024 at 12:35 PM Alexander Jo <
>>>>>>>>>>>>>>>>> alex...@starburstdata.com> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Hey all,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Just to throw my 2 cents in, I agree with Steven and
>>>>>>>>>>>>>>>>>> others that we do need some kind of replacement before 
>>>>>>>>>>>>>>>>>> deprecating equality
>>>>>>>>>>>>>>>>>> deletes.
>>>>>>>>>>>>>>>>>> They certainly have their problems, and do significantly
>>>>>>>>>>>>>>>>>> increase complexity as they are now, but the writing of 
>>>>>>>>>>>>>>>>>> position deletes is
>>>>>>>>>>>>>>>>>> too expensive for certain pipelines.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> We've been investigating using equality deletes for some
>>>>>>>>>>>>>>>>>> of our workloads at Starburst, the key advantage we were 
>>>>>>>>>>>>>>>>>> hoping to leverage
>>>>>>>>>>>>>>>>>> is cheap, effectively random access lookup deletes.
>>>>>>>>>>>>>>>>>> Say you have a UUID column that's unique in a table and
>>>>>>>>>>>>>>>>>> want to delete a row by UUID. With position deletes each 
>>>>>>>>>>>>>>>>>> delete is
>>>>>>>>>>>>>>>>>> expensive without an index on that UUID.
>>>>>>>>>>>>>>>>>> With equality deletes each delete is cheap, and while
>>>>>>>>>>>>>>>>>> reads/compaction are expensive, when updates are frequent and reads are
>>>>>>>>>>>>>>>>>> sporadic that's a reasonable tradeoff.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Pretty much what Jason and Steven have already said.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Maybe there are some incremental improvements on equality
>>>>>>>>>>>>>>>>>> deletes or tips from similar systems that might alleviate 
>>>>>>>>>>>>>>>>>> some of their
>>>>>>>>>>>>>>>>>> problems?
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> - Alex Jo
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Thu, Oct 31, 2024 at 10:58 AM Steven Wu <
>>>>>>>>>>>>>>>>>> stevenz...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> We probably all agree with the downside of equality
>>>>>>>>>>>>>>>>>>> deletes: it postpones all the work on the read path.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> In theory, we can implement position deletes only in the
>>>>>>>>>>>>>>>>>>> Flink streaming writer. It would require the tracking of 
>>>>>>>>>>>>>>>>>>> last committed
>>>>>>>>>>>>>>>>>>> data files per key, which can be stored in Flink state 
>>>>>>>>>>>>>>>>>>> (checkpointed). This
>>>>>>>>>>>>>>>>>>> is obviously quite expensive/challenging, but possible.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> I'd like to echo one benefit of equality deletes that
>>>>>>>>>>>>>>>>>>> Russell called out in the original email. Equality deletes would never
>>>>>>>>>>>>>>>>>>> have conflicts. That is important for streaming writers (Flink, Kafka
>>>>>>>>>>>>>>>>>>> connect, ...) that commit frequently (minutes or less). Assume Flink can
>>>>>>>>>>>>>>>>>>> write position deletes only and commit every 2 minutes. The long-running
>>>>>>>>>>>>>>>>>>> nature of streaming jobs can cause frequent commit conflicts with
>>>>>>>>>>>>>>>>>>> background delete compaction jobs.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Overall, the streaming upsert write is not a well solved
>>>>>>>>>>>>>>>>>>> problem in Iceberg. This probably affects all streaming 
>>>>>>>>>>>>>>>>>>> engines (Flink,
>>>>>>>>>>>>>>>>>>> Kafka connect, Spark streaming, ...). We need to come up 
>>>>>>>>>>>>>>>>>>> with some better
>>>>>>>>>>>>>>>>>>> alternatives before we can deprecate equality deletes.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Thu, Oct 31, 2024 at 8:38 AM Russell Spitzer <
>>>>>>>>>>>>>>>>>>> russell.spit...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> For users of Equality Deletes, what are the key
>>>>>>>>>>>>>>>>>>>> benefits to Equality Deletes that you would like to 
>>>>>>>>>>>>>>>>>>>> preserve and could you
>>>>>>>>>>>>>>>>>>>> please share some concrete examples of the queries you 
>>>>>>>>>>>>>>>>>>>> want to run (and the
>>>>>>>>>>>>>>>>>>>> schemas and data sizes you would like to run them against) 
>>>>>>>>>>>>>>>>>>>> and the
>>>>>>>>>>>>>>>>>>>> latencies that would be acceptable?
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> On Thu, Oct 31, 2024 at 10:05 AM Jason Fine
>>>>>>>>>>>>>>>>>>>> <ja...@upsolver.com.invalid> wrote:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Representing Upsolver here, we also make use of
>>>>>>>>>>>>>>>>>>>>> Equality Deletes to deliver high frequency low latency 
>>>>>>>>>>>>>>>>>>>>> updates to our
>>>>>>>>>>>>>>>>>>>>> clients at scale. We have customers using them at scale 
>>>>>>>>>>>>>>>>>>>>> and demonstrating
>>>>>>>>>>>>>>>>>>>>> the need and viability. We automate the process of 
>>>>>>>>>>>>>>>>>>>>> converting them into
>>>>>>>>>>>>>>>>>>>>> positional deletes (or fully applying them) for more 
>>>>>>>>>>>>>>>>>>>>> efficient engine
>>>>>>>>>>>>>>>>>>>>> queries in the background giving our users both low 
>>>>>>>>>>>>>>>>>>>>> latency and good query
>>>>>>>>>>>>>>>>>>>>> performance.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Equality Deletes were added since there isn't a good
>>>>>>>>>>>>>>>>>>>>> way to solve frequent updates otherwise. It would require 
>>>>>>>>>>>>>>>>>>>>> some sort of
>>>>>>>>>>>>>>>>>>>>> index keeping track of every record in the table (by a 
>>>>>>>>>>>>>>>>>>>>> predetermined PK)
>>>>>>>>>>>>>>>>>>>>> and maintaining such an index is a huge task that every 
>>>>>>>>>>>>>>>>>>>>> tool interested in
>>>>>>>>>>>>>>>>>>>>> this would need to re-implement. It also becomes a 
>>>>>>>>>>>>>>>>>>>>> bottleneck limiting
>>>>>>>>>>>>>>>>>>>>> table sizes.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> I don't think they should be removed without providing
>>>>>>>>>>>>>>>>>>>>> an alternative. Positional Deletes have a different 
>>>>>>>>>>>>>>>>>>>>> performance profile
>>>>>>>>>>>>>>>>>>>>> inherently, requiring more upfront work proportional to 
>>>>>>>>>>>>>>>>>>>>> the table size.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> On Thu, Oct 31, 2024 at 2:45 PM Jean-Baptiste Onofré <
>>>>>>>>>>>>>>>>>>>>> j...@nanthrax.net> wrote:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Hi Russell
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Thanks for the nice writeup and the proposal.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> I agree with your analysis, and I have the same
>>>>>>>>>>>>>>>>>>>>>> feeling. However, I
>>>>>>>>>>>>>>>>>>>>>> think there are more engines than Flink that write equality
>>>>>>>>>>>>>>>>>>>>>> delete files. So,
>>>>>>>>>>>>>>>>>>>>>> I agree to deprecate in V3, but maybe be more
>>>>>>>>>>>>>>>>>>>>>> "flexible" about removal
>>>>>>>>>>>>>>>>>>>>>> in V4 in order to give time to engines to update.
>>>>>>>>>>>>>>>>>>>>>> I think that by deprecating equality deletes, we are
>>>>>>>>>>>>>>>>>>>>>> clearly focusing
>>>>>>>>>>>>>>>>>>>>>> on read performance and "consistency" (more than
>>>>>>>>>>>>>>>>>>>>>> write). It's not
>>>>>>>>>>>>>>>>>>>>>> necessarily a bad thing but the streaming platform
>>>>>>>>>>>>>>>>>>>>>> and data ingestion
>>>>>>>>>>>>>>>>>>>>>> platforms will be probably concerned about that (by
>>>>>>>>>>>>>>>>>>>>>> using positional
>>>>>>>>>>>>>>>>>>>>>> deletes, they will have to scan/read all datafiles to
>>>>>>>>>>>>>>>>>>>>>> find the
>>>>>>>>>>>>>>>>>>>>>> position, so painful).
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> So, to summarize:
>>>>>>>>>>>>>>>>>>>>>> 1. Agree to deprecate equality deletes, but -1 to
>>>>>>>>>>>>>>>>>>>>>> commit any target
>>>>>>>>>>>>>>>>>>>>>> for deletion before having a clear path for streaming
>>>>>>>>>>>>>>>>>>>>>> platforms
>>>>>>>>>>>>>>>>>>>>>> (Flink, Beam, ...)
>>>>>>>>>>>>>>>>>>>>>> 2. In the meantime (during the deprecation period), I
>>>>>>>>>>>>>>>>>>>>>> propose to
>>>>>>>>>>>>>>>>>>>>>> explore possible improvements for streaming platforms
>>>>>>>>>>>>>>>>>>>>>> (maybe finding a
>>>>>>>>>>>>>>>>>>>>>> way to avoid full data files scan, ...)
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Thanks !
>>>>>>>>>>>>>>>>>>>>>> Regards
>>>>>>>>>>>>>>>>>>>>>> JB
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> On Wed, Oct 30, 2024 at 10:06 PM Russell Spitzer
>>>>>>>>>>>>>>>>>>>>>> <russell.spit...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>>>>> > Background:
>>>>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>>>>> > 1) Position Deletes
>>>>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>>>>> > Writers determine what rows are deleted and mark
>>>>>>>>>>>>>>>>>>>>>> them in a 1 for 1 representation. With delete vectors 
>>>>>>>>>>>>>>>>>>>>>> this means every data
>>>>>>>>>>>>>>>>>>>>>> file has at most 1 delete vector that it is read in 
>>>>>>>>>>>>>>>>>>>>>> conjunction with to
>>>>>>>>>>>>>>>>>>>>>> excise deleted rows. Reader overhead is more or less 
>>>>>>>>>>>>>>>>>>>>>> constant and is very
>>>>>>>>>>>>>>>>>>>>>> predictable.
>>>>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>>>>> > The main cost of this mode is that deletes must be
>>>>>>>>>>>>>>>>>>>>>> determined at write time which is expensive and can be 
>>>>>>>>>>>>>>>>>>>>>> more difficult for
>>>>>>>>>>>>>>>>>>>>>> conflict resolution
>>>>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>>>>> > 2) Equality Deletes
>>>>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>>>>> > Writers write out reference to what values are
>>>>>>>>>>>>>>>>>>>>>> deleted (in a partition or globally). There can be an 
>>>>>>>>>>>>>>>>>>>>>> unlimited number of
>>>>>>>>>>>>>>>>>>>>>> equality deletes and they all must be checked for every 
>>>>>>>>>>>>>>>>>>>>>> data file that is
>>>>>>>>>>>>>>>>>>>>>> read. The cost of determining deleted rows is 
>>>>>>>>>>>>>>>>>>>>>> essentially given to the
>>>>>>>>>>>>>>>>>>>>>> reader.
>>>>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>>>>> > Conflicts almost never happen since data files are
>>>>>>>>>>>>>>>>>>>>>> not actually changed and there is almost no cost to the 
>>>>>>>>>>>>>>>>>>>>>> writer to generate
>>>>>>>>>>>>>>>>>>>>>> these. Almost all costs related to equality deletes are 
>>>>>>>>>>>>>>>>>>>>>> passed on to the
>>>>>>>>>>>>>>>>>>>>>> reader.
>>>>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>>>>> > Proposal:
>>>>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>>>>> > Equality deletes are, in my opinion, unsustainable
>>>>>>>>>>>>>>>>>>>>>> and we should work on deprecating and removing them from 
>>>>>>>>>>>>>>>>>>>>>> the specification.
>>>>>>>>>>>>>>>>>>>>>> At this time, I know of only one engine (Apache Flink) 
>>>>>>>>>>>>>>>>>>>>>> which produces these
>>>>>>>>>>>>>>>>>>>>>> deletes but almost all engines have implementations to 
>>>>>>>>>>>>>>>>>>>>>> read them. The cost
>>>>>>>>>>>>>>>>>>>>>> of implementing equality deletes on the read path is 
>>>>>>>>>>>>>>>>>>>>>> difficult and
>>>>>>>>>>>>>>>>>>>>>> unpredictable in terms of memory usage and compute 
>>>>>>>>>>>>>>>>>>>>>> complexity. We’ve had
>>>>>>>>>>>>>>>>>>>>>> suggestions of implementing RocksDB in order to handle
>>>>>>>>>>>>>>>>>>>>>> ever growing sets of
>>>>>>>>>>>>>>>>>>>>>> equality deletes which in my opinion shows that we are 
>>>>>>>>>>>>>>>>>>>>>> going down the wrong
>>>>>>>>>>>>>>>>>>>>>> path.
>>>>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>>>>> > Outside of performance, Equality deletes are also
>>>>>>>>>>>>>>>>>>>>>> difficult to use in conjunction with many other 
>>>>>>>>>>>>>>>>>>>>>> features. For example, any
>>>>>>>>>>>>>>>>>>>>>> features requiring CDC or Row lineage are basically 
>>>>>>>>>>>>>>>>>>>>>> impossible when
>>>>>>>>>>>>>>>>>>>>>> equality deletes are in use. When Equality deletes are 
>>>>>>>>>>>>>>>>>>>>>> present, the state
>>>>>>>>>>>>>>>>>>>>>> of the table can only be determined with a full scan 
>>>>>>>>>>>>>>>>>>>>>> making it difficult to
>>>>>>>>>>>>>>>>>>>>>> update differential structures. This means materialized 
>>>>>>>>>>>>>>>>>>>>>> views or indexes
>>>>>>>>>>>>>>>>>>>>>> need to essentially be fully rebuilt whenever an 
>>>>>>>>>>>>>>>>>>>>>> equality delete is added
>>>>>>>>>>>>>>>>>>>>>> to the table.
>>>>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>>>>> > Equality deletes essentially remove complexity from
>>>>>>>>>>>>>>>>>>>>>> the write side but then add what I believe is an 
>>>>>>>>>>>>>>>>>>>>>> unacceptable level of
>>>>>>>>>>>>>>>>>>>>>> complexity to the read side.
>>>>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>>>>> > Because of this I suggest we deprecate Equality
>>>>>>>>>>>>>>>>>>>>>> Deletes in V3 and slate them for full removal from the 
>>>>>>>>>>>>>>>>>>>>>> Iceberg Spec in V4.
>>>>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>>>>> > I know this is a big change and compatibility
>>>>>>>>>>>>>>>>>>>>>> breakage so I would like to introduce this idea to the 
>>>>>>>>>>>>>>>>>>>>>> community and
>>>>>>>>>>>>>>>>>>>>>> solicit feedback from all stakeholders. I am very 
>>>>>>>>>>>>>>>>>>>>>> flexible on this issue
>>>>>>>>>>>>>>>>>>>>>> and would like to hear the best arguments both for and
>>>>>>>>>>>>>>>>>>>>>> against removal of
>>>>>>>>>>>>>>>>>>>>>> Equality Deletes.
>>>>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>>>>> > Thanks everyone for your time,
>>>>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>>>>> > Russ Spitzer
>>>>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> *Jason Fine*
>>>>>>>>>>>>>>>>>>>>> Chief Software Architect
>>>>>>>>>>>>>>>>>>>>> ja...@upsolver.com  | www.upsolver.com
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>
