I'm strongly in favor of moving to the Delta + Base table approach
discussed in the cookbook above. It seems to me to be a much better path
forward, and I wonder if we should codify it into something more
standardized. I'm not sure we need to support this at the spec level, but it
would be nice if we could provide a table that is automatically broken into
sub-tables and has well-defined operations on it.

For example:

FastUpdateTable:
   Requires:
       Primary Key Columns
       Long Max Delta Size
   Contains:
       Private Iceberg Table: Delta
       Private Iceberg Table: Base

   On All Scans -
       Return a view which joins Delta and Base on primary key; if Delta
       has a record for a given primary key, discard the Base record.

   On All Writes -
       Perform all writes against the Delta table. Only MERGE is allowed;
       Append is forbidden (no PK guarantees). Only position deletes are
       allowed.

   On Delta Table Size > Max Delta Size -
       Upsert Delta into Base
       Clear upserted records from Delta


If the Delta Table size is kept small I think this would be almost as
performant as Equality deletes but still be compatible with row-lineage and
other indexing features.
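
A minimal sketch of the behavior outlined above, written as a toy in-memory
model in Python. Nothing here is an Iceberg API: the class, the
merge/scan/flush method names, and the dict-backed "tables" are hypothetical
stand-ins, and deletes are not modeled at all. It only illustrates the three
rules (scans prefer Delta, writes go to Delta via merge, Delta is folded into
Base once it exceeds the maximum size).

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, Tuple

Row = Dict[str, object]


@dataclass
class FastUpdateTable:
    """Toy model of the Delta + Base idea (hypothetical, not an Iceberg API)."""

    primary_key: Tuple[str, ...]   # Primary Key Columns
    max_delta_size: int            # Max Delta Size, counted in rows here
    # Stand-ins for the two private tables, keyed by primary-key values.
    delta: Dict[tuple, Row] = field(default_factory=dict, init=False)
    base: Dict[tuple, Row] = field(default_factory=dict, init=False)

    def _key(self, row: Row) -> tuple:
        return tuple(row[col] for col in self.primary_key)

    def merge(self, row: Row) -> None:
        """All writes land in Delta as an upsert by primary key."""
        self.delta[self._key(row)] = row
        if len(self.delta) > self.max_delta_size:
            self.flush()

    def scan(self) -> Iterator[Row]:
        """Join Delta and Base on primary key; a Delta record wins."""
        yield from self.delta.values()
        for key, row in self.base.items():
            if key not in self.delta:
                yield row

    def flush(self) -> None:
        """Upsert Delta into Base, then clear the upserted records."""
        self.base.update(self.delta)
        self.delta.clear()
```

With max_delta_size=2, the third distinct key written triggers a flush, and a
later merge for an existing key shadows the Base record on scan until the next
flush. A real implementation would presumably measure Delta size in bytes or
files rather than rows; rows are used here only to keep the sketch short.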


On Tue, Nov 19, 2024 at 7:12 AM Manu Zhang <owenzhang1...@gmail.com> wrote:

> Hi Ajantha,
>
> I'm proposing exploring a view-based approach similar to the
> changelog-mirror table pattern[1] rather than supporting delta writers for
> the Kafka Connect Iceberg sink.
>
> 1.
> https://www.tabular.io/apache-iceberg-cookbook/data-engineering-cdc-table-mirroring/
>
> On Tue, Nov 19, 2024 at 7:38 PM Jean-Baptiste Onofré <j...@nanthrax.net>
> wrote:
>
>> I don’t think it’s a problem while an alternative is explored (the JDK
>> itself does that pretty often).
>> So it’s up to the community: of course I’m against removing it without
>> a solid alternative, but deprecation is fine imho.
>>
>> Regards
>> JB
>>
>> On Tue, Nov 19, 2024 at 12:19, Ajantha Bhat <ajanthab...@gmail.com>
>> wrote:
>>
>>>> - ok for deprecate equality deletes
>>>> - not ok to remove it
>>>
>>>
>>> @JB: I don't think it is a good idea to use deprecated functionality in
>>> new feature development.
>>> Hence, my specific question was about the Kafka Connect upsert operation.
>>>
>>> @Manu: I meant the delta writers for the Kafka Connect Iceberg sink (which
>>> in turn are used for upserting the CDC records)
>>> https://github.com/apache/iceberg/issues/10842
>>>
>>>
>>> - Ajantha
>>>
>>>
>>>
>>> On Tue, Nov 19, 2024 at 3:08 PM Manu Zhang <owenzhang1...@gmail.com>
>>> wrote:
>>>
>>>> I second Anton's proposal to standardize on a view-based approach to
>>>> handle CDC cases.
>>>> Actually, it's already been explored in detail[1] by Jack before.
>>>>
>>>> [1] Improving Change Data Capture Use Case for Apache Iceberg
>>>> <https://docs.google.com/document/d/1kyyJp4masbd1FrIKUHF1ED_z1hTARL8bNoKCgb7fhSQ/edit?tab=t.0#heading=h.94xnx4qg3bnt>
>>>>
>>>>
>>>> On Tue, Nov 19, 2024 at 4:16 PM Jean-Baptiste Onofré <j...@nanthrax.net>
>>>> wrote:
>>>>
>>>>> My proposal is the following (already expressed):
>>>>> - ok for deprecate equality deletes
>>>>> - not ok to remove it
>>>>> - work on position deletes improvements to address streaming use
>>>>> cases. I think we should explore different approaches. Personally, I
>>>>> think a possible approach would be to find a way to index data files to
>>>>> avoid a full scan to find the row position.
>>>>>
>>>>> My $0.01 :)
>>>>>
>>>>> Regards
>>>>> JB
>>>>>
>>>>> On Tue, Nov 19, 2024 at 07:53, Ajantha Bhat <ajanthab...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hi, What's the conclusion on this thread?
>>>>>>
>>>>>> Users are looking for Upsert (CDC) support for OSS Iceberg
>>>>>> kafka connect sink.
>>>>>> We only support appends at the moment. Can we go ahead and implement
>>>>>> the upserts using equality deletes?
>>>>>>
>>>>>>
>>>>>> - Ajantha
>>>>>>
>>>>>> On Sun, Nov 10, 2024 at 11:56 AM Vignesh <vignesh.v...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>> I am reading about iceberg and am quite new to this.
>>>>>>> This puffin would be an index from key to data file. Other use cases
>>>>>>> of Puffin, such as statistics are at a per file level if I understand
>>>>>>> correctly.
>>>>>>>
>>>>>>> Where would the puffin about key->data file be stored? It is a
>>>>>>> property of the entire table.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Vignesh.
>>>>>>>
>>>>>>>
>>>>>>> On Sat, Nov 9, 2024 at 2:17 AM Shani Elharrar
>>>>>>> <sh...@upsolver.com.invalid> wrote:
>>>>>>>
>>>>>>>> JB, this is what we do, we write Equality Deletes and periodically
>>>>>>>> convert them to Positional Deletes.
>>>>>>>>
>>>>>>>> We could probably index the keys, maybe partially index using bloom
>>>>>>>> filters, the best would be to put those bloom filters inside puffin.
>>>>>>>>
>>>>>>>> Shani.
>>>>>>>>
>>>>>>>> On 9 Nov 2024, at 11:11, Jean-Baptiste Onofré <j...@nanthrax.net>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>> 
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I agree with Peter here, and I would say that it would be an issue
>>>>>>>> for multi-engine support.
>>>>>>>>
>>>>>>>> I think, as I already mentioned with others, we should explore an
>>>>>>>> alternative.
>>>>>>>> As the main issue is the datafile scan in streaming context, maybe
>>>>>>>> we could find a way to "index"/correlate for positional deletes with
>>>>>>>> limited scanning.
>>>>>>>> I will think again about that :)
>>>>>>>>
>>>>>>>> Regards
>>>>>>>> JB
>>>>>>>>
>>>>>>>> On Sat, Nov 9, 2024 at 6:48 AM Péter Váry <
>>>>>>>> peter.vary.apa...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Hi Imran,
>>>>>>>>>
>>>>>>>>> I don't think it's a good idea to start creating multiple types of
>>>>>>>>> Iceberg tables. Iceberg's main selling point is compatibility between
>>>>>>>>> engines. If we don't have readers and writers for all types of
>>>>>>>>> tables, then we remove compatibility from the equation and
>>>>>>>>> engine-specific formats always win. OTOH, if we write readers and
>>>>>>>>> writers for all types of tables, then we are back to square one.
>>>>>>>>>
>>>>>>>>> Identifier fields are a table schema concept and are used in many
>>>>>>>>> cases during query planning and execution. This is why they are
>>>>>>>>> defined as part of the SQL spec, and this is why Iceberg defines them
>>>>>>>>> as well. One use case is where they can be used to merge deletes
>>>>>>>>> (independently of how they are manifested) and subsequent inserts
>>>>>>>>> into updates.
>>>>>>>>>
>>>>>>>>> Flink SQL doesn't allow creating tables with partition transforms,
>>>>>>>>> so no new table could be created by Flink SQL using transforms, but
>>>>>>>>> tables created by other engines could still be used (both read and
>>>>>>>>> write). Also, you can create such tables in Flink using the Java API.
>>>>>>>>>
>>>>>>>>> Requiring partition columns to be part of the identifier fields
>>>>>>>>> comes from the practical consideration that you want to limit the
>>>>>>>>> scope of the equality deletes as much as possible. Otherwise all of
>>>>>>>>> the equality deletes would be table-global, and they would have to be
>>>>>>>>> read by every reader. We could write those; we just decided that we
>>>>>>>>> don't want to allow the user to do this, as it is in most cases a bad
>>>>>>>>> idea.
>>>>>>>>>
>>>>>>>>> I hope this helps,
>>>>>>>>> Peter
>>>>>>>>>
>>>>>>>>> On Fri, Nov 8, 2024, 22:01 Imran Rashid
>>>>>>>>> <iras...@cloudera.com.invalid> wrote:
>>>>>>>>>
>>>>>>>>>> I'm not down in the weeds at all myself on implementation
>>>>>>>>>> details, so forgive me if I'm wrong about the details here.
>>>>>>>>>>
>>>>>>>>>> I can see all the viewpoints -- both that equality deletes enable
>>>>>>>>>> some use cases, but also make others far more difficult.  What 
>>>>>>>>>> surprised me
>>>>>>>>>> the most is that Iceberg does not provide a way to distinguish these 
>>>>>>>>>> two
>>>>>>>>>> table "types".
>>>>>>>>>>
>>>>>>>>>> At first, I thought the presence of an identifier-field (
>>>>>>>>>> https://iceberg.apache.org/spec/#identifier-field-ids) indicated
>>>>>>>>>> that the table was a target for equality deletes.  But, then it 
>>>>>>>>>> turns out
>>>>>>>>>> identifier-fields are also useful for changelog views even without 
>>>>>>>>>> equality
>>>>>>>>>> deletes -- IIUC, they show that a delete + insert should actually be
>>>>>>>>>> interpreted as an update in changelog view.
>>>>>>>>>>
>>>>>>>>>> To be perfectly honest, I'm confused about all of these details
>>>>>>>>>> -- from my read, the spec does not indicate this relationship between
>>>>>>>>>> identifier-fields and equality_ids in equality delete files (
>>>>>>>>>> https://iceberg.apache.org/spec/#equality-delete-files), but I
>>>>>>>>>> think that is the way Flink works.  Flink itself seems to have even 
>>>>>>>>>> more
>>>>>>>>>> limitations -- no partition transforms are allowed, and all partition
>>>>>>>>>> columns must be a subset of the identifier fields.  Is that just a 
>>>>>>>>>> Flink
>>>>>>>>>> limitation, or is that the intended behavior in the spec?  (Or maybe
>>>>>>>>>> user-error on my part?)  Those seem like very reasonable 
>>>>>>>>>> limitations, from
>>>>>>>>>> an implementation point-of-view.  But OTOH, as a user, this seems to 
>>>>>>>>>> be
>>>>>>>>>> directly contrary to some of the promises of Iceberg.
>>>>>>>>>>
>>>>>>>>>> It's easy to see if a table already has equality deletes in it, by
>>>>>>>>>> looking at the metadata.  But is there any way to indicate that a
>>>>>>>>>> table (or branch of a table) _must not_ have equality deletes added
>>>>>>>>>> to it?
>>>>>>>>>>
>>>>>>>>>> If that were possible, it seems like we could support both use
>>>>>>>>>> cases.  We could continue to optimize for the streaming ingestion 
>>>>>>>>>> use cases
>>>>>>>>>> using equality deletes.  But we could also build more optimizations 
>>>>>>>>>> into
>>>>>>>>>> the "non-streaming-ingestion" branches.  And we could document the 
>>>>>>>>>> tradeoff
>>>>>>>>>> so it is much clearer to end users.
>>>>>>>>>>
>>>>>>>>>> To maintain compatibility, I suppose that the change would be
>>>>>>>>>> that equality deletes continue to be allowed by default, but we'd 
>>>>>>>>>> add a new
>>>>>>>>>> field to indicate that for some tables (or branches of a table), 
>>>>>>>>>> equality
>>>>>>>>>> deletes would not be allowed.  And it would be an error for an 
>>>>>>>>>> engine to
>>>>>>>>>> make an update which added an equality delete to such a table.
>>>>>>>>>>
>>>>>>>>>> Maybe that change would even be possible in V3.
>>>>>>>>>>
>>>>>>>>>> And if all the performance improvements to equality deletes make
>>>>>>>>>> this a moot point, we could drop the field in v4.  But it seems like 
>>>>>>>>>> a
>>>>>>>>>> mistake to both limit the non-streaming use-case AND have confusing
>>>>>>>>>> limitations for the end-user in the meantime.
>>>>>>>>>>
>>>>>>>>>> I would happily be corrected about my understanding of all of the
>>>>>>>>>> above.
>>>>>>>>>>
>>>>>>>>>> thanks!
>>>>>>>>>> Imran
>>>>>>>>>>
>>>>>>>>>> On Tue, Nov 5, 2024 at 9:16 AM Bryan Keller <brya...@gmail.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> I also feel we should keep equality deletes until we have an
>>>>>>>>>>> alternative solution for streaming updates/deletes.
>>>>>>>>>>>
>>>>>>>>>>> -Bryan
>>>>>>>>>>>
>>>>>>>>>>> On Nov 4, 2024, at 8:33 AM, Péter Váry <
>>>>>>>>>>> peter.vary.apa...@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>> Well, it seems like I'm a little late, so most of the arguments
>>>>>>>>>>> are voiced.
>>>>>>>>>>>
>>>>>>>>>>> I agree that we should not deprecate the equality deletes until
>>>>>>>>>>> we have a replacement feature.
>>>>>>>>>>> I think one of the big advantages of Iceberg is that it supports
>>>>>>>>>>> batch processing and streaming ingestion too.
>>>>>>>>>>> For streaming ingestion we need a way to update existing data in
>>>>>>>>>>> a performant way, but restricting deletes for the primary keys 
>>>>>>>>>>> seems like
>>>>>>>>>>> enough from the streaming perspective.
>>>>>>>>>>>
>>>>>>>>>>> Equality deletes allow a very wide range of applications, which
>>>>>>>>>>> we might be able to narrow down a bit, but still keep useful. So if 
>>>>>>>>>>> we want
>>>>>>>>>>> to go down this road, we need to start collecting the requirements.
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Peter
>>>>>>>>>>>
>>>>>>>>>>> On Fri, Nov 1, 2024 at 19:22, Shani Elharrar
>>>>>>>>>>> <sh...@upsolver.com.invalid> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> I understand how it makes sense for batch jobs, but it damages
>>>>>>>>>>>> stream jobs. Using equality deletes works much better for
>>>>>>>>>>>> streaming (which has a strict SLA for delays), and in order to
>>>>>>>>>>>> decrease the performance penalty, systems can rewrite the equality
>>>>>>>>>>>> deletes to positional deletes.
>>>>>>>>>>>>
>>>>>>>>>>>> Shani.
>>>>>>>>>>>>
>>>>>>>>>>>> On 1 Nov 2024, at 20:06, Steven Wu <stevenz...@gmail.com>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> 
>>>>>>>>>>>> Fundamentally, it is very difficult to write position deletes
>>>>>>>>>>>> with concurrent writers and conflicts for batch jobs too, as the 
>>>>>>>>>>>> inverted
>>>>>>>>>>>> index may become invalid/stale.
>>>>>>>>>>>>
>>>>>>>>>>>> The position deletes are created during the write phase. But
>>>>>>>>>>>> conflicts are only detected at the commit stage. I assume the 
>>>>>>>>>>>> batch job
>>>>>>>>>>>> should fail in this case.
>>>>>>>>>>>>
>>>>>>>>>>>> On Fri, Nov 1, 2024 at 10:57 AM Steven Wu <stevenz...@gmail.com>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Shani,
>>>>>>>>>>>>>
>>>>>>>>>>>>> That is a good point. It is certainly a limitation for the
>>>>>>>>>>>>> Flink job to track the inverted index internally (which is what I 
>>>>>>>>>>>>> had in
>>>>>>>>>>>>> mind). It can't be shared/synchronized with other Flink jobs or 
>>>>>>>>>>>>> other
>>>>>>>>>>>>> engines writing to the same table.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>> Steven
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Fri, Nov 1, 2024 at 10:50 AM Shani Elharrar
>>>>>>>>>>>>> <sh...@upsolver.com.invalid> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Even if Flink can create this state, it would have to be
>>>>>>>>>>>>>> maintained against the Iceberg table; we wouldn't want duplicate
>>>>>>>>>>>>>> keys if other systems / users update the table (e.g. manual
>>>>>>>>>>>>>> inserts / updates using DML).
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Shani.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 1 Nov 2024, at 18:32, Steven Wu <stevenz...@gmail.com>
>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> > Add support for inverted indexes to reduce the cost of
>>>>>>>>>>>>>> position lookup. This is fairly tricky to implement for 
>>>>>>>>>>>>>> streaming use cases
>>>>>>>>>>>>>> without an external system.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Anton, that is also what I was saying earlier. In Flink, the
>>>>>>>>>>>>>> inverted index of (key, committed data files) can be tracked in 
>>>>>>>>>>>>>> Flink state.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Fri, Nov 1, 2024 at 2:16 AM Anton Okolnychyi <
>>>>>>>>>>>>>> aokolnyc...@gmail.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I was a bit skeptical when we were adding equality deletes,
>>>>>>>>>>>>>>> but nothing beats their performance during writes. We have to 
>>>>>>>>>>>>>>> find an
>>>>>>>>>>>>>>> alternative before deprecating.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> We are doing a lot of work to improve streaming, like
>>>>>>>>>>>>>>> reducing the cost of commits, enabling a large (potentially 
>>>>>>>>>>>>>>> infinite)
>>>>>>>>>>>>>>> number of snapshots, changelog reads, and so on. It is a 
>>>>>>>>>>>>>>> project goal to
>>>>>>>>>>>>>>> excel in streaming.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I was going to focus on equality deletes after completing
>>>>>>>>>>>>>>> the DV work. I believe we have these options:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> - Revisit the existing design of equality deletes (e.g. add
>>>>>>>>>>>>>>> more restrictions, improve compaction, offer new writers).
>>>>>>>>>>>>>>> - Standardize on the view-based approach [1] to handle
>>>>>>>>>>>>>>> streaming upserts and CDC use cases, potentially making this 
>>>>>>>>>>>>>>> part of the
>>>>>>>>>>>>>>> spec.
>>>>>>>>>>>>>>> - Add support for inverted indexes to reduce the cost of
>>>>>>>>>>>>>>> position lookup. This is fairly tricky to implement for 
>>>>>>>>>>>>>>> streaming use cases
>>>>>>>>>>>>>>> without an external system. Our runtime filtering in Spark 
>>>>>>>>>>>>>>> today is
>>>>>>>>>>>>>>> equivalent to looking up positions in an inverted index 
>>>>>>>>>>>>>>> represented by
>>>>>>>>>>>>>>> another Iceberg table. That may still not be enough for some 
>>>>>>>>>>>>>>> streaming use
>>>>>>>>>>>>>>> cases.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> [1] - https://www.tabular.io/blog/hello-world-of-cdc/
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> - Anton
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Thu, Oct 31, 2024 at 21:31, Micah Kornfield <
>>>>>>>>>>>>>>> emkornfi...@gmail.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I agree that equality deletes have their place in
>>>>>>>>>>>>>>>> streaming.  I think the ultimate decision here is how
>>>>>>>>>>>>>>>> opinionated Iceberg wants to be on its use-cases.  If it really
>>>>>>>>>>>>>>>> wants to stick to its origins of "slow moving data", then
>>>>>>>>>>>>>>>> removing equality deletes would be in line with this.  I think
>>>>>>>>>>>>>>>> the other high-level question is how much we allow for partially
>>>>>>>>>>>>>>>> compatible features (the row lineage use-case feature was
>>>>>>>>>>>>>>>> explicitly approved excluding equality deletes, and people
>>>>>>>>>>>>>>>> seemed OK with it at the time.  If all features need to work
>>>>>>>>>>>>>>>> together, then maybe we need to rethink the design here so it
>>>>>>>>>>>>>>>> can be forward compatible with equality deletes).
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I think one issue with equality deletes as stated in the
>>>>>>>>>>>>>>>> spec is that they are overly broad.  I'd be interested if 
>>>>>>>>>>>>>>>> people have any
>>>>>>>>>>>>>>>> use cases that differ, but I think one way of narrowing (and 
>>>>>>>>>>>>>>>> probably a
>>>>>>>>>>>>>>>> necessary building block for building something better)  the 
>>>>>>>>>>>>>>>> specification
>>>>>>>>>>>>>>>> scope on equality deletes is to focus on upsert/Streaming 
>>>>>>>>>>>>>>>> deletes.  Two
>>>>>>>>>>>>>>>> proposals in this regard are:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> 1.  Require that equality deletes can only correspond to
>>>>>>>>>>>>>>>> unique identifiers for the table.
>>>>>>>>>>>>>>>> 2.  Consider requiring that for equality deletes on
>>>>>>>>>>>>>>>> partitioned tables, that the primary key must contain a 
>>>>>>>>>>>>>>>> partition column (I
>>>>>>>>>>>>>>>> believe Flink at least already does this).  It is less clear 
>>>>>>>>>>>>>>>> to me that
>>>>>>>>>>>>>>>> this would meet all existing use-cases.  But having this would 
>>>>>>>>>>>>>>>> allow for
>>>>>>>>>>>>>>>> better incremental data-structures, which could then be 
>>>>>>>>>>>>>>>> partition based.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Narrow scope to unique identifiers would allow for further
>>>>>>>>>>>>>>>> building blocks already mentioned, like a secondary index 
>>>>>>>>>>>>>>>> (possible via LSM
>>>>>>>>>>>>>>>> tree), that would allow for better performance overall.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I generally agree with the sentiment that we shouldn't
>>>>>>>>>>>>>>>> deprecate them until there is a viable replacement.  With all 
>>>>>>>>>>>>>>>> due respect
>>>>>>>>>>>>>>>> to my employer, let's not fall into the Google trap [1] :)
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>>>> Micah
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> [1] https://goomics.net/50/
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Thu, Oct 31, 2024 at 12:35 PM Alexander Jo <
>>>>>>>>>>>>>>>> alex...@starburstdata.com> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Hey all,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Just to throw my 2 cents in, I agree with Steven and
>>>>>>>>>>>>>>>>> others that we do need some kind of replacement before 
>>>>>>>>>>>>>>>>> deprecating equality
>>>>>>>>>>>>>>>>> deletes.
>>>>>>>>>>>>>>>>> They certainly have their problems, and do significantly
>>>>>>>>>>>>>>>>> increase complexity as they are now, but the writing of 
>>>>>>>>>>>>>>>>> position deletes is
>>>>>>>>>>>>>>>>> too expensive for certain pipelines.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> We've been investigating using equality deletes for some
>>>>>>>>>>>>>>>>> of our workloads at Starburst, the key advantage we were 
>>>>>>>>>>>>>>>>> hoping to leverage
>>>>>>>>>>>>>>>>> is cheap, effectively random access lookup deletes.
>>>>>>>>>>>>>>>>> Say you have a UUID column that's unique in a table and
>>>>>>>>>>>>>>>>> want to delete a row by UUID. With position deletes each 
>>>>>>>>>>>>>>>>> delete is
>>>>>>>>>>>>>>>>> expensive without an index on that UUID.
>>>>>>>>>>>>>>>>> With equality deletes each delete is cheap, and while
>>>>>>>>>>>>>>>>> reads/compaction are expensive, when updates are frequent and
>>>>>>>>>>>>>>>>> reads are sporadic that's a reasonable tradeoff.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Pretty much what Jason and Steven have already said.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Maybe there are some incremental improvements on equality
>>>>>>>>>>>>>>>>> deletes or tips from similar systems that might alleviate 
>>>>>>>>>>>>>>>>> some of their
>>>>>>>>>>>>>>>>> problems?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> - Alex Jo
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Thu, Oct 31, 2024 at 10:58 AM Steven Wu <
>>>>>>>>>>>>>>>>> stevenz...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> We probably all agree with the downside of equality
>>>>>>>>>>>>>>>>>> deletes: it postpones all the work on the read path.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> In theory, we can implement position deletes only in the
>>>>>>>>>>>>>>>>>> Flink streaming writer. It would require the tracking of 
>>>>>>>>>>>>>>>>>> last committed
>>>>>>>>>>>>>>>>>> data files per key, which can be stored in Flink state 
>>>>>>>>>>>>>>>>>> (checkpointed). This
>>>>>>>>>>>>>>>>>> is obviously quite expensive/challenging, but possible.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I'd like to echo one benefit of equality deletes that
>>>>>>>>>>>>>>>>>> Russell called out in the original email: equality deletes
>>>>>>>>>>>>>>>>>> would never have conflicts. That is important for streaming
>>>>>>>>>>>>>>>>>> writers (Flink, Kafka Connect, ...) that commit frequently
>>>>>>>>>>>>>>>>>> (minutes or less). Assume Flink can write position deletes only
>>>>>>>>>>>>>>>>>> and commits every 2 minutes. The long-running nature of
>>>>>>>>>>>>>>>>>> streaming jobs can cause frequent commit conflicts with
>>>>>>>>>>>>>>>>>> background delete compaction jobs.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Overall, the streaming upsert write is not a well solved
>>>>>>>>>>>>>>>>>> problem in Iceberg. This probably affects all streaming 
>>>>>>>>>>>>>>>>>> engines (Flink,
>>>>>>>>>>>>>>>>>> Kafka connect, Spark streaming, ...). We need to come up 
>>>>>>>>>>>>>>>>>> with some better
>>>>>>>>>>>>>>>>>> alternatives before we can deprecate equality deletes.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Thu, Oct 31, 2024 at 8:38 AM Russell Spitzer <
>>>>>>>>>>>>>>>>>> russell.spit...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> For users of Equality Deletes, what are the key
>>>>>>>>>>>>>>>>>>> benefits to Equality Deletes that you would like to 
>>>>>>>>>>>>>>>>>>> preserve and could you
>>>>>>>>>>>>>>>>>>> please share some concrete examples of the queries you want 
>>>>>>>>>>>>>>>>>>> to run (and the
>>>>>>>>>>>>>>>>>>> schemas and data sizes you would like to run them against) 
>>>>>>>>>>>>>>>>>>> and the
>>>>>>>>>>>>>>>>>>> latencies that would be acceptable?
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Thu, Oct 31, 2024 at 10:05 AM Jason Fine
>>>>>>>>>>>>>>>>>>> <ja...@upsolver.com.invalid> wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Representing Upsolver here, we also make use of
>>>>>>>>>>>>>>>>>>>> Equality Deletes to deliver high frequency low latency 
>>>>>>>>>>>>>>>>>>>> updates to our
>>>>>>>>>>>>>>>>>>>> clients at scale. We have customers using them at scale 
>>>>>>>>>>>>>>>>>>>> and demonstrating
>>>>>>>>>>>>>>>>>>>> the need and viability. We automate the process of 
>>>>>>>>>>>>>>>>>>>> converting them into
>>>>>>>>>>>>>>>>>>>> positional deletes (or fully applying them) for more 
>>>>>>>>>>>>>>>>>>>> efficient engine
>>>>>>>>>>>>>>>>>>>> queries in the background giving our users both low 
>>>>>>>>>>>>>>>>>>>> latency and good query
>>>>>>>>>>>>>>>>>>>> performance.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Equality Deletes were added since there isn't a good
>>>>>>>>>>>>>>>>>>>> way to solve frequent updates otherwise. It would require 
>>>>>>>>>>>>>>>>>>>> some sort of
>>>>>>>>>>>>>>>>>>>> index keeping track of every record in the table (by a 
>>>>>>>>>>>>>>>>>>>> predetermined PK)
>>>>>>>>>>>>>>>>>>>> and maintaining such an index is a huge task that every 
>>>>>>>>>>>>>>>>>>>> tool interested in
>>>>>>>>>>>>>>>>>>>> this would need to re-implement. It also becomes a 
>>>>>>>>>>>>>>>>>>>> bottleneck limiting
>>>>>>>>>>>>>>>>>>>> table sizes.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> I don't think they should be removed without providing
>>>>>>>>>>>>>>>>>>>> an alternative. Positional Deletes have a different 
>>>>>>>>>>>>>>>>>>>> performance profile
>>>>>>>>>>>>>>>>>>>> inherently, requiring more upfront work proportional to 
>>>>>>>>>>>>>>>>>>>> the table size.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> On Thu, Oct 31, 2024 at 2:45 PM Jean-Baptiste Onofré <
>>>>>>>>>>>>>>>>>>>> j...@nanthrax.net> wrote:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Hi Russell
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Thanks for the nice writeup and the proposal.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> I agree with your analysis, and I have the same
>>>>>>>>>>>>>>>>>>>>> feeling. However, I think more engines than Flink write
>>>>>>>>>>>>>>>>>>>>> equality delete files. So, I agree to deprecate in V3, but
>>>>>>>>>>>>>>>>>>>>> maybe be more "flexible" about removal in V4 in order to
>>>>>>>>>>>>>>>>>>>>> give engines time to update.
>>>>>>>>>>>>>>>>>>>>> I think that by deprecating equality deletes, we are
>>>>>>>>>>>>>>>>>>>>> clearly focusing on read performance and "consistency"
>>>>>>>>>>>>>>>>>>>>> (more than write). It's not necessarily a bad thing, but
>>>>>>>>>>>>>>>>>>>>> streaming and data ingestion platforms will probably be
>>>>>>>>>>>>>>>>>>>>> concerned about that (by using positional deletes, they
>>>>>>>>>>>>>>>>>>>>> will have to scan/read all data files to find the
>>>>>>>>>>>>>>>>>>>>> position, which is painful).
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> So, to summarize:
>>>>>>>>>>>>>>>>>>>>> 1. Agree to deprecate equality deletes, but -1 to
>>>>>>>>>>>>>>>>>>>>> commit any target
>>>>>>>>>>>>>>>>>>>>> for deletion before having a clear path for streaming
>>>>>>>>>>>>>>>>>>>>> platforms
>>>>>>>>>>>>>>>>>>>>> (Flink, Beam, ...)
>>>>>>>>>>>>>>>>>>>>> 2. In the meantime (during the deprecation period), I
>>>>>>>>>>>>>>>>>>>>> propose to
>>>>>>>>>>>>>>>>>>>>> explore possible improvements for streaming platforms
>>>>>>>>>>>>>>>>>>>>> (maybe finding a
>>>>>>>>>>>>>>>>>>>>> way to avoid full data files scan, ...)
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Thanks !
>>>>>>>>>>>>>>>>>>>>> Regards
>>>>>>>>>>>>>>>>>>>>> JB
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> On Wed, Oct 30, 2024 at 10:06 PM Russell Spitzer
>>>>>>>>>>>>>>>>>>>>> <russell.spit...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>>>> > Background:
>>>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>>>> > 1) Position Deletes
>>>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>>>> > Writers determine what rows are deleted and mark
>>>>>>>>>>>>>>>>>>>>> them in a 1 for 1 representation. With delete vectors 
>>>>>>>>>>>>>>>>>>>>> this means every data
>>>>>>>>>>>>>>>>>>>>> file has at most 1 delete vector that it is read in 
>>>>>>>>>>>>>>>>>>>>> conjunction with to
>>>>>>>>>>>>>>>>>>>>> excise deleted rows. Reader overhead is more or less 
>>>>>>>>>>>>>>>>>>>>> constant and is very
>>>>>>>>>>>>>>>>>>>>> predictable.
>>>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>>>> > The main cost of this mode is that deletes must be
>>>>>>>>>>>>>>>>>>>>> determined at write time which is expensive and can be 
>>>>>>>>>>>>>>>>>>>>> more difficult for
>>>>>>>>>>>>>>>>>>>>> conflict resolution
>>>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>>>> > 2) Equality Deletes
>>>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>>>> > Writers write out reference to what values are
>>>>>>>>>>>>>>>>>>>>> deleted (in a partition or globally). There can be an 
>>>>>>>>>>>>>>>>>>>>> unlimited number of
>>>>>>>>>>>>>>>>>>>>> equality deletes and they all must be checked for every 
>>>>>>>>>>>>>>>>>>>>> data file that is
>>>>>>>>>>>>>>>>>>>>> read. The cost of determining deleted rows is essentially 
>>>>>>>>>>>>>>>>>>>>> given to the
>>>>>>>>>>>>>>>>>>>>> reader.
>>>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>>>> > Conflicts almost never happen since data files are
>>>>>>>>>>>>>>>>>>>>> not actually changed and there is almost no cost to the 
>>>>>>>>>>>>>>>>>>>>> writer to generate
>>>>>>>>>>>>>>>>>>>>> these. Almost all costs related to equality deletes are 
>>>>>>>>>>>>>>>>>>>>> passed on to the
>>>>>>>>>>>>>>>>>>>>> reader.
>>>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>>>> > Proposal:
>>>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>>>> > Equality deletes are, in my opinion, unsustainable
>>>>>>>>>>>>>>>>>>>>> and we should work on deprecating and removing them from 
>>>>>>>>>>>>>>>>>>>>> the specification.
>>>>>>>>>>>>>>>>>>>>> At this time, I know of only one engine (Apache Flink) 
>>>>>>>>>>>>>>>>>>>>> which produces these
>>>>>>>>>>>>>>>>>>>>> deletes but almost all engines have implementations to 
>>>>>>>>>>>>>>>>>>>>> read them. The cost
>>>>>>>>>>>>>>>>>>>>> of implementing equality deletes on the read path is 
>>>>>>>>>>>>>>>>>>>>> difficult and
>>>>>>>>>>>>>>>>>>>>> unpredictable in terms of memory usage and compute 
>>>>>>>>>>>>>>>>>>>>> complexity. We’ve had
>>>>>>>>>>>>>>>>>>>>> suggestions of implementing RocksDB in order to handle 
>>>>>>>>>>>>>>>>>>>>> ever growing sets of
>>>>>>>>>>>>>>>>>>>>> equality deletes which in my opinion shows that we are 
>>>>>>>>>>>>>>>>>>>>> going down the wrong
>>>>>>>>>>>>>>>>>>>>> path.
>>>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>>>> > Outside of performance, Equality deletes are also
>>>>>>>>>>>>>>>>>>>>> difficult to use in conjunction with many other features. 
>>>>>>>>>>>>>>>>>>>>> For example, any
>>>>>>>>>>>>>>>>>>>>> features requiring CDC or Row lineage are basically 
>>>>>>>>>>>>>>>>>>>>> impossible when
>>>>>>>>>>>>>>>>>>>>> equality deletes are in use. When Equality deletes are 
>>>>>>>>>>>>>>>>>>>>> present, the state
>>>>>>>>>>>>>>>>>>>>> of the table can only be determined with a full scan 
>>>>>>>>>>>>>>>>>>>>> making it difficult to
>>>>>>>>>>>>>>>>>>>>> update differential structures. This means materialized 
>>>>>>>>>>>>>>>>>>>>> views or indexes
>>>>>>>>>>>>>>>>>>>>> need to essentially be fully rebuilt whenever an equality 
>>>>>>>>>>>>>>>>>>>>> delete is added
>>>>>>>>>>>>>>>>>>>>> to the table.
>>>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>>>> > Equality deletes essentially remove complexity from
>>>>>>>>>>>>>>>>>>>>> the write side but then add what I believe is an 
>>>>>>>>>>>>>>>>>>>>> unacceptable level of
>>>>>>>>>>>>>>>>>>>>> complexity to the read side.
>>>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>>>> > Because of this I suggest we deprecate Equality
>>>>>>>>>>>>>>>>>>>>> Deletes in V3 and slate them for full removal from the 
>>>>>>>>>>>>>>>>>>>>> Iceberg Spec in V4.
>>>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>>>> > I know this is a big change and compatibility
>>>>>>>>>>>>>>>>>>>>> breakage so I would like to introduce this idea to the 
>>>>>>>>>>>>>>>>>>>>> community and
>>>>>>>>>>>>>>>>>>>>> solicit feedback from all stakeholders. I am very 
>>>>>>>>>>>>>>>>>>>>> flexible on this issue
>>>>>>>>>>>>>>>>>>>>> and would like to hear the best issues both for and 
>>>>>>>>>>>>>>>>>>>>> against removal of
>>>>>>>>>>>>>>>>>>>>> Equality Deletes.
>>>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>>>> > Thanks everyone for your time,
>>>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>>>> > Russ Spitzer
>>>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> *Jason Fine*
>>>>>>>>>>>>>>>>>>>> Chief Software Architect
>>>>>>>>>>>>>>>>>>>> ja...@upsolver.com  | www.upsolver.com
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>
