My proposal is the following (already expressed):
- ok to deprecate equality deletes
- not ok to remove them
- work on position delete improvements to address streaming use cases

I think we should explore different approaches. Personally, I think a possible approach would be to find a way to index data files, to avoid a full scan to find a row's position.
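A rough illustration of the kind of index JB is suggesting (all names here are invented for the sketch; nothing below is an Iceberg API): a key -> (data file, row position) map maintained at write time lets an upsert emit a position delete with a single lookup instead of scanning every data file to locate the old row.

```python
class KeyIndex:
    """Illustrative key -> (data_file, row_position) index; not an Iceberg API."""

    def __init__(self):
        self._index = {}  # key -> (data_file, position)

    def upsert(self, key, data_file, position):
        # Returns the position delete needed for the previous copy of `key`,
        # or None if the key is new. O(1) instead of a full data-file scan.
        old = self._index.get(key)
        self._index[key] = (data_file, position)
        return old

idx = KeyIndex()
assert idx.upsert("k1", "data-001.parquet", 0) is None  # new key: nothing to delete
assert idx.upsert("k2", "data-001.parquet", 1) is None
# Upserting k2 again yields the position delete for its previous copy:
assert idx.upsert("k2", "data-002.parquet", 0) == ("data-001.parquet", 1)
```

The hard parts, discussed later in the thread, are where this map lives (engine state vs. a table-level structure such as Puffin) and how it stays valid across compactions.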
My $0.01 :)

Regards
JB

On Tue, Nov 19, 2024 at 07:53, Ajantha Bhat <ajanthab...@gmail.com> wrote:
> Hi, what's the conclusion on this thread?
>
> Users are looking for upsert (CDC) support for the OSS Iceberg Kafka Connect sink. We only support appends at the moment. Can we go ahead and implement the upserts using equality deletes?
>
> - Ajantha
>
> On Sun, Nov 10, 2024 at 11:56 AM Vignesh <vignesh.v...@gmail.com> wrote:
>> Hi,
>> I am reading about Iceberg and am quite new to this. This Puffin would be an index from key to data file. Other use cases of Puffin, such as statistics, are at a per-file level if I understand correctly.
>>
>> Where would the Puffin about key -> data file be stored? It is a property of the entire table.
>>
>> Thanks,
>> Vignesh.
>>
>> On Sat, Nov 9, 2024 at 2:17 AM Shani Elharrar <sh...@upsolver.com.invalid> wrote:
>>> JB, this is what we do: we write equality deletes and periodically convert them to positional deletes.
>>>
>>> We could probably index the keys, maybe partially index using bloom filters; the best would be to put those bloom filters inside Puffin.
>>>
>>> Shani.
>>>
>>> On 9 Nov 2024, at 11:11, Jean-Baptiste Onofré <j...@nanthrax.net> wrote:
>>>
>>> Hi,
>>>
>>> I agree with Peter here, and I would say that it would be an issue for multi-engine support.
>>>
>>> I think, as I already mentioned with others, we should explore an alternative. As the main issue is the data file scan in a streaming context, maybe we could find a way to "index"/correlate positional deletes with limited scanning. I will think again about that :)
>>>
>>> Regards
>>> JB
>>>
>>> On Sat, Nov 9, 2024 at 6:48 AM Péter Váry <peter.vary.apa...@gmail.com> wrote:
>>>> Hi Imran,
>>>>
>>>> I don't think it's a good idea to start creating multiple types of Iceberg tables. Iceberg's main selling point is compatibility between engines.
>>>> If we don't have readers and writers for all types of tables, then we remove compatibility from the equation and engine-specific formats always win. OTOH, if we write readers and writers for all types of tables, then we are back to square one.
>>>>
>>>> Identifier fields are a table schema concept and are used in many cases during query planning and execution. This is why they are defined as part of the SQL spec, and this is why Iceberg defines them as well. One use case is that they can be used to merge deletes (independently of how they are manifested) and subsequent inserts into updates.
>>>>
>>>> Flink SQL doesn't allow creating tables with partition transforms, so no new table could be created by Flink SQL using transforms, but tables created by other engines could still be used (both read and write). Also, you can create such tables in Flink using the Java API.
>>>>
>>>> Requiring partition columns to be part of the identifier fields comes from the practical consideration that you want to limit the scope of the equality deletes as much as possible. Otherwise all of the equality deletes would be table-global, and they would have to be read by every reader. We could write those; we just decided that we don't want to allow the user to do this, as it is in most cases a bad idea.
>>>>
>>>> I hope this helps,
>>>> Peter
>>>>
>>>> On Fri, Nov 8, 2024, 22:01 Imran Rashid <iras...@cloudera.com.invalid> wrote:
>>>>> I'm not down in the weeds at all myself on implementation details, so forgive me if I'm wrong about the details here.
>>>>>
>>>>> I can see all the viewpoints -- both that equality deletes enable some use cases, but also make others far more difficult. What surprised me the most is that Iceberg does not provide a way to distinguish these two table "types".
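The changelog use of identifier fields described above can be sketched in miniature (illustrative Python, not engine code): within one changeset, a DELETE and an INSERT that share the same identifier-field values collapse into a single UPDATE.

```python
def collapse_changelog(events, id_fields):
    """Collapse DELETE + INSERT pairs on identifier fields into UPDATEs."""
    pending_deletes = {}  # identifier tuple -> deleted row
    out = []
    for op, row in events:
        key = tuple(row[f] for f in id_fields)
        if op == "DELETE":
            pending_deletes[key] = row
        elif op == "INSERT" and key in pending_deletes:
            # Same identifier deleted then re-inserted: report as an update.
            out.append(("UPDATE", pending_deletes.pop(key), row))
        else:
            out.append((op, row))
    # Deletes with no matching insert remain plain deletes.
    out.extend(("DELETE", row) for row in pending_deletes.values())
    return out

events = [
    ("DELETE", {"id": 1, "v": "old"}),
    ("INSERT", {"id": 1, "v": "new"}),
    ("INSERT", {"id": 2, "v": "x"}),
]
assert collapse_changelog(events, ["id"]) == [
    ("UPDATE", {"id": 1, "v": "old"}, {"id": 1, "v": "new"}),
    ("INSERT", {"id": 2, "v": "x"}),
]
```

Note this works independently of how the delete was manifested (equality or positional), which is Peter's point about identifier fields being a schema-level concept.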
>>>>>
>>>>> At first, I thought the presence of an identifier-field (https://iceberg.apache.org/spec/#identifier-field-ids) indicated that the table was a target for equality deletes. But then it turns out identifier-fields are also useful for changelog views even without equality deletes -- IIUC, they show that a delete + insert should actually be interpreted as an update in a changelog view.
>>>>>
>>>>> To be perfectly honest, I'm confused about all of these details -- from my read, the spec does not indicate this relationship between identifier-fields and equality_ids in equality delete files (https://iceberg.apache.org/spec/#equality-delete-files), but I think that is the way Flink works. Flink itself seems to have even more limitations -- no partition transforms are allowed, and all partition columns must be a subset of the identifier fields. Is that just a Flink limitation, or is that the intended behavior in the spec? (Or maybe user error on my part?) Those seem like very reasonable limitations from an implementation point of view. But OTOH, as a user, this seems to be directly contrary to some of the promises of Iceberg.
>>>>>
>>>>> It's easy to see if a table already has equality deletes in it, by looking at the metadata. But is there any way to indicate that a table (or branch of a table) _must not_ have equality deletes added to it?
>>>>>
>>>>> If that were possible, it seems like we could support both use cases. We could continue to optimize for the streaming ingestion use cases using equality deletes. But we could also build more optimizations into the "non-streaming-ingestion" branches. And we could document the tradeoff so it is much clearer to end users.
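The opt-out Imran is asking about could look something like this sketch (the property name "write.delete.equality.enabled" is invented for illustration and is not in the Iceberg spec): a table property a writer checks before committing delete files.

```python
# Hypothetical guard: reject commits containing equality deletes when a
# (made-up) table property opts the table out of them.
EQUALITY_DELETE = "equality"
POSITION_DELETE = "position"

def validate_commit(table_properties, delete_file_types):
    allowed = table_properties.get("write.delete.equality.enabled", "true")
    if allowed == "false" and EQUALITY_DELETE in delete_file_types:
        raise ValueError("equality deletes are not allowed on this table")
    return True

# Default: equality deletes remain allowed, for compatibility.
assert validate_commit({}, [EQUALITY_DELETE])

# Opted-out tables reject commits that contain equality deletes.
try:
    validate_commit({"write.delete.equality.enabled": "false"}, [EQUALITY_DELETE])
    raise AssertionError("expected rejection")
except ValueError:
    pass
```

Position deletes would pass the check either way, which matches the "both use cases" split described above.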
>>>>> >>>>> To maintain compatibility, I suppose that the change would be that >>>>> equality deletes continue to be allowed by default, but we'd add a new >>>>> field to indicate that for some tables (or branches of a table), equality >>>>> deletes would not be allowed. And it would be an error for an engine to >>>>> make an update which added an equality delete to such a table. >>>>> >>>>> Maybe that change would even be possible in V3. >>>>> >>>>> And if all the performance improvements to equality deletes make this >>>>> a moot point, we could drop the field in v4. But it seems like a mistake >>>>> to both limit the non-streaming use-case AND have confusing limitations >>>>> for >>>>> the end-user in the meantime. >>>>> >>>>> I would happily be corrected about my understanding of all of the >>>>> above. >>>>> >>>>> thanks! >>>>> Imran >>>>> >>>>> On Tue, Nov 5, 2024 at 9:16 AM Bryan Keller <brya...@gmail.com> wrote: >>>>> >>>>>> I also feel we should keep equality deletes until we have an >>>>>> alternative solution for streaming updates/deletes. >>>>>> >>>>>> -Bryan >>>>>> >>>>>> On Nov 4, 2024, at 8:33 AM, Péter Váry <peter.vary.apa...@gmail.com> >>>>>> wrote: >>>>>> >>>>>> Well, it seems like I'm a little late, so most of the arguments are >>>>>> voiced. >>>>>> >>>>>> I agree that we should not deprecate the equality deletes until we >>>>>> have a replacement feature. >>>>>> I think one of the big advantages of Iceberg is that it supports >>>>>> batch processing and streaming ingestion too. >>>>>> For streaming ingestion we need a way to update existing data in a >>>>>> performant way, but restricting deletes for the primary keys seems like >>>>>> enough from the streaming perspective. >>>>>> >>>>>> Equality deletes allow a very wide range of applications, which we >>>>>> might be able to narrow down a bit, but still keep useful. So if we want >>>>>> to >>>>>> go down this road, we need to start collecting the requirements. 
>>>>>>
>>>>>> Thanks,
>>>>>> Peter
>>>>>>
>>>>>> On Fri, Nov 1, 2024 at 19:22, Shani Elharrar <sh...@upsolver.com.invalid> wrote:
>>>>>>> I understand how it makes sense for batch jobs, but it damages streaming jobs: using equality deletes works much better for streaming (which has a strict SLA for delays), and to decrease the performance penalty, systems can rewrite the equality deletes to positional deletes.
>>>>>>>
>>>>>>> Shani.
>>>>>>>
>>>>>>> On 1 Nov 2024, at 20:06, Steven Wu <stevenz...@gmail.com> wrote:
>>>>>>>
>>>>>>> Fundamentally, it is very difficult to write position deletes with concurrent writers and conflicts for batch jobs too, as the inverted index may become invalid/stale.
>>>>>>>
>>>>>>> The position deletes are created during the write phase. But conflicts are only detected at the commit stage. I assume the batch job should fail in this case.
>>>>>>>
>>>>>>> On Fri, Nov 1, 2024 at 10:57 AM Steven Wu <stevenz...@gmail.com> wrote:
>>>>>>>> Shani,
>>>>>>>>
>>>>>>>> That is a good point. It is certainly a limitation for the Flink job to track the inverted index internally (which is what I had in mind). It can't be shared/synchronized with other Flink jobs or other engines writing to the same table.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Steven
>>>>>>>>
>>>>>>>> On Fri, Nov 1, 2024 at 10:50 AM Shani Elharrar <sh...@upsolver.com.invalid> wrote:
>>>>>>>>> Even if Flink can create this state, it would have to be maintained against the Iceberg table; we wouldn't like duplicates (keys) if other systems / users update the table (e.g. manual inserts / updates using DML).
>>>>>>>>>
>>>>>>>>> Shani.
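The background conversion Shani describes (equality deletes rewritten to positional deletes) can be sketched like this, with illustrative in-memory structures rather than real file formats: scan the data files once, match rows against the pending equality-delete keys, and emit position deletes so readers stop paying the per-row comparison cost.

```python
def rewrite_equality_deletes(data_files, equality_delete_keys, id_field):
    """Turn a set of equality-deleted key values into (file, position) deletes.

    data_files: {file_name: [row dicts]} -- stand-in for real data files.
    This one-time scan is the cost the rewrite moves off the read path.
    """
    position_deletes = []
    for file_name, rows in data_files.items():
        for pos, row in enumerate(rows):
            if row[id_field] in equality_delete_keys:
                position_deletes.append((file_name, pos))
    return position_deletes

data = {
    "data-001.parquet": [{"id": 1}, {"id": 2}],
    "data-002.parquet": [{"id": 3}],
}
assert rewrite_equality_deletes(data, {2, 3}, "id") == [
    ("data-001.parquet", 1),
    ("data-002.parquet", 0),
]
```

This is why the approach gives both low ingest latency (equality deletes are cheap to write) and good steady-state query performance (readers eventually see only positional deletes).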
>>>>>>>>> >>>>>>>>> On 1 Nov 2024, at 18:32, Steven Wu <stevenz...@gmail.com> wrote: >>>>>>>>> >>>>>>>>> >>>>>>>>> > Add support for inverted indexes to reduce the cost of position >>>>>>>>> lookup. This is fairly tricky to implement for streaming use cases >>>>>>>>> without >>>>>>>>> an external system. >>>>>>>>> >>>>>>>>> Anton, that is also what I was saying earlier. In Flink, the >>>>>>>>> inverted index of (key, committed data files) can be tracked in Flink >>>>>>>>> state. >>>>>>>>> >>>>>>>>> On Fri, Nov 1, 2024 at 2:16 AM Anton Okolnychyi < >>>>>>>>> aokolnyc...@gmail.com> wrote: >>>>>>>>> >>>>>>>>>> I was a bit skeptical when we were adding equality deletes, but >>>>>>>>>> nothing beats their performance during writes. We have to find an >>>>>>>>>> alternative before deprecating. >>>>>>>>>> >>>>>>>>>> We are doing a lot of work to improve streaming, like reducing >>>>>>>>>> the cost of commits, enabling a large (potentially infinite) number >>>>>>>>>> of >>>>>>>>>> snapshots, changelog reads, and so on. It is a project goal to excel >>>>>>>>>> in >>>>>>>>>> streaming. >>>>>>>>>> >>>>>>>>>> I was going to focus on equality deletes after completing the DV >>>>>>>>>> work. I believe we have these options: >>>>>>>>>> >>>>>>>>>> - Revisit the existing design of equality deletes (e.g. add more >>>>>>>>>> restrictions, improve compaction, offer new writers). >>>>>>>>>> - Standardize on the view-based approach [1] to handle streaming >>>>>>>>>> upserts and CDC use cases, potentially making this part of the spec. >>>>>>>>>> - Add support for inverted indexes to reduce the cost of position >>>>>>>>>> lookup. This is fairly tricky to implement for streaming use cases >>>>>>>>>> without >>>>>>>>>> an external system. Our runtime filtering in Spark today is >>>>>>>>>> equivalent to >>>>>>>>>> looking up positions in an inverted index represented by another >>>>>>>>>> Iceberg >>>>>>>>>> table. That may still not be enough for some streaming use cases. 
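One reason the inverted-index option is "fairly tricky", in miniature (hypothetical structures, not Flink or Iceberg APIs): the index is only valid for the snapshot it was built against, so a compaction that rewrites data files invalidates entries and forces the expensive lookup path again -- the staleness problem Steven raised earlier in the thread.

```python
def invalidate_after_compaction(index, replaced_files):
    """Drop index entries that point at data files a compaction replaced.

    index: {key: (data_file, position)} -- illustrative inverted index.
    Returns the still-valid entries plus the keys that must be re-resolved
    (the expensive scan the index was meant to avoid).
    """
    live = {k: loc for k, loc in index.items() if loc[0] not in replaced_files}
    stale_keys = sorted(set(index) - set(live))
    return live, stale_keys

index = {"k1": ("data-001.parquet", 0), "k2": ("data-002.parquet", 5)}
live, stale = invalidate_after_compaction(index, {"data-001.parquet"})
assert live == {"k2": ("data-002.parquet", 5)}
assert stale == ["k1"]
```

An index held in one writer's state also cannot be shared with other writers, which is the multi-writer limitation Shani and Steven discuss above.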
>>>>>>>>>>
>>>>>>>>>> [1] - https://www.tabular.io/blog/hello-world-of-cdc/
>>>>>>>>>>
>>>>>>>>>> - Anton
>>>>>>>>>>
>>>>>>>>>> On Thu, Oct 31, 2024 at 21:31, Micah Kornfield <emkornfi...@gmail.com> wrote:
>>>>>>>>>>> I agree that equality deletes have their place in streaming. I think the ultimate decision here is how opinionated Iceberg wants to be on its use cases. If it really wants to stick to its origins of "slow-moving data", then removing equality deletes would be in line with this. I think the other high-level question is how much we allow for partially compatible features (the row lineage feature was explicitly approved excluding equality deletes, and people seemed OK with it at the time; if all features need to work together, then maybe we need to rethink the design here so it can be forward compatible with equality deletes).
>>>>>>>>>>>
>>>>>>>>>>> I think one issue with equality deletes as stated in the spec is that they are overly broad. I'd be interested if people have any use cases that differ, but I think one way of narrowing the specification scope on equality deletes (and probably a necessary building block for building something better) is to focus on upsert/streaming deletes. Two proposals in this regard are:
>>>>>>>>>>>
>>>>>>>>>>> 1. Require that equality deletes can only correspond to unique identifiers for the table.
>>>>>>>>>>> 2. Consider requiring that, for equality deletes on partitioned tables, the primary key must contain a partition column (I believe Flink at least already does this). It is less clear to me that this would meet all existing use-cases.
But having this would allow for better >>>>>>>>>>> incremental data-structures, which could then be partition based. >>>>>>>>>>> >>>>>>>>>>> Narrow scope to unique identifiers would allow for further >>>>>>>>>>> building blocks already mentioned, like a secondary index (possible >>>>>>>>>>> via LSM >>>>>>>>>>> tree), that would allow for better performance overall. >>>>>>>>>>> >>>>>>>>>>> I generally agree with the sentiment that we shouldn't deprecate >>>>>>>>>>> them until there is a viable replacement. With all due respect to >>>>>>>>>>> my >>>>>>>>>>> employer, let's not fall into the Google trap [1] :) >>>>>>>>>>> >>>>>>>>>>> Cheers, >>>>>>>>>>> Micah >>>>>>>>>>> >>>>>>>>>>> [1] https://goomics.net/50/ >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On Thu, Oct 31, 2024 at 12:35 PM Alexander Jo < >>>>>>>>>>> alex...@starburstdata.com> wrote: >>>>>>>>>>> >>>>>>>>>>>> Hey all, >>>>>>>>>>>> >>>>>>>>>>>> Just to throw my 2 cents in, I agree with Steven and others >>>>>>>>>>>> that we do need some kind of replacement before deprecating >>>>>>>>>>>> equality >>>>>>>>>>>> deletes. >>>>>>>>>>>> They certainly have their problems, and do significantly >>>>>>>>>>>> increase complexity as they are now, but the writing of position >>>>>>>>>>>> deletes is >>>>>>>>>>>> too expensive for certain pipelines. >>>>>>>>>>>> >>>>>>>>>>>> We've been investigating using equality deletes for some of our >>>>>>>>>>>> workloads at Starburst, the key advantage we were hoping to >>>>>>>>>>>> leverage is >>>>>>>>>>>> cheap, effectively random access lookup deletes. >>>>>>>>>>>> Say you have a UUID column that's unique in a table and want to >>>>>>>>>>>> delete a row by UUID. With position deletes each delete is >>>>>>>>>>>> expensive >>>>>>>>>>>> without an index on that UUID. 
>>>>>>>>>>>> With equality deletes each delete is cheap while reads/compaction are expensive, but when updates are frequent and reads are sporadic that's a reasonable tradeoff.
>>>>>>>>>>>>
>>>>>>>>>>>> Pretty much what Jason and Steven have already said.
>>>>>>>>>>>>
>>>>>>>>>>>> Maybe there are some incremental improvements on equality deletes, or tips from similar systems, that might alleviate some of their problems?
>>>>>>>>>>>>
>>>>>>>>>>>> - Alex Jo
>>>>>>>>>>>>
>>>>>>>>>>>> On Thu, Oct 31, 2024 at 10:58 AM Steven Wu <stevenz...@gmail.com> wrote:
>>>>>>>>>>>>> We probably all agree on the downside of equality deletes: it postpones all the work to the read path.
>>>>>>>>>>>>>
>>>>>>>>>>>>> In theory, we can implement position deletes only in the Flink streaming writer. It would require tracking the last committed data files per key, which can be stored in Flink state (checkpointed). This is obviously quite expensive/challenging, but possible.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I'd like to echo one benefit of equality deletes that Russell called out in the original email. Equality deletes would never have conflicts. That is important for streaming writers (Flink, Kafka Connect, ...) that commit frequently (minutes or less). Assume Flink can write position deletes only and commit every 2 minutes. The long-running nature of streaming jobs can cause frequent commit conflicts with background delete compaction jobs.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Overall, the streaming upsert write is not a well-solved problem in Iceberg. This probably affects all streaming engines (Flink, Kafka Connect, Spark Streaming, ...).
We need to come up with >>>>>>>>>>>>> some better >>>>>>>>>>>>> alternatives before we can deprecate equality deletes. >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> On Thu, Oct 31, 2024 at 8:38 AM Russell Spitzer < >>>>>>>>>>>>> russell.spit...@gmail.com> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>>> For users of Equality Deletes, what are the key benefits to >>>>>>>>>>>>>> Equality Deletes that you would like to preserve and could you >>>>>>>>>>>>>> please share >>>>>>>>>>>>>> some concrete examples of the queries you want to run (and the >>>>>>>>>>>>>> schemas and >>>>>>>>>>>>>> data sizes you would like to run them against) and the latencies >>>>>>>>>>>>>> that would >>>>>>>>>>>>>> be acceptable? >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Thu, Oct 31, 2024 at 10:05 AM Jason Fine >>>>>>>>>>>>>> <ja...@upsolver.com.invalid> wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>>> Hi, >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Representing Upsolver here, we also make use of Equality >>>>>>>>>>>>>>> Deletes to deliver high frequency low latency updates to our >>>>>>>>>>>>>>> clients at >>>>>>>>>>>>>>> scale. We have customers using them at scale and demonstrating >>>>>>>>>>>>>>> the need and >>>>>>>>>>>>>>> viability. We automate the process of converting them into >>>>>>>>>>>>>>> positional >>>>>>>>>>>>>>> deletes (or fully applying them) for more efficient engine >>>>>>>>>>>>>>> queries in the >>>>>>>>>>>>>>> background giving our users both low latency and good query >>>>>>>>>>>>>>> performance. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Equality Deletes were added since there isn't a good way to >>>>>>>>>>>>>>> solve frequent updates otherwise. It would require some sort of >>>>>>>>>>>>>>> index >>>>>>>>>>>>>>> keeping track of every record in the table (by a predetermined >>>>>>>>>>>>>>> PK) and >>>>>>>>>>>>>>> maintaining such an index is a huge task that every tool >>>>>>>>>>>>>>> interested in this >>>>>>>>>>>>>>> would need to re-implement. 
It also becomes a bottleneck >>>>>>>>>>>>>>> limiting table >>>>>>>>>>>>>>> sizes. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> I don't think they should be removed without providing an >>>>>>>>>>>>>>> alternative. Positional Deletes have a different performance >>>>>>>>>>>>>>> profile >>>>>>>>>>>>>>> inherently, requiring more upfront work proportional to the >>>>>>>>>>>>>>> table size. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> On Thu, Oct 31, 2024 at 2:45 PM Jean-Baptiste Onofré < >>>>>>>>>>>>>>> j...@nanthrax.net> wrote: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Hi Russell >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Thanks for the nice writeup and the proposal. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> I agree with your analysis, and I have the same feeling. >>>>>>>>>>>>>>>> However, I >>>>>>>>>>>>>>>> think there are more than Flink that write equality delete >>>>>>>>>>>>>>>> files. So, >>>>>>>>>>>>>>>> I agree to deprecate in V3, but maybe be more "flexible" >>>>>>>>>>>>>>>> about removal >>>>>>>>>>>>>>>> in V4 in order to give time to engines to update. >>>>>>>>>>>>>>>> I think that by deprecating equality deletes, we are >>>>>>>>>>>>>>>> clearly focusing >>>>>>>>>>>>>>>> on read performance and "consistency" (more than write). >>>>>>>>>>>>>>>> It's not >>>>>>>>>>>>>>>> necessarily a bad thing but the streaming platform and data >>>>>>>>>>>>>>>> ingestion >>>>>>>>>>>>>>>> platforms will be probably concerned about that (by using >>>>>>>>>>>>>>>> positional >>>>>>>>>>>>>>>> deletes, they will have to scan/read all datafiles to find >>>>>>>>>>>>>>>> the >>>>>>>>>>>>>>>> position, so painful). >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> So, to summarize: >>>>>>>>>>>>>>>> 1. Agree to deprecate equality deletes, but -1 to commit >>>>>>>>>>>>>>>> any target >>>>>>>>>>>>>>>> for deletion before having a clear path for streaming >>>>>>>>>>>>>>>> platforms >>>>>>>>>>>>>>>> (Flink, Beam, ...) >>>>>>>>>>>>>>>> 2. 
In the meantime (during the deprecation period), I >>>>>>>>>>>>>>>> propose to >>>>>>>>>>>>>>>> explore possible improvements for streaming platforms >>>>>>>>>>>>>>>> (maybe finding a >>>>>>>>>>>>>>>> way to avoid full data files scan, ...) >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Thanks ! >>>>>>>>>>>>>>>> Regards >>>>>>>>>>>>>>>> JB >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On Wed, Oct 30, 2024 at 10:06 PM Russell Spitzer >>>>>>>>>>>>>>>> <russell.spit...@gmail.com> wrote: >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> > Background: >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> > 1) Position Deletes >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> > Writers determine what rows are deleted and mark them in >>>>>>>>>>>>>>>> a 1 for 1 representation. With delete vectors this means every >>>>>>>>>>>>>>>> data file >>>>>>>>>>>>>>>> has at most 1 delete vector that it is read in conjunction >>>>>>>>>>>>>>>> with to excise >>>>>>>>>>>>>>>> deleted rows. Reader overhead is more or less constant and is >>>>>>>>>>>>>>>> very >>>>>>>>>>>>>>>> predictable. >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> > The main cost of this mode is that deletes must be >>>>>>>>>>>>>>>> determined at write time which is expensive and can be more >>>>>>>>>>>>>>>> difficult for >>>>>>>>>>>>>>>> conflict resolution >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> > 2) Equality Deletes >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> > Writers write out reference to what values are deleted >>>>>>>>>>>>>>>> (in a partition or globally). There can be an unlimited number >>>>>>>>>>>>>>>> of equality >>>>>>>>>>>>>>>> deletes and they all must be checked for every data file that >>>>>>>>>>>>>>>> is read. The >>>>>>>>>>>>>>>> cost of determining deleted rows is essentially given to the >>>>>>>>>>>>>>>> reader. >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> > Conflicts almost never happen since data files are not >>>>>>>>>>>>>>>> actually changed and there is almost no cost to the writer to >>>>>>>>>>>>>>>> generate >>>>>>>>>>>>>>>> these. 
Almost all costs related to equality deletes are passed on to the reader.
>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>> > Proposal:
>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>> > Equality deletes are, in my opinion, unsustainable, and we should work on deprecating and removing them from the specification. At this time, I know of only one engine (Apache Flink) which produces these deletes, but almost all engines have implementations to read them. The cost of implementing equality deletes on the read path is difficult and unpredictable in terms of memory usage and compute complexity. We've had suggestions of implementing RocksDB in order to handle ever-growing sets of equality deletes, which in my opinion shows that we are going down the wrong path.
>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>> > Outside of performance, equality deletes are also difficult to use in conjunction with many other features. For example, any features requiring CDC or row lineage are basically impossible when equality deletes are in use. When equality deletes are present, the state of the table can only be determined with a full scan, making it difficult to update differential structures. This means materialized views or indexes need to essentially be fully rebuilt whenever an equality delete is added to the table.
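Russell's cost asymmetry, sketched in miniature (illustrative Python, not reader internals): a delete vector is a bounded, per-file positional filter, while equality deletes are predicates that must be evaluated against the rows of every data file they apply to, and their set can grow without bound until a rewrite applies them.

```python
def scan_with_delete_vector(rows, deleted_positions):
    """Position deletes: at most one delete vector per data file; excising a
    row is a constant-time membership test on its position."""
    return [r for pos, r in enumerate(rows) if pos not in deleted_positions]

def scan_with_equality_deletes(rows, equality_delete_keys, id_field):
    """Equality deletes: every row of every applicable data file must be
    compared against the accumulated delete keys, and the key set keeps
    growing until compaction rewrites it away."""
    return [r for r in rows if r[id_field] not in equality_delete_keys]

rows = [{"id": 1}, {"id": 2}, {"id": 3}]
assert scan_with_delete_vector(rows, {1}) == [{"id": 1}, {"id": 3}]
assert scan_with_equality_deletes(rows, {2}, "id") == [{"id": 1}, {"id": 3}]
```

Both scans produce the same result here; the difference the proposal turns on is that the delete-vector cost is fixed and per-file, while the equality-delete cost scales with the number of accumulated deletes times the data scanned.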
>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> > Equality deletes essentially remove complexity from the >>>>>>>>>>>>>>>> write side but then add what I believe is an unacceptable >>>>>>>>>>>>>>>> level of >>>>>>>>>>>>>>>> complexity to the read side. >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> > Because of this I suggest we deprecate Equality Deletes >>>>>>>>>>>>>>>> in V3 and slate them for full removal from the Iceberg Spec in >>>>>>>>>>>>>>>> V4. >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> > I know this is a big change and compatibility breakage so >>>>>>>>>>>>>>>> I would like to introduce this idea to the community and >>>>>>>>>>>>>>>> solicit feedback >>>>>>>>>>>>>>>> from all stakeholders. I am very flexible on this issue and >>>>>>>>>>>>>>>> would like to >>>>>>>>>>>>>>>> hear the best issues both for and against removal of Equality >>>>>>>>>>>>>>>> Deletes. >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> > Thanks everyone for your time, >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> > Russ Spitzer >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> -- >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> *Jason Fine* >>>>>>>>>>>>>>> Chief Software Architect >>>>>>>>>>>>>>> ja...@upsolver.com | www.upsolver.com >>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>