Hi, I am reading about Iceberg and am quite new to this. This Puffin file would be an index from key to data file. Other use cases of Puffin, such as statistics, are at a per-file level if I understand correctly.

Where would the Puffin file for the key -> data file index be stored? It is a property of the entire table.
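From what I can tell, writing a blob into a Puffin file with the Java API would look roughly like the sketch below. This is untested, and the "key-bloom-filter-v1" blob type is made up for illustration -- it is not one of the blob types the Puffin spec defines today:

    // Untested sketch: writing a hypothetical key bloom-filter blob into a Puffin file.
    import java.nio.ByteBuffer;
    import java.util.List;
    import org.apache.iceberg.Table;
    import org.apache.iceberg.puffin.Blob;
    import org.apache.iceberg.puffin.Puffin;
    import org.apache.iceberg.puffin.PuffinWriter;

    static void writeKeyIndex(Table table, long snapshotId, long sequenceNumber,
                              ByteBuffer serializedBloomFilter) throws Exception {
      try (PuffinWriter writer =
          Puffin.write(table.io().newOutputFile("/path/to/key-index.puffin"))
              .createdBy("key-index-sketch")
              .build()) {
        writer.add(new Blob(
            "key-bloom-filter-v1",  // hypothetical blob type, not in the Puffin spec
            List.of(1),             // field id of the key column
            snapshotId,
            sequenceNumber,
            serializedBloomFilter));
      }
    }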
Thanks,
Vignesh.

On Sat, Nov 9, 2024 at 2:17 AM Shani Elharrar <sh...@upsolver.com.invalid> wrote:

> JB, this is what we do, we write Equality Deletes and periodically convert them to Positional Deletes.
>
> We could probably index the keys, maybe partially index using bloom filters; the best would be to put those bloom filters inside Puffin.
>
> Shani.
>
> On 9 Nov 2024, at 11:11, Jean-Baptiste Onofré <j...@nanthrax.net> wrote:
>
> Hi,
>
> I agree with Peter here, and I would say that it would be an issue for multi-engine support.
>
> I think, as I already mentioned with others, we should explore an alternative. As the main issue is the data file scan in a streaming context, maybe we could find a way to "index"/correlate positional deletes with limited scanning. I will think again about that :)
>
> Regards
> JB
>
> On Sat, Nov 9, 2024 at 6:48 AM Péter Váry <peter.vary.apa...@gmail.com> wrote:
>
>> Hi Imran,
>>
>> I don't think it's a good idea to start creating multiple types of Iceberg tables. Iceberg's main selling point is compatibility between engines. If we don't have readers and writers for all types of tables, then we remove compatibility from the equation and engine-specific formats always win. OTOH, if we write readers and writers for all types of tables, then we are back to square one.
>>
>> Identifier fields are a table schema concept and are used in many cases during query planning and execution. This is why they are defined as part of the SQL spec, and this is why Iceberg defines them as well. One use case is merging deletes (independently of how they are manifested) and subsequent inserts into updates.
>>
>> Flink SQL doesn't allow creating tables with partition transforms, so no new table could be created by Flink SQL using transforms, but tables created by other engines can still be used (both read and write). Also, you can create such tables in Flink using the Java API.
>>
>> Requiring partition columns to be part of the identifier fields comes from the practical consideration that you want to limit the scope of the equality deletes as much as possible. Otherwise all of the equality deletes would be table-global, and they would have to be read by every reader. We could write those; we just decided that we don't want to allow the user to do this, as it is in most cases a bad idea.
>>
>> I hope this helps,
>> Peter
>>
>> On Fri, Nov 8, 2024, 22:01 Imran Rashid <iras...@cloudera.com.invalid> wrote:
>>
>>> I'm not down in the weeds at all myself on implementation details, so forgive me if I'm wrong about the details here.
>>>
>>> I can see all the viewpoints -- both that equality deletes enable some use cases, but also that they make others far more difficult. What surprised me the most is that Iceberg does not provide a way to distinguish these two table "types".
>>>
>>> At first, I thought the presence of an identifier field (https://iceberg.apache.org/spec/#identifier-field-ids) indicated that the table was a target for equality deletes. But then it turns out identifier fields are also useful for changelog views even without equality deletes -- IIUC, they show that a delete + insert should actually be interpreted as an update in a changelog view.
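>>>
>>> (For concreteness, this is roughly how I understand Flink's Java API wires identifier fields to equality deletes -- an untested sketch, so the details may be off, and the table path is invented:)
>>>
>>>   // Untested sketch: Flink streaming upsert into Iceberg.
>>>   // equalityFieldColumns() supplies the key; upsert(true) makes each record
>>>   // an equality delete on that key followed by an insert.
>>>   import java.util.Arrays;
>>>   import org.apache.flink.streaming.api.datastream.DataStream;
>>>   import org.apache.flink.table.data.RowData;
>>>   import org.apache.iceberg.flink.TableLoader;
>>>   import org.apache.iceberg.flink.sink.FlinkSink;
>>>
>>>   static void appendUpsertSink(DataStream<RowData> input) {
>>>     FlinkSink.forRowData(input)
>>>         .tableLoader(TableLoader.fromHadoopTable("hdfs://warehouse/db/table"))
>>>         .equalityFieldColumns(Arrays.asList("id"))  // the identifier field(s)
>>>         .upsert(true)
>>>         .append();
>>>   }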
>>> To be perfectly honest, I'm confused about all of these details -- from my read, the spec does not indicate this relationship between identifier fields and the equality_ids in equality delete files (https://iceberg.apache.org/spec/#equality-delete-files), but I think that is the way Flink works. Flink itself seems to have even more limitations -- no partition transforms are allowed, and all partition columns must be a subset of the identifier fields. Is that just a Flink limitation, or is that the intended behavior in the spec? (Or maybe user error on my part?) Those seem like very reasonable limitations from an implementation point of view. But OTOH, as a user, this seems to be directly contrary to some of the promises of Iceberg.
>>>
>>> It's easy to see if a table already has equality deletes in it by looking at the metadata. But is there any way to indicate that a table (or branch of a table) _must not_ have equality deletes added to it?
>>>
>>> If that were possible, it seems like we could support both use cases. We could continue to optimize for the streaming ingestion use cases using equality deletes. But we could also build more optimizations into the "non-streaming-ingestion" branches. And we could document the tradeoff so it is much clearer to end users.
>>>
>>> To maintain compatibility, I suppose the change would be that equality deletes continue to be allowed by default, but we'd add a new field to indicate that for some tables (or branches of a table), equality deletes are not allowed. And it would be an error for an engine to make an update which added an equality delete to such a table.
>>>
>>> Maybe that change would even be possible in V3.
>>>
>>> And if all the performance improvements to equality deletes make this a moot point, we could drop the field in V4. But it seems like a mistake to both limit the non-streaming use case AND have confusing limitations for the end user in the meantime.
>>>
>>> I would happily be corrected about my understanding of all of the above.
>>>
>>> thanks!
>>> Imran
>>>
>>> On Tue, Nov 5, 2024 at 9:16 AM Bryan Keller <brya...@gmail.com> wrote:
>>>
>>>> I also feel we should keep equality deletes until we have an alternative solution for streaming updates/deletes.
>>>>
>>>> -Bryan
>>>>
>>>> On Nov 4, 2024, at 8:33 AM, Péter Váry <peter.vary.apa...@gmail.com> wrote:
>>>>
>>>> Well, it seems like I'm a little late, so most of the arguments are voiced.
>>>>
>>>> I agree that we should not deprecate equality deletes until we have a replacement feature. I think one of the big advantages of Iceberg is that it supports both batch processing and streaming ingestion. For streaming ingestion we need a way to update existing data in a performant way, but restricting deletes to the primary keys seems like enough from the streaming perspective.
>>>>
>>>> Equality deletes allow a very wide range of applications, which we might be able to narrow down a bit but still keep useful. So if we want to go down this road, we need to start collecting the requirements.
>>>>
>>>> Thanks,
>>>> Peter
>>>>
>>>> Shani Elharrar <sh...@upsolver.com.invalid> wrote (on Fri, Nov 1, 2024, at 19:22):
>>>>> I understand how it makes sense for batch jobs, but it damages streaming jobs; using equality deletes works much better for streaming jobs (which have strict SLAs on delays), and in order to decrease the performance penalty, systems can rewrite the equality deletes to positional deletes.
>>>>>
>>>>> Shani.
>>>>>
>>>>> On 1 Nov 2024, at 20:06, Steven Wu <stevenz...@gmail.com> wrote:
>>>>>
>>>>> Fundamentally, it is very difficult to write position deletes with concurrent writers and conflicts, for batch jobs too, as the inverted index may become invalid/stale.
>>>>>
>>>>> The position deletes are created during the write phase, but conflicts are only detected at the commit stage. I assume the batch job should fail in this case.
>>>>>
>>>>> On Fri, Nov 1, 2024 at 10:57 AM Steven Wu <stevenz...@gmail.com> wrote:
>>>>>
>>>>>> Shani,
>>>>>>
>>>>>> That is a good point. It is certainly a limitation for the Flink job to track the inverted index internally (which is what I had in mind). It can't be shared/synchronized with other Flink jobs or other engines writing to the same table.
>>>>>>
>>>>>> Thanks,
>>>>>> Steven
>>>>>>
>>>>>> On Fri, Nov 1, 2024 at 10:50 AM Shani Elharrar <sh...@upsolver.com.invalid> wrote:
>>>>>>
>>>>>>> Even if Flink can create this state, it would have to be maintained against the Iceberg table; we wouldn't like duplicate keys if other systems / users update the table (e.g. manual inserts / updates using DML).
>>>>>>>
>>>>>>> Shani.
>>>>>>>
>>>>>>> On 1 Nov 2024, at 18:32, Steven Wu <stevenz...@gmail.com> wrote:
>>>>>>>
>>>>>>> > Add support for inverted indexes to reduce the cost of position lookup. This is fairly tricky to implement for streaming use cases without an external system.
>>>>>>>
>>>>>>> Anton, that is also what I was saying earlier. In Flink, the inverted index of (key, committed data files) can be tracked in Flink state.
>>>>>>>
>>>>>>> On Fri, Nov 1, 2024 at 2:16 AM Anton Okolnychyi <aokolnyc...@gmail.com> wrote:
>>>>>>>
>>>>>>>> I was a bit skeptical when we were adding equality deletes, but nothing beats their performance during writes. We have to find an alternative before deprecating.
>>>>>>>>
>>>>>>>> We are doing a lot of work to improve streaming, like reducing the cost of commits, enabling a large (potentially infinite) number of snapshots, changelog reads, and so on. It is a project goal to excel in streaming.
>>>>>>>>
>>>>>>>> I was going to focus on equality deletes after completing the DV work. I believe we have these options:
>>>>>>>>
>>>>>>>> - Revisit the existing design of equality deletes (e.g. add more restrictions, improve compaction, offer new writers).
>>>>>>>> - Standardize on the view-based approach [1] to handle streaming upserts and CDC use cases, potentially making this part of the spec.
>>>>>>>> - Add support for inverted indexes to reduce the cost of position lookup. This is fairly tricky to implement for streaming use cases without an external system. Our runtime filtering in Spark today is equivalent to looking up positions in an inverted index represented by another Iceberg table. That may still not be enough for some streaming use cases (see the sketch below).
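>>>>>>>>
>>>>>>>> (Only as an untested sketch of what I mean, with a hypothetical index-table layout of (key, file_path, pos) and invented table names:)
>>>>>>>>
>>>>>>>>   // Position lookup becomes a join against an index table,
>>>>>>>>   // instead of a scan of the target table's data files.
>>>>>>>>   import org.apache.spark.sql.Dataset;
>>>>>>>>   import org.apache.spark.sql.Row;
>>>>>>>>   import org.apache.spark.sql.SparkSession;
>>>>>>>>
>>>>>>>>   static Dataset<Row> lookupPositions(SparkSession spark) {
>>>>>>>>     Dataset<Row> index = spark.read().format("iceberg").load("db.key_index");  // hypothetical
>>>>>>>>     Dataset<Row> keys = spark.sql("SELECT key FROM db.staged_delete_keys");    // hypothetical
>>>>>>>>     // The result is exactly what a position-delete writer needs.
>>>>>>>>     return index.join(keys, "key").select("file_path", "pos");
>>>>>>>>   }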
>>>>>>>>
>>>>>>>> [1] - https://www.tabular.io/blog/hello-world-of-cdc/
>>>>>>>>
>>>>>>>> - Anton
>>>>>>>>
>>>>>>>> On Thu, Oct 31, 2024 at 21:31, Micah Kornfield <emkornfi...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> I agree that equality deletes have their place in streaming. I think the ultimate decision here is how opinionated Iceberg wants to be about its use cases. If it really wants to stick to its origins of "slow moving data", then removing equality deletes would be in line with this. I think the other high-level question is how much we allow for partially compatible features (the row lineage feature was explicitly approved excluding equality deletes, and people seemed OK with it at the time; if all features need to work together, then maybe we need to rethink the design here so it can be forward compatible with equality deletes).
>>>>>>>>>
>>>>>>>>> I think one issue with equality deletes as stated in the spec is that they are overly broad. I'd be interested if people have any use cases that differ, but I think one way of narrowing the specification's scope on equality deletes (and probably a necessary building block for building something better) is to focus on upsert/streaming deletes. Two proposals in this regard are:
>>>>>>>>>
>>>>>>>>> 1. Require that equality deletes can only correspond to unique identifiers for the table (sketched after this list).
>>>>>>>>> 2. Consider requiring that, for equality deletes on partitioned tables, the primary key must contain a partition column (I believe Flink at least already does this). It is less clear to me that this would meet all existing use cases, but having this would allow for better incremental data structures, which could then be partition-based.
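>>>>>>>>>
>>>>>>>>> (Proposal 1, sketched as a writer-side check with invented method and variable names:)
>>>>>>>>>
>>>>>>>>>   // Untested sketch: reject equality deletes whose fields don't match
>>>>>>>>>   // the table's identifier (unique key) fields exactly.
>>>>>>>>>   import java.util.Set;
>>>>>>>>>
>>>>>>>>>   static void validateEqualityDelete(Set<Integer> equalityFieldIds,
>>>>>>>>>                                      Set<Integer> identifierFieldIds) {
>>>>>>>>>     if (!equalityFieldIds.equals(identifierFieldIds)) {
>>>>>>>>>       throw new IllegalArgumentException(
>>>>>>>>>           "Equality delete fields " + equalityFieldIds
>>>>>>>>>               + " must match identifier fields " + identifierFieldIds);
>>>>>>>>>     }
>>>>>>>>>   }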
>>>>>>>>>
>>>>>>>>> Narrowing the scope to unique identifiers would allow for further building blocks already mentioned, like a secondary index (possibly via an LSM tree), which would allow for better performance overall.
>>>>>>>>>
>>>>>>>>> I generally agree with the sentiment that we shouldn't deprecate them until there is a viable replacement. With all due respect to my employer, let's not fall into the Google trap [1] :)
>>>>>>>>>
>>>>>>>>> Cheers,
>>>>>>>>> Micah
>>>>>>>>>
>>>>>>>>> [1] https://goomics.net/50/
>>>>>>>>>
>>>>>>>>> On Thu, Oct 31, 2024 at 12:35 PM Alexander Jo <alex...@starburstdata.com> wrote:
>>>>>>>>>
>>>>>>>>>> Hey all,
>>>>>>>>>>
>>>>>>>>>> Just to throw my 2 cents in, I agree with Steven and others that we do need some kind of replacement before deprecating equality deletes. They certainly have their problems, and do significantly increase complexity as they are now, but the writing of position deletes is too expensive for certain pipelines.
>>>>>>>>>>
>>>>>>>>>> We've been investigating using equality deletes for some of our workloads at Starburst; the key advantage we were hoping to leverage is cheap, effectively random-access lookup deletes. Say you have a UUID column that's unique in a table and want to delete a row by UUID. With position deletes, each delete is expensive without an index on that UUID. With equality deletes, each delete is cheap while reads/compaction are expensive, but when updates are frequent and reads are sporadic that's a reasonable tradeoff.
>>>>>>>>>>
>>>>>>>>>> Pretty much what Jason and Steven have already said.
>>>>>>>>>>
>>>>>>>>>> Maybe there are some incremental improvements on equality deletes, or tips from similar systems, that might alleviate some of their problems?
>>>>>>>>>>
>>>>>>>>>> - Alex Jo
>>>>>>>>>>
>>>>>>>>>> On Thu, Oct 31, 2024 at 10:58 AM Steven Wu <stevenz...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> We probably all agree on the downside of equality deletes: they postpone all the work to the read path.
>>>>>>>>>>>
>>>>>>>>>>> In theory, we could implement position deletes only in the Flink streaming writer. It would require tracking the last committed data file per key, which can be stored in Flink state (checkpointed). This is obviously quite expensive/challenging, but possible (see the sketch below).
>>>>>>>>>>>
>>>>>>>>>>> I'd like to echo one benefit of equality deletes that Russell called out in the original email: equality deletes never have conflicts. That is important for streaming writers (Flink, Kafka Connect, ...) that commit frequently (minutes or less). Assume Flink could write only position deletes and commit every 2 minutes. The long-running nature of streaming jobs could cause frequent commit conflicts with background delete compaction jobs.
>>>>>>>>>>>
>>>>>>>>>>> Overall, streaming upsert writes are not a well-solved problem in Iceberg. This probably affects all streaming engines (Flink, Kafka Connect, Spark streaming, ...). We need to come up with some better alternatives before we can deprecate equality deletes.
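>>>>>>>>>>>
>>>>>>>>>>> (A rough, untested sketch of the state tracking I have in mind -- Row, Change, and RowLocation are placeholder types, and the hard part, feeding the newly assigned file/offset back into state, is only noted in a comment:)
>>>>>>>>>>>
>>>>>>>>>>>   import org.apache.flink.api.common.state.ValueState;
>>>>>>>>>>>   import org.apache.flink.api.common.state.ValueStateDescriptor;
>>>>>>>>>>>   import org.apache.flink.configuration.Configuration;
>>>>>>>>>>>   import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
>>>>>>>>>>>   import org.apache.flink.util.Collector;
>>>>>>>>>>>
>>>>>>>>>>>   // Per-key state remembering where the key's current row lives,
>>>>>>>>>>>   // so an upsert can emit a position delete instead of an equality delete.
>>>>>>>>>>>   class PositionTracker extends KeyedProcessFunction<String, Row, Change> {
>>>>>>>>>>>     private transient ValueState<RowLocation> lastLocation;  // (file path, row ordinal)
>>>>>>>>>>>
>>>>>>>>>>>     @Override
>>>>>>>>>>>     public void open(Configuration conf) {
>>>>>>>>>>>       lastLocation = getRuntimeContext().getState(
>>>>>>>>>>>           new ValueStateDescriptor<>("last-location", RowLocation.class));
>>>>>>>>>>>     }
>>>>>>>>>>>
>>>>>>>>>>>     @Override
>>>>>>>>>>>     public void processElement(Row row, Context ctx, Collector<Change> out) throws Exception {
>>>>>>>>>>>       RowLocation prev = lastLocation.value();
>>>>>>>>>>>       if (prev != null) {
>>>>>>>>>>>         out.collect(Change.positionDelete(prev.filePath, prev.rowOrdinal));
>>>>>>>>>>>       }
>>>>>>>>>>>       out.collect(Change.insert(row));
>>>>>>>>>>>       // A real implementation must update lastLocation once the writer has
>>>>>>>>>>>       // assigned the new row's file and offset -- that feedback loop is the
>>>>>>>>>>>       // expensive/challenging part.
>>>>>>>>>>>     }
>>>>>>>>>>>   }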
>>>>>>>>>>>
>>>>>>>>>>> On Thu, Oct 31, 2024 at 8:38 AM Russell Spitzer <russell.spit...@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> For users of Equality Deletes: what are the key benefits of Equality Deletes that you would like to preserve, and could you please share some concrete examples of the queries you want to run (and the schemas and data sizes you would like to run them against) and the latencies that would be acceptable?
>>>>>>>>>>>>
>>>>>>>>>>>> On Thu, Oct 31, 2024 at 10:05 AM Jason Fine <ja...@upsolver.com.invalid> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Representing Upsolver here: we also make use of Equality Deletes to deliver high-frequency, low-latency updates to our clients at scale. We have customers using them at scale, demonstrating the need and viability. We automate the process of converting them into positional deletes (or fully applying them) in the background for more efficient engine queries, giving our users both low latency and good query performance.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Equality Deletes were added because there isn't a good way to solve frequent updates otherwise. It would require some sort of index keeping track of every record in the table (by a predetermined PK), and maintaining such an index is a huge task that every tool interested in this would need to re-implement. It also becomes a bottleneck limiting table sizes.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I don't think they should be removed without providing an alternative. Positional Deletes inherently have a different performance profile, requiring more upfront work, proportional to the table size.
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Thu, Oct 31, 2024 at 2:45 PM Jean-Baptiste Onofré <j...@nanthrax.net> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi Russell
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks for the nice writeup and the proposal.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I agree with your analysis, and I have the same feeling. However, I think there are engines other than Flink that write equality delete files. So, I agree to deprecate in V3, but maybe be more "flexible" about removal in V4 in order to give engines time to update. I think that by deprecating equality deletes, we are clearly focusing on read performance and "consistency" (more than write). That's not necessarily a bad thing, but streaming platforms and data ingestion platforms will probably be concerned about it (by using positional deletes, they will have to scan/read all data files to find the positions, which is painful).
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> So, to summarize:
>>>>>>>>>>>>>> 1. Agree to deprecate equality deletes, but -1 on committing to any target for removal before having a clear path for streaming platforms (Flink, Beam, ...).
>>>>>>>>>>>>>> 2. In the meantime (during the deprecation period), I propose to explore possible improvements for streaming platforms (maybe finding a way to avoid full data file scans, ...).
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks !
>>>>>>>>>>>>>> Regards
>>>>>>>>>>>>>> JB
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Wed, Oct 30, 2024 at 10:06 PM Russell Spitzer <russell.spit...@gmail.com> wrote:
>>>>>>>>>>>>>> >
>>>>>>>>>>>>>> > Background:
>>>>>>>>>>>>>> >
>>>>>>>>>>>>>> > 1) Position Deletes
>>>>>>>>>>>>>> >
>>>>>>>>>>>>>> > Writers determine what rows are deleted and mark them in a 1-for-1 representation. With delete vectors, this means every data file has at most one delete vector, which is read in conjunction with it to excise deleted rows. Reader overhead is more or less constant and very predictable (see the sketch below).
>>>>>>>>>>>>>> >
>>>>>>>>>>>>>> > The main cost of this mode is that deletes must be determined at write time, which is expensive and can make conflict resolution more difficult.
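>>>>>>>>>>>>>> >
>>>>>>>>>>>>>> > (Sketching the read path, assuming the RoaringBitmap library; loadDeleteVector, readRow, and emit are invented helpers:)
>>>>>>>>>>>>>> >
>>>>>>>>>>>>>> >   import org.roaringbitmap.longlong.Roaring64Bitmap;
>>>>>>>>>>>>>> >
>>>>>>>>>>>>>> >   // Untested sketch: applying one delete vector to one data file at read
>>>>>>>>>>>>>> >   // time. Cost is one bitmap membership test per row -- constant and
>>>>>>>>>>>>>> >   // predictable.
>>>>>>>>>>>>>> >   static void scanWithDeleteVector(String dataFile, long rowCount) {
>>>>>>>>>>>>>> >     Roaring64Bitmap deleted = loadDeleteVector(dataFile);  // at most one DV per file
>>>>>>>>>>>>>> >     for (long pos = 0; pos < rowCount; pos++) {
>>>>>>>>>>>>>> >       if (!deleted.contains(pos)) {
>>>>>>>>>>>>>> >         emit(readRow(dataFile, pos));  // deleted rows are simply skipped
>>>>>>>>>>>>>> >       }
>>>>>>>>>>>>>> >     }
>>>>>>>>>>>>>> >   }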
>>>>>>>>>>>>>> >
>>>>>>>>>>>>>> > 2) Equality Deletes
>>>>>>>>>>>>>> >
>>>>>>>>>>>>>> > Writers write out references to the values that are deleted (in a partition or globally). There can be an unlimited number of equality deletes, and they all must be checked for every data file that is read. The cost of determining deleted rows is essentially handed to the reader.
>>>>>>>>>>>>>> >
>>>>>>>>>>>>>> > Conflicts almost never happen since data files are not actually changed, and there is almost no cost to the writer to generate these. Almost all costs related to equality deletes are passed on to the reader.
>>>>>>>>>>>>>> >
>>>>>>>>>>>>>> > Proposal:
>>>>>>>>>>>>>> >
>>>>>>>>>>>>>> > Equality deletes are, in my opinion, unsustainable, and we should work on deprecating and removing them from the specification. At this time, I know of only one engine (Apache Flink) which produces these deletes, but almost all engines have implementations to read them. The cost of implementing equality deletes on the read path is difficult and unpredictable in terms of memory usage and compute complexity. We've had suggestions of using RocksDB in order to handle ever-growing sets of equality deletes, which in my opinion shows that we are going down the wrong path.
>>>>>>>>>>>>>> >
>>>>>>>>>>>>>> > Outside of performance, Equality Deletes are also difficult to use in conjunction with many other features. For example, any features requiring CDC or row lineage are basically impossible when equality deletes are in use. When equality deletes are present, the state of the table can only be determined with a full scan, making it difficult to update differential structures. This means materialized views or indexes need to essentially be fully rebuilt whenever an equality delete is added to the table.
>>>>>>>>>>>>>> >
>>>>>>>>>>>>>> > Equality deletes essentially remove complexity from the write side but then add what I believe is an unacceptable level of complexity to the read side.
>>>>>>>>>>>>>> >
>>>>>>>>>>>>>> > Because of this, I suggest we deprecate Equality Deletes in V3 and slate them for full removal from the Iceberg spec in V4.
>>>>>>>>>>>>>> >
>>>>>>>>>>>>>> > I know this is a big change and a compatibility break, so I would like to introduce this idea to the community and solicit feedback from all stakeholders. I am very flexible on this issue and would like to hear the best arguments both for and against removal of Equality Deletes.
>>>>>>>>>>>>>> >
>>>>>>>>>>>>>> > Thanks everyone for your time,
>>>>>>>>>>>>>> >
>>>>>>>>>>>>>> > Russ Spitzer
>>>>>>>>>>>>>
>>>>>>>>>>>>> --
>>>>>>>>>>>>> Jason Fine
>>>>>>>>>>>>> Chief Software Architect
>>>>>>>>>>>>> ja...@upsolver.com | www.upsolver.com