I second Anton's proposal to standardize on a view-based approach to handle CDC cases. Actually, it's already been explored in detail[1] by Jack before.
[1] Improving Change Data Capture Use Case for Apache Iceberg <https://docs.google.com/document/d/1kyyJp4masbd1FrIKUHF1ED_z1hTARL8bNoKCgb7fhSQ/edit?tab=t.0#heading=h.94xnx4qg3bnt>

On Tue, Nov 19, 2024 at 4:16 PM Jean-Baptiste Onofré <j...@nanthrax.net> wrote:
> My proposal is the following (already expressed):
> - ok to deprecate equality deletes
> - not ok to remove them
> - work on position delete improvements to address streaming use cases. I
> think we should explore different approaches. Personally, I think a possible
> approach would be to find a way to index data files, so that a full scan is
> not needed to locate row positions.
>
> My $0.01 :)
>
> Regards
> JB
>
> On Tue, Nov 19, 2024 at 07:53, Ajantha Bhat <ajanthab...@gmail.com> wrote:
>
>> Hi, what's the conclusion on this thread?
>>
>> Users are looking for upsert (CDC) support for the OSS Iceberg Kafka Connect
>> sink.
>> We only support appends at the moment. Can we go ahead and implement the
>> upserts using equality deletes?
>>
>> - Ajantha
>>
>> On Sun, Nov 10, 2024 at 11:56 AM Vignesh <vignesh.v...@gmail.com> wrote:
>>
>>> Hi,
>>> I am reading about Iceberg and am quite new to this.
>>> This Puffin file would be an index from key to data file. Other use cases of
>>> Puffin, such as statistics, are at a per-file level if I understand
>>> correctly.
>>>
>>> Where would the Puffin blob mapping key -> data file be stored? It is a
>>> property of the entire table.
>>>
>>> Thanks,
>>> Vignesh.
>>>
>>> On Sat, Nov 9, 2024 at 2:17 AM Shani Elharrar <sh...@upsolver.com.invalid>
>>> wrote:
>>>
>>>> JB, this is what we do: we write equality deletes and periodically
>>>> convert them to positional deletes.
>>>>
>>>> We could probably index the keys, or maybe partially index them using bloom
>>>> filters; the best approach would be to put those bloom filters inside Puffin.
>>>>
>>>> Shani.
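As an editor's illustration of Shani's bloom-filter idea above, a per-data-file key filter could narrow which files need a positional scan when converting an equality delete into position deletes. This is a hedged sketch: the class, file names, and filter layout below are hypothetical, not the actual Puffin blob format.

```python
import hashlib

class BloomFilter:
    """Tiny illustrative Bloom filter; a real Puffin blob would use a
    standard, spec-defined layout rather than this ad-hoc one."""
    def __init__(self, size_bits=1024, num_hashes=3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = 0

    def _positions(self, key):
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key):
        # No false negatives: a file whose filter says "no" can be skipped.
        return all(self.bits & (1 << pos) for pos in self._positions(key))

# One filter per data file, built at write time alongside the file.
data_files = {
    "file-a.parquet": ["k1", "k2", "k3"],
    "file-b.parquet": ["k4", "k5", "k6"],
}
file_filters = {}
for path, keys in data_files.items():
    bf = BloomFilter()
    for k in keys:
        bf.add(k)
    file_filters[path] = bf

def candidate_files(delete_key):
    """Files that may contain the key; only these need a positional scan
    when rewriting an equality delete to positional deletes."""
    return [p for p, bf in file_filters.items() if bf.might_contain(delete_key)]
```

Because bloom filters admit false positives but never false negatives, the rewrite stays correct while skipping most files.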
>>>> On 9 Nov 2024, at 11:11, Jean-Baptiste Onofré <j...@nanthrax.net> wrote:
>>>>
>>>> Hi,
>>>>
>>>> I agree with Peter here, and I would say that it would be an issue for
>>>> multi-engine support.
>>>>
>>>> I think, as I already mentioned with others, we should explore an
>>>> alternative.
>>>> As the main issue is the data file scan in a streaming context, maybe we
>>>> could find a way to "index"/correlate positional deletes with limited
>>>> scanning.
>>>> I will think again about that :)
>>>>
>>>> Regards
>>>> JB
>>>>
>>>> On Sat, Nov 9, 2024 at 6:48 AM Péter Váry <peter.vary.apa...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi Imran,
>>>>>
>>>>> I don't think it's a good idea to start creating multiple types of
>>>>> Iceberg tables. Iceberg's main selling point is compatibility between
>>>>> engines. If we don't have readers and writers for all types of tables, then
>>>>> we remove compatibility from the equation and engine-specific formats
>>>>> always win. OTOH, if we write readers and writers for all types of tables,
>>>>> then we are back to square one.
>>>>>
>>>>> Identifier fields are a table schema concept and are used in many cases
>>>>> during query planning and execution. This is why they are defined as part
>>>>> of the SQL spec, and this is why Iceberg defines them as well. One use case
>>>>> is that they can be used to merge deletes (independently of how they are
>>>>> manifested) and subsequent inserts into updates.
>>>>>
>>>>> Flink SQL doesn't allow creating tables with partition transforms, so
>>>>> no new table could be created by Flink SQL using transforms, but tables
>>>>> created by other engines could still be used (both read and write). Also, you
>>>>> can create such tables in Flink using the Java API.
>>>>>
>>>>> Requiring partition columns to be part of the identifier fields comes
>>>>> from the practical consideration that you want to limit the scope of the
>>>>> equality deletes as much as possible.
Otherwise, all of the equality deletes
>>>>> would be table-global, and they would have to be read by every reader. We could
>>>>> write those; we just decided that we don't want to allow the user to do
>>>>> this, as it is in most cases a bad idea.
>>>>>
>>>>> I hope this helps,
>>>>> Peter
>>>>>
>>>>> On Fri, Nov 8, 2024, 22:01 Imran Rashid <iras...@cloudera.com.invalid>
>>>>> wrote:
>>>>>
>>>>>> I'm not down in the weeds at all myself on implementation details, so
>>>>>> forgive me if I'm wrong about the details here.
>>>>>>
>>>>>> I can see all the viewpoints -- both that equality deletes enable
>>>>>> some use cases, but also make others far more difficult. What surprised me
>>>>>> the most is that Iceberg does not provide a way to distinguish these two
>>>>>> table "types".
>>>>>>
>>>>>> At first, I thought the presence of an identifier field
>>>>>> (https://iceberg.apache.org/spec/#identifier-field-ids) indicated
>>>>>> that the table was a target for equality deletes. But then it turns out
>>>>>> identifier fields are also useful for changelog views even without equality
>>>>>> deletes -- IIUC, they show that a delete + insert should actually be
>>>>>> interpreted as an update in a changelog view.
>>>>>>
>>>>>> To be perfectly honest, I'm confused about all of these details --
>>>>>> from my read, the spec does not indicate this relationship between
>>>>>> identifier fields and equality_ids in equality delete files
>>>>>> (https://iceberg.apache.org/spec/#equality-delete-files), but I think
>>>>>> that is the way Flink works. Flink itself seems to have even more
>>>>>> limitations -- no partition transforms are allowed, and all partition
>>>>>> columns must be a subset of the identifier fields. Is that just a Flink
>>>>>> limitation, or is that the intended behavior in the spec? (Or maybe
>>>>>> user error on my part?) Those seem like very reasonable limitations, from
>>>>>> an implementation point of view.
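The Flink constraint discussed above (partition columns must be a subset of the identifier fields) can be sketched as a simple validation. This is an editor's illustration; the function and field names are hypothetical, not the Iceberg or Flink API.

```python
def validate_partitioning(identifier_fields, partition_source_fields):
    """Flink-style rule (illustrative): every partition source column must
    also be an identifier field. Then an equality delete on the key can be
    scoped to one partition; otherwise it would have to apply table-wide."""
    missing = set(partition_source_fields) - set(identifier_fields)
    if missing:
        raise ValueError(
            f"partition columns {sorted(missing)} are not identifier fields; "
            "equality deletes on the key could not be partition-scoped")
    return True

# A table keyed on (id, region) and partitioned by region passes the check;
# partitioning the same table by an unrelated column would be rejected.
ok = validate_partitioning({"id", "region"}, {"region"})
```

The point of the rule is scope narrowing: with it, a reader of one partition only needs the equality deletes written for that partition.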
But OTOH, as a user, this seems to be
>>>>>> directly contrary to some of the promises of Iceberg.
>>>>>>
>>>>>> It's easy to see if a table already has equality deletes in it, by
>>>>>> looking at the metadata. But is there any way to indicate that a table (or
>>>>>> branch of a table) _must not_ have equality deletes added to it?
>>>>>>
>>>>>> If that were possible, it seems like we could support both use
>>>>>> cases. We could continue to optimize for the streaming ingestion use cases
>>>>>> using equality deletes. But we could also build more optimizations into
>>>>>> the "non-streaming-ingestion" branches. And we could document the tradeoff
>>>>>> so it is much clearer to end users.
>>>>>>
>>>>>> To maintain compatibility, I suppose the change would be that
>>>>>> equality deletes continue to be allowed by default, but we'd add a new
>>>>>> field to indicate that for some tables (or branches of a table), equality
>>>>>> deletes would not be allowed. And it would be an error for an engine to
>>>>>> make an update which added an equality delete to such a table.
>>>>>>
>>>>>> Maybe that change would even be possible in V3.
>>>>>>
>>>>>> And if all the performance improvements to equality deletes make this
>>>>>> a moot point, we could drop the field in V4. But it seems like a mistake
>>>>>> to both limit the non-streaming use case AND have confusing limitations for
>>>>>> the end user in the meantime.
>>>>>>
>>>>>> I would happily be corrected about my understanding of all of the
>>>>>> above.
>>>>>>
>>>>>> thanks!
>>>>>> Imran
>>>>>>
>>>>>> On Tue, Nov 5, 2024 at 9:16 AM Bryan Keller <brya...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> I also feel we should keep equality deletes until we have an
>>>>>>> alternative solution for streaming updates/deletes.
>>>>>>>
>>>>>>> -Bryan
>>>>>>>
>>>>>>> On Nov 4, 2024, at 8:33 AM, Péter Váry <peter.vary.apa...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>> Well, it seems like I'm a little late, so most of the arguments are
>>>>>>> already voiced.
>>>>>>>
>>>>>>> I agree that we should not deprecate equality deletes until we
>>>>>>> have a replacement feature.
>>>>>>> I think one of the big advantages of Iceberg is that it supports
>>>>>>> both batch processing and streaming ingestion.
>>>>>>> For streaming ingestion we need a way to update existing data in a
>>>>>>> performant way, but restricting deletes to the primary keys seems like
>>>>>>> enough from the streaming perspective.
>>>>>>>
>>>>>>> Equality deletes allow a very wide range of applications, which we
>>>>>>> might be able to narrow down a bit but still keep useful. So if we want to
>>>>>>> go down this road, we need to start collecting the requirements.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Peter
>>>>>>>
>>>>>>> On Fri, Nov 1, 2024 at 19:22, Shani Elharrar <sh...@upsolver.com.invalid> wrote:
>>>>>>>
>>>>>>>> I understand how it makes sense for batch jobs, but it hurts
>>>>>>>> streaming jobs: using equality deletes works much better for streaming (which
>>>>>>>> has strict SLAs on delay), and to decrease the performance penalty,
>>>>>>>> systems can rewrite the equality deletes to positional deletes.
>>>>>>>>
>>>>>>>> Shani.
>>>>>>>>
>>>>>>>> On 1 Nov 2024, at 20:06, Steven Wu <stevenz...@gmail.com> wrote:
>>>>>>>>
>>>>>>>> Fundamentally, it is very difficult to write position deletes with
>>>>>>>> concurrent writers and conflicts for batch jobs too, as the inverted index
>>>>>>>> may become invalid/stale.
>>>>>>>>
>>>>>>>> The position deletes are created during the write phase, but
>>>>>>>> conflicts are only detected at the commit stage. I assume the batch job
>>>>>>>> should fail in this case.
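Steven's point about write-phase positions going stale can be sketched as an optimistic commit check. This is a deliberately simplified model, not Iceberg's actual commit validation: position deletes pin (file, row ordinal) pairs at write time, so if a concurrent compaction rewrote a referenced file before the commit, the positions are meaningless and the commit must fail.

```python
def try_commit(current_snapshot_files, position_deletes):
    """Optimistic commit sketch (illustrative): fail if any data file
    referenced by a pending position delete was removed or rewritten by a
    concurrent operation between the write phase and the commit."""
    referenced = {path for path, _pos in position_deletes}
    stale = referenced - current_snapshot_files
    if stale:
        raise RuntimeError(f"commit conflict: files rewritten: {sorted(stale)}")
    return True

# Happy path: both referenced files still exist in the current snapshot.
committed = try_commit(
    {"a.parquet", "b.parquet"},
    [("a.parquet", 3), ("b.parquet", 0)],
)
```

Equality deletes sidestep this entirely because they reference values, not file positions, which is why they rarely conflict.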
>>>>>>>>
>>>>>>>> On Fri, Nov 1, 2024 at 10:57 AM Steven Wu <stevenz...@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Shani,
>>>>>>>>>
>>>>>>>>> That is a good point. It is certainly a limitation for the Flink
>>>>>>>>> job to track the inverted index internally (which is what I had in mind).
>>>>>>>>> It can't be shared/synchronized with other Flink jobs or other engines
>>>>>>>>> writing to the same table.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Steven
>>>>>>>>>
>>>>>>>>> On Fri, Nov 1, 2024 at 10:50 AM Shani Elharrar
>>>>>>>>> <sh...@upsolver.com.invalid> wrote:
>>>>>>>>>
>>>>>>>>>> Even if Flink can create this state, it would have to be
>>>>>>>>>> maintained against the Iceberg table; we wouldn't want duplicate keys if
>>>>>>>>>> other systems / users update the table (e.g. manual inserts / updates using
>>>>>>>>>> DML).
>>>>>>>>>>
>>>>>>>>>> Shani.
>>>>>>>>>>
>>>>>>>>>> On 1 Nov 2024, at 18:32, Steven Wu <stevenz...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>> > Add support for inverted indexes to reduce the cost of position
>>>>>>>>>> lookup. This is fairly tricky to implement for streaming use cases without
>>>>>>>>>> an external system.
>>>>>>>>>>
>>>>>>>>>> Anton, that is also what I was saying earlier. In Flink, the
>>>>>>>>>> inverted index of (key, committed data files) can be tracked in Flink state.
>>>>>>>>>>
>>>>>>>>>> On Fri, Nov 1, 2024 at 2:16 AM Anton Okolnychyi <
>>>>>>>>>> aokolnyc...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> I was a bit skeptical when we were adding equality deletes, but
>>>>>>>>>>> nothing beats their performance during writes. We have to find an
>>>>>>>>>>> alternative before deprecating them.
>>>>>>>>>>>
>>>>>>>>>>> We are doing a lot of work to improve streaming, like reducing
>>>>>>>>>>> the cost of commits, enabling a large (potentially infinite) number of
>>>>>>>>>>> snapshots, changelog reads, and so on.
It is a project goal to excel in
>>>>>>>>>>> streaming.
>>>>>>>>>>>
>>>>>>>>>>> I was going to focus on equality deletes after completing the DV
>>>>>>>>>>> work. I believe we have these options:
>>>>>>>>>>>
>>>>>>>>>>> - Revisit the existing design of equality deletes (e.g. add more
>>>>>>>>>>> restrictions, improve compaction, offer new writers).
>>>>>>>>>>> - Standardize on the view-based approach [1] to handle streaming
>>>>>>>>>>> upserts and CDC use cases, potentially making this part of the spec.
>>>>>>>>>>> - Add support for inverted indexes to reduce the cost of
>>>>>>>>>>> position lookup. This is fairly tricky to implement for streaming use cases
>>>>>>>>>>> without an external system. Our runtime filtering in Spark today is
>>>>>>>>>>> equivalent to looking up positions in an inverted index represented by
>>>>>>>>>>> another Iceberg table. That may still not be enough for some streaming use
>>>>>>>>>>> cases.
>>>>>>>>>>>
>>>>>>>>>>> [1] - https://www.tabular.io/blog/hello-world-of-cdc/
>>>>>>>>>>>
>>>>>>>>>>> - Anton
>>>>>>>>>>>
>>>>>>>>>>> On Thu, Oct 31, 2024 at 21:31, Micah Kornfield <
>>>>>>>>>>> emkornfi...@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> I agree that equality deletes have their place in streaming. I
>>>>>>>>>>>> think the ultimate decision here is how opinionated Iceberg wants to be on
>>>>>>>>>>>> its use cases. If it really wants to stick to its origins of "slow-moving
>>>>>>>>>>>> data", then removing equality deletes would be in line with this. I think
>>>>>>>>>>>> the other high-level question is how much we allow for partially compatible
>>>>>>>>>>>> features (the row lineage use-case feature was explicitly approved
>>>>>>>>>>>> excluding equality deletes, and people seemed OK with it at the time.
If all features need to work together, then maybe we need to rethink the
>>>>>>>>>>>> design here so it can be forward compatible with equality deletes).
>>>>>>>>>>>>
>>>>>>>>>>>> I think one issue with equality deletes as stated in the spec
>>>>>>>>>>>> is that they are overly broad. I'd be interested if people have any use
>>>>>>>>>>>> cases that differ, but I think one way of narrowing the specification's
>>>>>>>>>>>> scope on equality deletes (and probably a necessary building block for
>>>>>>>>>>>> building something better) is to focus on upsert/streaming deletes. Two
>>>>>>>>>>>> proposals in this regard are:
>>>>>>>>>>>>
>>>>>>>>>>>> 1. Require that equality deletes can only correspond to unique
>>>>>>>>>>>> identifiers for the table.
>>>>>>>>>>>> 2. Consider requiring that, for equality deletes on partitioned
>>>>>>>>>>>> tables, the primary key must contain a partition column (I believe
>>>>>>>>>>>> Flink at least already does this). It is less clear to me that this would
>>>>>>>>>>>> meet all existing use cases. But having this would allow for better
>>>>>>>>>>>> incremental data structures, which could then be partition based.
>>>>>>>>>>>>
>>>>>>>>>>>> Narrowing the scope to unique identifiers would allow for further
>>>>>>>>>>>> building blocks already mentioned, like a secondary index (possible via LSM
>>>>>>>>>>>> tree), that would allow for better performance overall.
>>>>>>>>>>>>
>>>>>>>>>>>> I generally agree with the sentiment that we shouldn't
>>>>>>>>>>>> deprecate them until there is a viable replacement.
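Micah's first proposal above could be enforced with a simple writer-side check. A minimal sketch, with illustrative names and field IDs (not an actual Iceberg writer hook): reject any equality delete whose `equality_ids` are not exactly the table's identifier field IDs.

```python
def check_equality_delete_ids(equality_ids, identifier_field_ids):
    """Proposal 1 sketch: an equality delete may only target the table's
    unique identifier fields, turning it into a well-defined
    'delete by key' rather than an arbitrary predicate delete."""
    if set(equality_ids) != set(identifier_field_ids):
        raise ValueError(
            f"equality delete columns {sorted(equality_ids)} must match "
            f"identifier fields {sorted(identifier_field_ids)}")
    return True

# Deleting by the table key (field id 1) is allowed under this proposal;
# deleting by an arbitrary non-key column (field id 2) is rejected.
allowed = check_equality_delete_ids([1], [1])
```

Restricting deletes to unique keys is what makes the secondary-index and LSM-style building blocks mentioned above feasible: each key maps to at most one live row.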
With all due respect
>>>>>>>>>>>> to my employer, let's not fall into the Google trap [1] :)
>>>>>>>>>>>>
>>>>>>>>>>>> Cheers,
>>>>>>>>>>>> Micah
>>>>>>>>>>>>
>>>>>>>>>>>> [1] https://goomics.net/50/
>>>>>>>>>>>>
>>>>>>>>>>>> On Thu, Oct 31, 2024 at 12:35 PM Alexander Jo <
>>>>>>>>>>>> alex...@starburstdata.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hey all,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Just to throw my 2 cents in, I agree with Steven and others
>>>>>>>>>>>>> that we do need some kind of replacement before deprecating equality
>>>>>>>>>>>>> deletes.
>>>>>>>>>>>>> They certainly have their problems, and do significantly
>>>>>>>>>>>>> increase complexity as they are now, but the writing of position deletes is
>>>>>>>>>>>>> too expensive for certain pipelines.
>>>>>>>>>>>>>
>>>>>>>>>>>>> We've been investigating using equality deletes for some of
>>>>>>>>>>>>> our workloads at Starburst; the key advantage we were hoping to leverage is
>>>>>>>>>>>>> cheap, effectively random-access lookup deletes.
>>>>>>>>>>>>> Say you have a UUID column that's unique in a table and want
>>>>>>>>>>>>> to delete a row by UUID. With position deletes, each delete is expensive
>>>>>>>>>>>>> without an index on that UUID.
>>>>>>>>>>>>> With equality deletes, each delete is cheap while
>>>>>>>>>>>>> reads/compaction are expensive, but when updates are frequent and reads are
>>>>>>>>>>>>> sporadic that's a reasonable tradeoff.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Pretty much what Jason and Steven have already said.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Maybe there are some incremental improvements on equality
>>>>>>>>>>>>> deletes, or tips from similar systems, that might alleviate some of their
>>>>>>>>>>>>> problems?
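Alex's tradeoff can be made concrete with a toy cost model (purely illustrative, not engine code): an equality delete only records the key, while a position delete must first locate the row, which without an index means scanning data files.

```python
def write_equality_delete(delete_log, uuid):
    """O(1) for the writer: just record the key. The cost of matching it
    against rows is deferred to readers and compaction."""
    delete_log.append(("eq", uuid))

def write_position_delete(delete_log, data_files, uuid):
    """Without a key index, the writer scans files to find the row,
    paying a cost proportional to table size up front."""
    for path, rows in data_files.items():
        for pos, row_uuid in enumerate(rows):
            if row_uuid == uuid:
                delete_log.append(("pos", path, pos))
                return (path, pos)
    raise KeyError(uuid)

files = {"f1.parquet": ["u1", "u2"], "f2.parquet": ["u3"]}
log = []
write_equality_delete(log, "u3")                    # no table access at all
located = write_position_delete(log, files, "u3")   # had to scan f1, then f2
```

With frequent updates and sporadic reads, paying the scan cost per delete dominates; deferring it to compaction (as equality deletes do) is the cheaper schedule.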
>>>>>>>>>>>>>
>>>>>>>>>>>>> - Alex Jo
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Thu, Oct 31, 2024 at 10:58 AM Steven Wu <
>>>>>>>>>>>>> stevenz...@gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> We probably all agree on the downside of equality deletes:
>>>>>>>>>>>>>> they postpone all the work to the read path.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> In theory, we could implement position deletes only in the
>>>>>>>>>>>>>> Flink streaming writer. It would require tracking the last committed
>>>>>>>>>>>>>> data files per key, which can be stored in Flink state (checkpointed). This
>>>>>>>>>>>>>> is obviously quite expensive/challenging, but possible.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I'd like to echo one benefit of equality deletes that Russell
>>>>>>>>>>>>>> called out in the original email: equality deletes would never
>>>>>>>>>>>>>> have conflicts. That is important for streaming writers (Flink, Kafka
>>>>>>>>>>>>>> Connect, ...) that commit frequently (minutes or less). Assume Flink can
>>>>>>>>>>>>>> write position deletes only and commits every 2 minutes. The long-running
>>>>>>>>>>>>>> nature of streaming jobs can cause frequent commit conflicts with
>>>>>>>>>>>>>> background delete compaction jobs.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Overall, streaming upsert writes are not a well-solved
>>>>>>>>>>>>>> problem in Iceberg. This probably affects all streaming engines (Flink,
>>>>>>>>>>>>>> Kafka Connect, Spark streaming, ...). We need to come up with some better
>>>>>>>>>>>>>> alternatives before we can deprecate equality deletes.
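Steven's idea of tracking the last committed location per key can be sketched as follows. This is a toy model of keyed writer state; the class and method names are illustrative, not Flink or Iceberg APIs. In Flink, the `state` dict would live in checkpointed keyed state, with the synchronization caveats Shani raised.

```python
class StatefulUpsertWriter:
    """Sketch: keep key -> (data_file, row_position) in writer state so an
    update can be emitted as a position delete of the prior row version,
    avoiding equality deletes entirely."""
    def __init__(self):
        self.state = {}            # key -> (data_file, row_position)
        self.position_deletes = [] # (data_file, row_position) pairs to commit
        self.inserts = []          # (data_file, row_position, key, row)

    def upsert(self, key, row, target_file, position):
        prior = self.state.get(key)
        if prior is not None:
            # Delete the previously committed version of this key by position.
            self.position_deletes.append(prior)
        self.inserts.append((target_file, position, key, row))
        self.state[key] = (target_file, position)

writer = StatefulUpsertWriter()
writer.upsert("k1", {"v": 1}, "data-0.parquet", 0)
writer.upsert("k1", {"v": 2}, "data-1.parquet", 0)  # update of k1
```

The state grows with the key space and, as noted above, goes stale if any other writer touches the table, which is exactly why this remains expensive/challenging.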
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Thu, Oct 31, 2024 at 8:38 AM Russell Spitzer <
>>>>>>>>>>>>>> russell.spit...@gmail.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> For users of equality deletes: what are the key benefits of
>>>>>>>>>>>>>>> equality deletes that you would like to preserve? Could you please share
>>>>>>>>>>>>>>> some concrete examples of the queries you want to run (and the schemas and
>>>>>>>>>>>>>>> data sizes you would like to run them against) and the latencies that would
>>>>>>>>>>>>>>> be acceptable?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Thu, Oct 31, 2024 at 10:05 AM Jason Fine
>>>>>>>>>>>>>>> <ja...@upsolver.com.invalid> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Representing Upsolver here: we also make use of equality
>>>>>>>>>>>>>>>> deletes to deliver high-frequency, low-latency updates to our clients at
>>>>>>>>>>>>>>>> scale. We have customers using them at scale, demonstrating the need and
>>>>>>>>>>>>>>>> viability. We automate the process of converting them into positional
>>>>>>>>>>>>>>>> deletes (or fully applying them) for more efficient engine queries in the
>>>>>>>>>>>>>>>> background, giving our users both low latency and good query performance.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Equality deletes were added since there isn't a good way to
>>>>>>>>>>>>>>>> solve frequent updates otherwise. It would require some sort of index
>>>>>>>>>>>>>>>> keeping track of every record in the table (by a predetermined PK), and
>>>>>>>>>>>>>>>> maintaining such an index is a huge task that every tool interested in this
>>>>>>>>>>>>>>>> would need to re-implement. It also becomes a bottleneck limiting table
>>>>>>>>>>>>>>>> sizes.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I don't think they should be removed without providing an
>>>>>>>>>>>>>>>> alternative. Positional deletes inherently have a different performance
>>>>>>>>>>>>>>>> profile, requiring more upfront work proportional to the table size.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Thu, Oct 31, 2024 at 2:45 PM Jean-Baptiste Onofré <
>>>>>>>>>>>>>>>> j...@nanthrax.net> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Hi Russell,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Thanks for the nice writeup and the proposal.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I agree with your analysis, and I have the same feeling. However, I
>>>>>>>>>>>>>>>>> think there are more engines than Flink that write equality delete files. So,
>>>>>>>>>>>>>>>>> I agree to deprecate in V3, but maybe be more "flexible" about removal
>>>>>>>>>>>>>>>>> in V4 in order to give engines time to update.
>>>>>>>>>>>>>>>>> I think that by deprecating equality deletes, we are clearly focusing
>>>>>>>>>>>>>>>>> on read performance and "consistency" (more than write). That's not
>>>>>>>>>>>>>>>>> necessarily a bad thing, but the streaming and data ingestion
>>>>>>>>>>>>>>>>> platforms will probably be concerned about it (by using positional
>>>>>>>>>>>>>>>>> deletes, they will have to scan/read all data files to find the
>>>>>>>>>>>>>>>>> position, which is painful).
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> So, to summarize:
>>>>>>>>>>>>>>>>> 1. Agree to deprecate equality deletes, but -1 to commit to any target
>>>>>>>>>>>>>>>>> for deletion before having a clear path for streaming platforms
>>>>>>>>>>>>>>>>> (Flink, Beam, ...)
>>>>>>>>>>>>>>>>> 2.
In the meantime (during the deprecation period), I propose to
>>>>>>>>>>>>>>>>> explore possible improvements for streaming platforms (maybe finding a
>>>>>>>>>>>>>>>>> way to avoid full data file scans, ...)
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Thanks!
>>>>>>>>>>>>>>>>> Regards
>>>>>>>>>>>>>>>>> JB
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Wed, Oct 30, 2024 at 10:06 PM Russell Spitzer
>>>>>>>>>>>>>>>>> <russell.spit...@gmail.com> wrote:
>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>> > Background:
>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>> > 1) Position Deletes
>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>> > Writers determine which rows are deleted and mark them in
>>>>>>>>>>>>>>>>> a 1-for-1 representation. With delete vectors, this means every data file
>>>>>>>>>>>>>>>>> has at most one delete vector that it is read in conjunction with to excise
>>>>>>>>>>>>>>>>> deleted rows. Reader overhead is more or less constant and is very
>>>>>>>>>>>>>>>>> predictable.
>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>> > The main cost of this mode is that deletes must be
>>>>>>>>>>>>>>>>> determined at write time, which is expensive and can be more difficult for
>>>>>>>>>>>>>>>>> conflict resolution.
>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>> > 2) Equality Deletes
>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>> > Writers write out a reference to which values are deleted
>>>>>>>>>>>>>>>>> (in a partition or globally). There can be an unlimited number of equality
>>>>>>>>>>>>>>>>> deletes, and they all must be checked for every data file that is read. The
>>>>>>>>>>>>>>>>> cost of determining deleted rows is essentially given to the reader.
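Russell's two read paths above can be contrasted in a small sketch (illustrative only, not engine code): a delete vector is a per-file positional filter with predictable cost, while equality deletes require checking every accumulated delete key against every row that is read.

```python
def read_with_delete_vector(rows, deleted_positions):
    """Position deletes / delete vectors: one positional filter per data
    file; overhead is roughly constant and predictable."""
    return [row for pos, row in enumerate(rows) if pos not in deleted_positions]

def read_with_equality_deletes(rows, deleted_keys, key_of):
    """Equality deletes: every delete key must be checked against every
    row of every data file read; cost grows with the delete set."""
    return [row for row in rows if key_of(row) not in deleted_keys]

rows = [{"id": 1}, {"id": 2}, {"id": 3}]
by_position = read_with_delete_vector(rows, {1})                 # drop row 1
by_equality = read_with_equality_deletes(rows, {2}, lambda r: r["id"])
```

Both paths produce the same logical result here; the difference is where the matching work happens and how it scales as deletes accumulate.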
>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>> > Conflicts almost never happen since data files are not
>>>>>>>>>>>>>>>>> actually changed, and there is almost no cost to the writer to generate
>>>>>>>>>>>>>>>>> these. Almost all costs related to equality deletes are passed on to the
>>>>>>>>>>>>>>>>> reader.
>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>> > Proposal:
>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>> > Equality deletes are, in my opinion, unsustainable, and
>>>>>>>>>>>>>>>>> we should work on deprecating and removing them from the specification. At
>>>>>>>>>>>>>>>>> this time, I know of only one engine (Apache Flink) which produces these
>>>>>>>>>>>>>>>>> deletes, but almost all engines have implementations to read them. The cost
>>>>>>>>>>>>>>>>> of implementing equality deletes on the read path is difficult and
>>>>>>>>>>>>>>>>> unpredictable in terms of memory usage and compute complexity. We’ve had
>>>>>>>>>>>>>>>>> suggestions of implementing RocksDB in order to handle ever-growing sets of
>>>>>>>>>>>>>>>>> equality deletes, which in my opinion shows that we are going down the wrong
>>>>>>>>>>>>>>>>> path.
>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>> > Outside of performance, equality deletes are also
>>>>>>>>>>>>>>>>> difficult to use in conjunction with many other features. For example, any
>>>>>>>>>>>>>>>>> features requiring CDC or row lineage are basically impossible when
>>>>>>>>>>>>>>>>> equality deletes are in use. When equality deletes are present, the state
>>>>>>>>>>>>>>>>> of the table can only be determined with a full scan, making it difficult to
>>>>>>>>>>>>>>>>> update differential structures.
This means materialized views or indexes
>>>>>>>>>>>>>>>>> need to essentially be fully rebuilt whenever an equality delete is added
>>>>>>>>>>>>>>>>> to the table.
>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>> > Equality deletes essentially remove complexity from the
>>>>>>>>>>>>>>>>> write side but then add what I believe is an unacceptable level of
>>>>>>>>>>>>>>>>> complexity to the read side.
>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>> > Because of this, I suggest we deprecate equality deletes
>>>>>>>>>>>>>>>>> in V3 and slate them for full removal from the Iceberg spec in V4.
>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>> > I know this is a big change and a compatibility break,
>>>>>>>>>>>>>>>>> so I would like to introduce this idea to the community and solicit
>>>>>>>>>>>>>>>>> feedback from all stakeholders. I am very flexible on this issue and would
>>>>>>>>>>>>>>>>> like to hear the best arguments both for and against removal of equality
>>>>>>>>>>>>>>>>> deletes.
>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>> > Thanks everyone for your time,
>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>> > Russ Spitzer
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> *Jason Fine*
>>>>>>>>>>>>>>>> Chief Software Architect
>>>>>>>>>>>>>>>> ja...@upsolver.com | www.upsolver.com