> > - OK to deprecate equality deletes
> > - not OK to remove them

@JB: I don't think it is a good idea to use deprecated functionality in new feature development. Hence, my specific question was about the Kafka Connect upsert operation.

@Manu: I meant the delta writers for the Kafka Connect Iceberg sink (which are in turn used for upserting the CDC records): https://github.com/apache/iceberg/issues/10842

- Ajantha

On Tue, Nov 19, 2024 at 3:08 PM Manu Zhang <owenzhang1...@gmail.com> wrote:

> I second Anton's proposal to standardize on a view-based approach to handle CDC cases. Actually, it's already been explored in detail [1] by Jack before.
>
> [1] Improving Change Data Capture Use Case for Apache Iceberg
> <https://docs.google.com/document/d/1kyyJp4masbd1FrIKUHF1ED_z1hTARL8bNoKCgb7fhSQ/edit?tab=t.0#heading=h.94xnx4qg3bnt>
>
> On Tue, Nov 19, 2024 at 4:16 PM Jean-Baptiste Onofré <j...@nanthrax.net> wrote:
>
>> My proposal is the following (already expressed):
>> - OK to deprecate equality deletes
>> - not OK to remove them
>> - work on position delete improvements to address streaming use cases. I think we should explore different approaches. Personally, I think a possible approach would be to find a way to index data files, to avoid a full scan to find the row position.
>>
>> My $0.01 :)
>>
>> Regards
>> JB
>>
>> On Tue, Nov 19, 2024 at 07:53, Ajantha Bhat <ajanthab...@gmail.com> wrote:
>>
>>> Hi, what's the conclusion on this thread?
>>>
>>> Users are looking for upsert (CDC) support for the OSS Iceberg Kafka Connect sink. We only support appends at the moment. Can we go ahead and implement the upserts using equality deletes?
>>>
>>> - Ajantha
>>>
>>> On Sun, Nov 10, 2024 at 11:56 AM Vignesh <vignesh.v...@gmail.com> wrote:
>>>
>>>> Hi,
>>>> I am reading about Iceberg and am quite new to this. This Puffin file would be an index from key to data file. Other use cases of Puffin, such as statistics, are at a per-file level if I understand correctly.
>>>> Where would the Puffin file for the key -> data file index be stored? It is a property of the entire table.
>>>>
>>>> Thanks,
>>>> Vignesh.
>>>>
>>>> On Sat, Nov 9, 2024 at 2:17 AM Shani Elharrar <sh...@upsolver.com.invalid> wrote:
>>>>
>>>>> JB, this is what we do: we write equality deletes and periodically convert them to positional deletes.
>>>>>
>>>>> We could probably index the keys, maybe partially index using bloom filters; the best would be to put those bloom filters inside Puffin.
>>>>>
>>>>> Shani.
>>>>>
>>>>> On 9 Nov 2024, at 11:11, Jean-Baptiste Onofré <j...@nanthrax.net> wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> I agree with Peter here, and I would say that it would be an issue for multi-engine support.
>>>>>
>>>>> I think, as I already mentioned with others, we should explore an alternative. As the main issue is the data file scan in a streaming context, maybe we could find a way to "index"/correlate data files for positional deletes with limited scanning. I will think again about that :)
>>>>>
>>>>> Regards
>>>>> JB
>>>>>
>>>>> On Sat, Nov 9, 2024 at 6:48 AM Péter Váry <peter.vary.apa...@gmail.com> wrote:
>>>>>
>>>>>> Hi Imran,
>>>>>>
>>>>>> I don't think it's a good idea to start creating multiple types of Iceberg tables. Iceberg's main selling point is compatibility between engines. If we don't have readers and writers for all types of tables, then we remove compatibility from the equation and engine-specific formats always win. OTOH, if we write readers and writers for all types of tables, then we are back to square one.
>>>>>>
>>>>>> Identifier fields are a table schema concept and are used in many cases during query planning and execution. This is why they are defined as part of the SQL spec, and this is why Iceberg defines them as well. One use case is where they can be used to merge deletes (independently of how they are manifested) and subsequent inserts into updates.
>>>>>>
>>>>>> Flink SQL doesn't allow creating tables with partition transforms, so no new table could be created by Flink SQL using transforms, but tables created by other engines could still be used (both read and write). Also, you can create such tables in Flink using the Java API.
>>>>>>
>>>>>> Requiring partition columns to be part of the identifier fields comes from the practical consideration that you want to limit the scope of the equality deletes as much as possible. Otherwise, all of the equality deletes would have to be table-global, and they would have to be read by every reader. We could write those; we just decided that we don't want to allow the user to do this, as it is in most cases a bad idea.
>>>>>>
>>>>>> I hope this helps,
>>>>>> Peter
>>>>>>
>>>>>> On Fri, Nov 8, 2024, 22:01 Imran Rashid <iras...@cloudera.com.invalid> wrote:
>>>>>>
>>>>>>> I'm not down in the weeds at all myself on implementation details, so forgive me if I'm wrong about the details here.
>>>>>>>
>>>>>>> I can see all the viewpoints -- both that equality deletes enable some use cases, but also make others far more difficult. What surprised me the most is that Iceberg does not provide a way to distinguish these two table "types".
>>>>>>>
>>>>>>> At first, I thought the presence of an identifier field (https://iceberg.apache.org/spec/#identifier-field-ids) indicated that the table was a target for equality deletes. But then it turns out identifier fields are also useful for changelog views even without equality deletes -- IIUC, they show that a delete + insert should actually be interpreted as an update in a changelog view.
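[Editor's illustration: a minimal, hypothetical sketch of the changelog interpretation described above. This is not Iceberg or Flink code; all names are invented. It shows how a reader with an identifier field can collapse a DELETE followed by an INSERT for the same key into a single UPDATE.]

```python
# Toy changelog collapse (illustration only, not Iceberg code):
# a DELETE immediately followed by an INSERT with the same
# identifier-field value is reinterpreted as an UPDATE.

def collapse_changelog(events, key_field):
    """events: list of dicts like {"op": "D" or "I", "row": {...}}."""
    collapsed = []
    i = 0
    while i < len(events):
        ev = events[i]
        nxt = events[i + 1] if i + 1 < len(events) else None
        if (ev["op"] == "D" and nxt is not None and nxt["op"] == "I"
                and ev["row"][key_field] == nxt["row"][key_field]):
            # delete + insert on the same key -> one update event
            collapsed.append({"op": "U", "before": ev["row"], "after": nxt["row"]})
            i += 2
        else:
            collapsed.append(ev)
            i += 1
    return collapsed

events = [
    {"op": "D", "row": {"id": 1, "v": "old"}},
    {"op": "I", "row": {"id": 1, "v": "new"}},   # same id -> UPDATE
    {"op": "I", "row": {"id": 2, "v": "x"}},     # plain INSERT
]
print(collapse_changelog(events, "id"))
```

Without the identifier field ("id" here), a reader has no way to know the first two events describe one logical update rather than two unrelated changes.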
>>>>>>> To be perfectly honest, I'm confused about all of these details -- from my read, the spec does not indicate this relationship between identifier fields and equality_ids in equality delete files (https://iceberg.apache.org/spec/#equality-delete-files), but I think that is the way Flink works. Flink itself seems to have even more limitations -- no partition transforms are allowed, and all partition columns must be a subset of the identifier fields. Is that just a Flink limitation, or is that the intended behavior in the spec? (Or maybe user error on my part?) Those seem like very reasonable limitations from an implementation point of view. But OTOH, as a user, this seems to be directly contrary to some of the promises of Iceberg.
>>>>>>>
>>>>>>> It's easy to see if a table already has equality deletes in it by looking at the metadata. But is there any way to indicate that a table (or branch of a table) _must not_ have equality deletes added to it?
>>>>>>>
>>>>>>> If that were possible, it seems like we could support both use cases. We could continue to optimize for the streaming ingestion use cases using equality deletes. But we could also build more optimizations into the "non-streaming-ingestion" branches. And we could document the tradeoff so it is much clearer to end users.
>>>>>>>
>>>>>>> To maintain compatibility, I suppose the change would be that equality deletes continue to be allowed by default, but we'd add a new field to indicate that for some tables (or branches of a table), equality deletes would not be allowed. And it would be an error for an engine to make an update which added an equality delete to such a table.
>>>>>>>
>>>>>>> Maybe that change would even be possible in V3.
>>>>>>> And if all the performance improvements to equality deletes make this a moot point, we could drop the field in V4. But it seems like a mistake to both limit the non-streaming use case AND have confusing limitations for the end user in the meantime.
>>>>>>>
>>>>>>> I would happily be corrected about my understanding of all of the above.
>>>>>>>
>>>>>>> thanks!
>>>>>>> Imran
>>>>>>>
>>>>>>> On Tue, Nov 5, 2024 at 9:16 AM Bryan Keller <brya...@gmail.com> wrote:
>>>>>>>
>>>>>>>> I also feel we should keep equality deletes until we have an alternative solution for streaming updates/deletes.
>>>>>>>>
>>>>>>>> -Bryan
>>>>>>>>
>>>>>>>> On Nov 4, 2024, at 8:33 AM, Péter Váry <peter.vary.apa...@gmail.com> wrote:
>>>>>>>>
>>>>>>>> Well, it seems like I'm a little late, so most of the arguments have already been voiced.
>>>>>>>>
>>>>>>>> I agree that we should not deprecate equality deletes until we have a replacement feature. I think one of the big advantages of Iceberg is that it supports both batch processing and streaming ingestion. For streaming ingestion we need a way to update existing data in a performant way, but restricting deletes to the primary keys seems like enough from the streaming perspective.
>>>>>>>>
>>>>>>>> Equality deletes allow a very wide range of applications, which we might be able to narrow down a bit but still keep useful. So if we want to go down this road, we need to start collecting the requirements.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Peter
>>>>>>>>
>>>>>>>> On Fri, Nov 1, 2024 at 19:22, Shani Elharrar <sh...@upsolver.com.invalid> wrote:
>>>>>>>>
>>>>>>>>> I understand how it makes sense for batch jobs, but it damages streaming jobs. Using equality deletes works much better for streaming (which has strict SLAs for delays), and in order to decrease the read performance penalty, systems can rewrite the equality deletes to positional deletes.
>>>>>>>>>
>>>>>>>>> Shani.
>>>>>>>>>
>>>>>>>>> On 1 Nov 2024, at 20:06, Steven Wu <stevenz...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>> Fundamentally, it is very difficult to write position deletes with concurrent writers and conflicts for batch jobs too, as the inverted index may become invalid/stale.
>>>>>>>>>
>>>>>>>>> The position deletes are created during the write phase, but conflicts are only detected at the commit stage. I assume the batch job should fail in this case.
>>>>>>>>>
>>>>>>>>> On Fri, Nov 1, 2024 at 10:57 AM Steven Wu <stevenz...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Shani,
>>>>>>>>>>
>>>>>>>>>> That is a good point. It is certainly a limitation for the Flink job to track the inverted index internally (which is what I had in mind). It can't be shared/synchronized with other Flink jobs or other engines writing to the same table.
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Steven
>>>>>>>>>>
>>>>>>>>>> On Fri, Nov 1, 2024 at 10:50 AM Shani Elharrar <sh...@upsolver.com.invalid> wrote:
>>>>>>>>>>
>>>>>>>>>>> Even if Flink can create this state, it would have to be maintained against the Iceberg table; we wouldn't like duplicate keys if other systems/users update the table (e.g. manual inserts/updates using DML).
>>>>>>>>>>>
>>>>>>>>>>> Shani.
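[Editor's illustration: a toy model of the writer-side inverted index discussed above, mapping each key to the (data file, row position) of its last committed copy so an upsert can emit a position delete instead of an equality delete. All names are invented; and, as Shani points out, such an index goes stale the moment any other writer touches the table.]

```python
# Toy writer-side inverted index (illustration only, not Flink/Iceberg
# code). The streaming writer remembers where each key was last written,
# so an update can delete the old row by exact position.

class UpsertWriter:
    def __init__(self):
        self.index = {}             # key -> (data_file, row_pos)
        self.position_deletes = []  # (data_file, row_pos) pairs to commit
        self.appends = []           # (data_file, row_pos, row) new rows

    def upsert(self, key, row, data_file, row_pos):
        if key in self.index:
            # Key seen before: delete the previous copy by position.
            self.position_deletes.append(self.index[key])
        self.index[key] = (data_file, row_pos)
        self.appends.append((data_file, row_pos, row))

w = UpsertWriter()
w.upsert("k1", {"v": 1}, "file-0.parquet", 0)
w.upsert("k2", {"v": 2}, "file-0.parquet", 1)
w.upsert("k1", {"v": 3}, "file-1.parquet", 0)  # update -> position delete
print(w.position_deletes)
```

If another engine inserts "k1" directly into the table, this writer's `index` never learns about it, so the table ends up with duplicate keys -- the limitation Steven and Shani are discussing.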
>>>>>>>>>>> On 1 Nov 2024, at 18:32, Steven Wu <stevenz...@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>> > Add support for inverted indexes to reduce the cost of position lookup. This is fairly tricky to implement for streaming use cases without an external system.
>>>>>>>>>>>
>>>>>>>>>>> Anton, that is also what I was saying earlier. In Flink, the inverted index of (key, committed data files) can be tracked in Flink state.
>>>>>>>>>>>
>>>>>>>>>>> On Fri, Nov 1, 2024 at 2:16 AM Anton Okolnychyi <aokolnyc...@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> I was a bit skeptical when we were adding equality deletes, but nothing beats their performance during writes. We have to find an alternative before deprecating.
>>>>>>>>>>>>
>>>>>>>>>>>> We are doing a lot of work to improve streaming, like reducing the cost of commits, enabling a large (potentially infinite) number of snapshots, changelog reads, and so on. It is a project goal to excel in streaming.
>>>>>>>>>>>>
>>>>>>>>>>>> I was going to focus on equality deletes after completing the DV work. I believe we have these options:
>>>>>>>>>>>>
>>>>>>>>>>>> - Revisit the existing design of equality deletes (e.g. add more restrictions, improve compaction, offer new writers).
>>>>>>>>>>>> - Standardize on the view-based approach [1] to handle streaming upserts and CDC use cases, potentially making this part of the spec.
>>>>>>>>>>>> - Add support for inverted indexes to reduce the cost of position lookup. This is fairly tricky to implement for streaming use cases without an external system.
>>>>>>>>>>>> Our runtime filtering in Spark today is equivalent to looking up positions in an inverted index represented by another Iceberg table. That may still not be enough for some streaming use cases.
>>>>>>>>>>>>
>>>>>>>>>>>> [1] - https://www.tabular.io/blog/hello-world-of-cdc/
>>>>>>>>>>>>
>>>>>>>>>>>> - Anton
>>>>>>>>>>>>
>>>>>>>>>>>> On Thu, Oct 31, 2024 at 21:31, Micah Kornfield <emkornfi...@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> I agree that equality deletes have their place in streaming. I think the ultimate decision here is how opinionated Iceberg wants to be on its use cases. If it really wants to stick to its origins of "slow moving data", then removing equality deletes would be in line with this. I think the other high-level question is how much we allow for partially compatible features (the row lineage feature was explicitly approved excluding equality deletes, and people seemed OK with it at the time. If all features need to work together, then maybe we need to rethink the design here so it can be forward compatible with equality deletes).
>>>>>>>>>>>>>
>>>>>>>>>>>>> I think one issue with equality deletes as stated in the spec is that they are overly broad. I'd be interested if people have any use cases that differ, but I think one way of narrowing the specification scope on equality deletes (and probably a necessary building block for building something better) is to focus on upsert/streaming deletes. Two proposals in this regard are:
>>>>>>>>>>>>>
>>>>>>>>>>>>> 1. Require that equality deletes can only correspond to unique identifiers for the table.
>>>>>>>>>>>>> 2. Consider requiring that, for equality deletes on partitioned tables, the primary key must contain a partition column (I believe Flink at least already does this). It is less clear to me that this would meet all existing use cases, but having this would allow for better incremental data structures, which could then be partition-based.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Narrowing the scope to unique identifiers would allow for the further building blocks already mentioned, like a secondary index (possible via an LSM tree), that would allow for better performance overall.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I generally agree with the sentiment that we shouldn't deprecate them until there is a viable replacement. With all due respect to my employer, let's not fall into the Google trap [1] :)
>>>>>>>>>>>>>
>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>> Micah
>>>>>>>>>>>>>
>>>>>>>>>>>>> [1] https://goomics.net/50/
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Thu, Oct 31, 2024 at 12:35 PM Alexander Jo <alex...@starburstdata.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hey all,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Just to throw my 2 cents in, I agree with Steven and others that we do need some kind of replacement before deprecating equality deletes. They certainly have their problems, and do significantly increase complexity as they are now, but the writing of position deletes is too expensive for certain pipelines.
>>>>>>>>>>>>>> We've been investigating using equality deletes for some of our workloads at Starburst; the key advantage we were hoping to leverage is cheap, effectively random-access lookup deletes. Say you have a UUID column that's unique in a table and want to delete a row by UUID. With position deletes, each delete is expensive without an index on that UUID. With equality deletes, each delete is cheap while reads/compaction are expensive, but when updates are frequent and reads are sporadic that's a reasonable tradeoff.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Pretty much what Jason and Steven have already said.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Maybe there are some incremental improvements on equality deletes, or tips from similar systems, that might alleviate some of their problems?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> - Alex Jo
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Thu, Oct 31, 2024 at 10:58 AM Steven Wu <stevenz...@gmail.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> We probably all agree on the downside of equality deletes: they postpone all the work to the read path.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> In theory, we could implement position deletes only in the Flink streaming writer. It would require tracking the last committed data file per key, which can be stored in Flink state (checkpointed). This is obviously quite expensive/challenging, but possible.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I'd like to echo one benefit of equality deletes that Russell called out in the original email: equality deletes would never have conflicts. That is important for streaming writers (Flink, Kafka Connect, ...) that commit frequently (minutes or less). Assume Flink could write only position deletes and commit every 2 minutes. The long-running nature of streaming jobs can cause frequent commit conflicts with background delete compaction jobs.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Overall, streaming upsert writes are not a well-solved problem in Iceberg. This probably affects all streaming engines (Flink, Kafka Connect, Spark streaming, ...). We need to come up with some better alternatives before we can deprecate equality deletes.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Thu, Oct 31, 2024 at 8:38 AM Russell Spitzer <russell.spit...@gmail.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> For users of equality deletes: what are the key benefits of equality deletes that you would like to preserve, and could you please share some concrete examples of the queries you want to run (and the schemas and data sizes you would like to run them against) and the latencies that would be acceptable?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Thu, Oct 31, 2024 at 10:05 AM Jason Fine <ja...@upsolver.com.invalid> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Representing Upsolver here: we also make use of equality deletes to deliver high-frequency, low-latency updates to our clients at scale. We have customers using them at scale and demonstrating the need and viability.
>>>>>>>>>>>>>>>>> We automate the process of converting them into positional deletes (or fully applying them) in the background for more efficient engine queries, giving our users both low latency and good query performance.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Equality deletes were added since there isn't a good way to solve frequent updates otherwise. It would require some sort of index keeping track of every record in the table (by a predetermined PK), and maintaining such an index is a huge task that every tool interested in this would need to re-implement. It also becomes a bottleneck limiting table sizes.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I don't think they should be removed without providing an alternative. Positional deletes inherently have a different performance profile, requiring more upfront work proportional to the table size.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Thu, Oct 31, 2024 at 2:45 PM Jean-Baptiste Onofré <j...@nanthrax.net> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Hi Russell,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Thanks for the nice writeup and the proposal.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I agree with your analysis, and I have the same feeling. However, I think there are more engines than Flink that write equality delete files. So I agree to deprecate in V3, but maybe we should be more "flexible" about removal in V4, in order to give engines time to update.
>>>>>>>>>>>>>>>>>> I think that by deprecating equality deletes, we are clearly focusing on read performance and "consistency" (more than write). That's not necessarily a bad thing, but streaming and data ingestion platforms will probably be concerned about it (by using positional deletes, they would have to scan/read all data files to find the position, which is painful).
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> So, to summarize:
>>>>>>>>>>>>>>>>>> 1. Agree to deprecate equality deletes, but -1 to committing to any target for deletion before having a clear path for streaming platforms (Flink, Beam, ...)
>>>>>>>>>>>>>>>>>> 2. In the meantime (during the deprecation period), I propose to explore possible improvements for streaming platforms (maybe finding a way to avoid full data file scans, ...)
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Thanks!
>>>>>>>>>>>>>>>>>> Regards
>>>>>>>>>>>>>>>>>> JB
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Wed, Oct 30, 2024 at 10:06 PM Russell Spitzer <russell.spit...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>> > Background:
>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>> > 1) Position Deletes
>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>> > Writers determine which rows are deleted and mark them in a 1-for-1 representation. With delete vectors, this means every data file has at most one delete vector that is read in conjunction with it to excise deleted rows. Reader overhead is more or less constant and is very predictable.
>>>>>>>>>>>>>>>>>> > The main cost of this mode is that deletes must be determined at write time, which is expensive and can be more difficult for conflict resolution.
>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>> > 2) Equality Deletes
>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>> > Writers write out a reference to which values are deleted (in a partition or globally). There can be an unlimited number of equality deletes, and they all must be checked for every data file that is read. The cost of determining deleted rows is essentially given to the reader.
>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>> > Conflicts almost never happen, since data files are not actually changed and there is almost no cost to the writer to generate these. Almost all costs related to equality deletes are passed on to the reader.
>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>> > Proposal:
>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>> > Equality deletes are, in my opinion, unsustainable, and we should work on deprecating and removing them from the specification. At this time, I know of only one engine (Apache Flink) which produces these deletes, but almost all engines have implementations to read them. The cost of implementing equality deletes on the read path is difficult and unpredictable in terms of memory usage and compute complexity.
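[Editor's illustration: a toy model of the read-path asymmetry Russell describes, in invented Python, not Iceberg internals. A position delete names exact (file, row) pairs, so filtering is a constant-cost membership test per row; every accumulated equality delete predicate must instead be tested against every row of every scanned data file.]

```python
# Toy read paths (illustration only, not Iceberg code).

def read_with_position_deletes(data_files, deleted):
    # deleted: set of (file_name, row_pos) pairs -> O(1) check per row.
    for fname, rows in data_files.items():
        for pos, row in enumerate(rows):
            if (fname, pos) not in deleted:
                yield row

def read_with_equality_deletes(data_files, eq_deletes):
    # eq_deletes: list of {column: value} predicates. Every predicate is
    # checked against every row of every file, so the read cost grows
    # with the number of accumulated equality deletes.
    for rows in data_files.values():
        for row in rows:
            if not any(all(row.get(c) == v for c, v in d.items())
                       for d in eq_deletes):
                yield row

files = {"f0": [{"id": 1}, {"id": 2}], "f1": [{"id": 3}]}
print(list(read_with_position_deletes(files, {("f0", 1)})))
print(list(read_with_equality_deletes(files, [{"id": 2}])))
```

Both calls return the same surviving rows; the difference is that the equality-delete path does per-row predicate evaluation, which is the cost the thread says gets "passed on to the reader".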
>>>>>>>>>>>>>>>>>> > We've had suggestions of implementing RocksDB in order to handle ever-growing sets of equality deletes, which in my opinion shows that we are going down the wrong path.
>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>> > Outside of performance, equality deletes are also difficult to use in conjunction with many other features. For example, any features requiring CDC or row lineage are basically impossible when equality deletes are in use. When equality deletes are present, the state of the table can only be determined with a full scan, making it difficult to update differential structures. This means materialized views or indexes need to be essentially fully rebuilt whenever an equality delete is added to the table.
>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>> > Equality deletes essentially remove complexity from the write side, but then add what I believe is an unacceptable level of complexity to the read side.
>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>> > Because of this, I suggest we deprecate equality deletes in V3 and slate them for full removal from the Iceberg spec in V4.
>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>> > I know this is a big change and a compatibility breakage, so I would like to introduce this idea to the community and solicit feedback from all stakeholders. I am very flexible on this issue and would like to hear the best arguments both for and against removal of equality deletes.
>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>> > Thanks everyone for your time,
>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>> > Russ Spitzer
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>> Jason Fine
>>>>>>>>>>>>>>>>> Chief Software Architect
>>>>>>>>>>>>>>>>> ja...@upsolver.com | www.upsolver.com