Steven, do you have any pointers? In particular, I am curious to learn where the state will be stored, whether it will be distributed, the lookup cost, how to incrementally maintain that index, etc.
- Anton

On Fri, Nov 1, 2024 at 5:32 PM Steven Wu <stevenz...@gmail.com> wrote:

> Add support for inverted indexes to reduce the cost of position lookup. This is fairly tricky to implement for streaming use cases without an external system.
>
> Anton, that is also what I was saying earlier. In Flink, the inverted index of (key, committed data files) can be tracked in Flink state.
>
> On Fri, Nov 1, 2024 at 2:16 AM Anton Okolnychyi <aokolnyc...@gmail.com> wrote:
>
>> I was a bit skeptical when we were adding equality deletes, but nothing beats their performance during writes. We have to find an alternative before deprecating.
>>
>> We are doing a lot of work to improve streaming, like reducing the cost of commits, enabling a large (potentially infinite) number of snapshots, changelog reads, and so on. It is a project goal to excel in streaming.
>>
>> I was going to focus on equality deletes after completing the DV work. I believe we have these options:
>>
>> - Revisit the existing design of equality deletes (e.g. add more restrictions, improve compaction, offer new writers).
>> - Standardize on the view-based approach [1] to handle streaming upserts and CDC use cases, potentially making this part of the spec.
>> - Add support for inverted indexes to reduce the cost of position lookup. This is fairly tricky to implement for streaming use cases without an external system. Our runtime filtering in Spark today is equivalent to looking up positions in an inverted index represented by another Iceberg table. That may still not be enough for some streaming use cases.
>>
>> [1] - https://www.tabular.io/blog/hello-world-of-cdc/
>>
>> - Anton
>>
>> On Thu, Oct 31, 2024 at 9:31 PM Micah Kornfield <emkornfi...@gmail.com> wrote:
>>
>>> I agree that equality deletes have their place in streaming. I think the ultimate decision here is how opinionated Iceberg wants to be on its use cases.
>>> If it really wants to stick to its origins of "slow moving data", then removing equality deletes would be in line with this. I think the other high-level question is how much we allow for partially compatible features (the row lineage feature was explicitly approved excluding equality deletes, and people seemed OK with it at the time; if all features need to work together, then maybe we need to rethink the design here so it can be forward compatible with equality deletes).
>>>
>>> I think one issue with equality deletes as stated in the spec is that they are overly broad. I'd be interested if people have use cases that differ, but I think one way of narrowing the specification scope on equality deletes (and probably a necessary building block for building something better) is to focus on upsert/streaming deletes. Two proposals in this regard are:
>>>
>>> 1. Require that equality deletes can only correspond to unique identifiers for the table.
>>> 2. Consider requiring that, for equality deletes on partitioned tables, the primary key must contain a partition column (I believe Flink at least already does this). It is less clear to me that this would meet all existing use cases, but having this would allow for better incremental data structures, which could then be partition based.
>>>
>>> Narrowing the scope to unique identifiers would allow for further building blocks already mentioned, like a secondary index (possibly via an LSM tree), that would allow for better performance overall.
>>>
>>> I generally agree with the sentiment that we shouldn't deprecate them until there is a viable replacement.
>>> With all due respect to my employer, let's not fall into the Google trap [1] :)
>>>
>>> Cheers,
>>> Micah
>>>
>>> [1] https://goomics.net/50/
>>>
>>> On Thu, Oct 31, 2024 at 12:35 PM Alexander Jo <alex...@starburstdata.com> wrote:
>>>
>>>> Hey all,
>>>>
>>>> Just to throw my 2 cents in, I agree with Steven and others that we do need some kind of replacement before deprecating equality deletes. They certainly have their problems, and do significantly increase complexity as they are now, but the writing of position deletes is too expensive for certain pipelines.
>>>>
>>>> We've been investigating using equality deletes for some of our workloads at Starburst; the key advantage we were hoping to leverage is cheap, effectively random-access lookup deletes. Say you have a UUID column that's unique in a table and want to delete a row by UUID. With position deletes, each delete is expensive without an index on that UUID. With equality deletes, each delete is cheap; reads and compaction are expensive, but when updates are frequent and reads are sporadic that's a reasonable tradeoff.
>>>>
>>>> Pretty much what Jason and Steven have already said.
>>>>
>>>> Maybe there are some incremental improvements on equality deletes or tips from similar systems that might alleviate some of their problems?
>>>>
>>>> - Alex Jo
>>>>
>>>> On Thu, Oct 31, 2024 at 10:58 AM Steven Wu <stevenz...@gmail.com> wrote:
>>>>
>>>>> We probably all agree on the downside of equality deletes: they postpone all the work to the read path.
>>>>>
>>>>> In theory, we could implement position deletes only in the Flink streaming writer. It would require tracking the last committed data files per key, which can be stored in Flink state (checkpointed). This is obviously quite expensive/challenging, but possible.
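[Editor's note: Steven's "in theory" approach — tracking the last committed (data file, position) per key so a streaming writer can emit position deletes on upsert — can be sketched as below. This is a toy simulation in Python: the dict stands in for checkpointed Flink keyed state, and all names and file paths are hypothetical, not Flink or Iceberg APIs.]

```python
# Toy sketch of a streaming upsert writer that remembers, per key, where the
# last committed copy of the row lives. With that index, an update can emit a
# position delete directly instead of an equality delete -- no table scan.
class PositionTrackingUpsertWriter:
    def __init__(self):
        # key -> (data_file, row_pos); in Flink this would be checkpointed keyed state
        self.index = {}
        # (data_file, row_pos) tuples emitted for the current commit
        self.position_deletes = []
        self.current_file = "data-001.parquet"  # hypothetical data file name
        self.next_pos = 0

    def upsert(self, key, row):
        if key in self.index:
            # Key seen before: delete the old copy by its known position.
            self.position_deletes.append(self.index[key])
        # Record where the new copy of the row is written.
        self.index[key] = (self.current_file, self.next_pos)
        self.next_pos += 1
        return row

writer = PositionTrackingUpsertWriter()
writer.upsert("user-1", {"name": "a"})
writer.upsert("user-2", {"name": "b"})
writer.upsert("user-1", {"name": "a2"})  # update -> position delete for the old row
print(writer.position_deletes)  # [('data-001.parquet', 0)]
```

The cost Steven mentions is visible here: the index must cover every live key in the table, which is exactly the state that makes this expensive for large tables.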
>>>>> I'd like to echo one benefit of equality deletes that Russell called out in the original email: equality deletes never have conflicts. That is important for streaming writers (Flink, Kafka Connect, ...) that commit frequently (every few minutes or less). Assume Flink could write only position deletes and committed every 2 minutes. The long-running nature of streaming jobs could cause frequent commit conflicts with background delete compaction jobs.
>>>>>
>>>>> Overall, streaming upsert writes are not a well-solved problem in Iceberg. This probably affects all streaming engines (Flink, Kafka Connect, Spark streaming, ...). We need to come up with better alternatives before we can deprecate equality deletes.
>>>>>
>>>>> On Thu, Oct 31, 2024 at 8:38 AM Russell Spitzer <russell.spit...@gmail.com> wrote:
>>>>>
>>>>>> For users of Equality Deletes, what are the key benefits of Equality Deletes that you would like to preserve? Could you please share some concrete examples of the queries you want to run (and the schemas and data sizes you would like to run them against) and the latencies that would be acceptable?
>>>>>>
>>>>>> On Thu, Oct 31, 2024 at 10:05 AM Jason Fine <ja...@upsolver.com.invalid> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> Representing Upsolver here, we also make use of Equality Deletes to deliver high-frequency, low-latency updates to our clients at scale. We have customers using them at scale, demonstrating the need and viability. We automate the process of converting them into positional deletes (or fully applying them) in the background for more efficient engine queries, giving our users both low latency and good query performance.
>>>>>>>
>>>>>>> Equality Deletes were added since there isn't a good way to solve frequent updates otherwise.
>>>>>>> It would require some sort of index keeping track of every record in the table (by a predetermined PK), and maintaining such an index is a huge task that every tool interested in this would need to re-implement. It also becomes a bottleneck limiting table sizes.
>>>>>>>
>>>>>>> I don't think they should be removed without providing an alternative. Positional Deletes inherently have a different performance profile, requiring more upfront work proportional to the table size.
>>>>>>>
>>>>>>> On Thu, Oct 31, 2024 at 2:45 PM Jean-Baptiste Onofré <j...@nanthrax.net> wrote:
>>>>>>>
>>>>>>>> Hi Russell,
>>>>>>>>
>>>>>>>> Thanks for the nice writeup and the proposal.
>>>>>>>>
>>>>>>>> I agree with your analysis, and I have the same feeling. However, I think there are more engines than Flink that write equality delete files. So I agree to deprecate in V3, but maybe be more "flexible" about removal in V4 in order to give engines time to update. I think that by deprecating equality deletes, we are clearly focusing on read performance and "consistency" (more than write). That's not necessarily a bad thing, but streaming and data ingestion platforms will probably be concerned about it (by using positional deletes, they would have to scan/read all data files to find the positions, which is painful).
>>>>>>>>
>>>>>>>> So, to summarize:
>>>>>>>> 1. Agree to deprecate equality deletes, but -1 on committing to any target for deletion before having a clear path for streaming platforms (Flink, Beam, ...).
>>>>>>>> 2. In the meantime (during the deprecation period), I propose we explore possible improvements for streaming platforms (maybe finding a way to avoid full data file scans, ...).
>>>>>>>>
>>>>>>>> Thanks!
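[Editor's note: the write-path asymmetry raised in the messages above — an equality delete just records a predicate, while a position delete must first locate the row — can be sketched as follows. This is a toy Python illustration with hypothetical file names and record shapes, not Iceberg's actual delete-file formats.]

```python
# Contrast the writer-side cost of the two delete kinds. The table is a
# hypothetical in-memory stand-in for a set of committed data files.
data_files = {
    "data-001.parquet": [{"id": 1}, {"id": 2}],
    "data-002.parquet": [{"id": 3}, {"id": 4}],
}

def write_equality_delete(key_col, value):
    # Writer work: O(1) -- just record the predicate; no data file is read.
    # The cost of finding matching rows is deferred to every future reader.
    return {"type": "equality", key_col: value}

def write_position_delete(key_col, value):
    # Writer work: O(rows in table) without an index -- scan every data file
    # to find the positions of the rows being deleted.
    deletes = []
    for path, rows in data_files.items():
        for pos, row in enumerate(rows):
            if row[key_col] == value:
                deletes.append({"type": "position", "file": path, "pos": pos})
    return deletes

print(write_equality_delete("id", 3))  # {'type': 'equality', 'id': 3}
print(write_position_delete("id", 3))  # [{'type': 'position', 'file': 'data-002.parquet', 'pos': 0}]
```

The per-key index discussed in the thread (Flink state, an LSM tree, or another Iceberg table) is precisely what would turn the scan in `write_position_delete` into a lookup.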
>>>>>>>> Regards,
>>>>>>>> JB
>>>>>>>>
>>>>>>>> On Wed, Oct 30, 2024 at 10:06 PM Russell Spitzer <russell.spit...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>> Background:
>>>>>>>>>
>>>>>>>>> 1) Position Deletes
>>>>>>>>>
>>>>>>>>> Writers determine which rows are deleted and mark them in a 1-for-1 representation. With delete vectors, this means every data file has at most one delete vector that is read in conjunction with it to excise deleted rows. Reader overhead is more or less constant and very predictable.
>>>>>>>>>
>>>>>>>>> The main cost of this mode is that deletes must be determined at write time, which is expensive and can make conflict resolution more difficult.
>>>>>>>>>
>>>>>>>>> 2) Equality Deletes
>>>>>>>>>
>>>>>>>>> Writers write out a reference to which values are deleted (in a partition or globally). There can be an unlimited number of equality deletes, and they all must be checked for every data file that is read. The cost of determining deleted rows is essentially given to the reader.
>>>>>>>>>
>>>>>>>>> Conflicts almost never happen since data files are not actually changed, and there is almost no cost to the writer to generate these. Almost all costs related to equality deletes are passed on to the reader.
>>>>>>>>>
>>>>>>>>> Proposal:
>>>>>>>>>
>>>>>>>>> Equality deletes are, in my opinion, unsustainable, and we should work on deprecating and removing them from the specification. At this time, I know of only one engine (Apache Flink) that produces these deletes, but almost all engines have implementations to read them. The cost of implementing equality deletes on the read path is difficult and unpredictable in terms of memory usage and compute complexity.
>>>>>>>>> We've had suggestions of implementing RocksDB in order to handle ever-growing sets of equality deletes, which in my opinion shows that we are going down the wrong path.
>>>>>>>>>
>>>>>>>>> Outside of performance, equality deletes are also difficult to use in conjunction with many other features. For example, any features requiring CDC or row lineage are basically impossible when equality deletes are in use. When equality deletes are present, the state of the table can only be determined with a full scan, making it difficult to update differential structures. This means materialized views or indexes essentially need to be fully rebuilt whenever an equality delete is added to the table.
>>>>>>>>>
>>>>>>>>> Equality deletes essentially remove complexity from the write side but then add what I believe is an unacceptable level of complexity to the read side.
>>>>>>>>>
>>>>>>>>> Because of this, I suggest we deprecate equality deletes in V3 and slate them for full removal from the Iceberg spec in V4.
>>>>>>>>>
>>>>>>>>> I know this is a big change and compatibility breakage, so I would like to introduce this idea to the community and solicit feedback from all stakeholders. I am very flexible on this issue and would like to hear the best arguments both for and against removal of equality deletes.
>>>>>>>>>
>>>>>>>>> Thanks everyone for your time,
>>>>>>>>>
>>>>>>>>> Russ Spitzer

-- 

*Jason Fine*
Chief Software Architect
ja...@upsolver.com | www.upsolver.com
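[Editor's note: the read-path asymmetry described in Russell's original proposal — a bounded, per-file delete vector versus an unbounded set of equality predicates evaluated against every row — can be sketched as below. This is a toy Python illustration; the structures are hypothetical, not Iceberg's actual delete-vector or delete-file formats.]

```python
# Reader-side cost of the two delete representations, for one data file.
rows = [{"id": 1, "v": "a"}, {"id": 2, "v": "b"}, {"id": 3, "v": "c"}]

# Position deletes / delete vectors: at most one set of deleted positions per
# data file; reader work per row is a constant-time membership check.
delete_vector = {1}  # row positions deleted in this file
dv_read = [r for pos, r in enumerate(rows) if pos not in delete_vector]

# Equality deletes: a list of predicates that grows with write activity, and
# every predicate must be evaluated against every row of every file scanned.
equality_deletes = [("id", 3), ("v", "zzz")]
eq_read = [r for r in rows
           if not any(r[col] == val for col, val in equality_deletes)]

print(dv_read)  # [{'id': 1, 'v': 'a'}, {'id': 3, 'v': 'c'}]
print(eq_read)  # [{'id': 1, 'v': 'a'}, {'id': 2, 'v': 'b'}]
```

The delete-vector path stays O(rows) per file regardless of delete history, while the equality path is O(rows × predicates) — the unpredictable reader cost the proposal objects to.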