Hi,
I am reading about Iceberg and am quite new to this.
This Puffin blob would be an index from key to data file. Other use cases of
Puffin, such as statistics, are at a per-file level, if I understand
correctly.

Where would the Puffin file for the key->data file index be stored? It is a
property of the entire table.

Thanks,
Vignesh.


On Sat, Nov 9, 2024 at 2:17 AM Shani Elharrar <sh...@upsolver.com.invalid>
wrote:

> JB, this is what we do: we write Equality Deletes and periodically convert
> them to Positional Deletes.
>
> We could probably index the keys, or maybe partially index them using bloom
> filters; the best approach would be to put those bloom filters inside Puffin.
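>
> As a rough illustration, here is a minimal sketch of the kind of
> per-data-file bloom filter over the key column that such a Puffin blob
> could hold (plain Java with hypothetical names, not Iceberg's actual
> Puffin blob types):
>
>     import java.util.BitSet;
>
>     class KeyBloomFilter {
>       private final BitSet bits;
>       private final int size;
>
>       KeyBloomFilter(int size) {
>         this.size = size;
>         this.bits = new BitSet(size);
>       }
>
>       // Two independent hash positions per key.
>       void add(String key) {
>         bits.set(index(key, 0x9747b28c));
>         bits.set(index(key, 0x85ebca6b));
>       }
>
>       // false -> the key is definitely not in the data file, so an equality
>       // delete on that key cannot match and the file can be skipped.
>       boolean mightContain(String key) {
>         return bits.get(index(key, 0x9747b28c)) && bits.get(index(key, 0x85ebca6b));
>       }
>
>       private int index(String key, int seed) {
>         return Math.floorMod(key.hashCode() ^ seed, size);
>       }
>     }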
>
> Shani.
>
> On 9 Nov 2024, at 11:11, Jean-Baptiste Onofré <j...@nanthrax.net> wrote:
>
> 
> Hi,
>
> I agree with Peter here, and I would say that it would be an issue for
> multi-engine support.
>
> I think, as I have already mentioned to others, we should explore an
> alternative.
> As the main issue is the data file scan in the streaming context, maybe we
> could find a way to "index"/correlate positional deletes with limited
> scanning.
> I will think again about that :)
>
> Regards
> JB
>
> On Sat, Nov 9, 2024 at 6:48 AM Péter Váry <peter.vary.apa...@gmail.com>
> wrote:
>
>> Hi Imran,
>>
>> I don't think it's a good idea to start creating multiple types of
>> Iceberg tables. Iceberg's main selling point is compatibility between
>> engines. If we don't have readers and writers for all types of tables, then
>> we remove compatibility from the equation and engine-specific formats
>> always win. OTOH, if we write readers and writers for all types of tables,
>> then we are back to square one.
>>
>> Identifier fields are a table schema concept and are used in many cases
>> during query planning and execution. This is why they are defined as part
>> of the SQL spec, and this is why Iceberg defines them as well. One use case
>> is merging deletes (independently of how they are manifested) and
>> subsequent inserts into updates, as the sketch below illustrates.
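>>
>> A minimal sketch of that merge, in plain Java with hypothetical types (not
>> Iceberg's changelog implementation): rows are keyed by their identifier
>> field values, and a DELETE followed by an INSERT with the same key becomes
>> an UPDATE_BEFORE/UPDATE_AFTER pair.
>>
>>     import java.util.ArrayList;
>>     import java.util.HashMap;
>>     import java.util.List;
>>     import java.util.Map;
>>
>>     record ChangeRow(String kind, String idKey, Object payload) {}
>>
>>     class ChangelogMerger {
>>       static List<ChangeRow> merge(List<ChangeRow> changes) {
>>         Map<String, ChangeRow> pendingDeletes = new HashMap<>();
>>         List<ChangeRow> merged = new ArrayList<>();
>>         for (ChangeRow row : changes) {
>>           if (row.kind().equals("DELETE")) {
>>             // Hold the delete; it may pair with a later insert of the same key.
>>             pendingDeletes.put(row.idKey(), row);
>>           } else if (row.kind().equals("INSERT") && pendingDeletes.containsKey(row.idKey())) {
>>             ChangeRow before = pendingDeletes.remove(row.idKey());
>>             merged.add(new ChangeRow("UPDATE_BEFORE", before.idKey(), before.payload()));
>>             merged.add(new ChangeRow("UPDATE_AFTER", row.idKey(), row.payload()));
>>           } else {
>>             merged.add(row);
>>           }
>>         }
>>         merged.addAll(pendingDeletes.values()); // unmatched deletes stay deletes
>>         return merged;
>>       }
>>     }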
>>
>> Flink SQL doesn't allow creating tables with partition transforms, so no
>> new table could be created by Flink SQL using transforms, but tables
>> created by other engines could still be used (both read and write). Also,
>> you can create such tables in Flink using the Java API.
>>
>> Requiring partition columns to be part of the identifier fields comes
>> from the practical consideration that you want to limit the scope of the
>> equality deletes as much as possible. Otherwise all of the equality deletes
>> would be table-global, and they would have to be read by every reader. We
>> could write those; we just decided that we don't want to allow the user to
>> do this, as it is in most cases a bad idea.
>>
>> I hope this helps,
>> Peter
>>
>> On Fri, Nov 8, 2024, 22:01 Imran Rashid <iras...@cloudera.com.invalid>
>> wrote:
>>
>>> I'm not down in the weeds at all myself on implementation details, so
>>> forgive me if I'm wrong about the details here.
>>>
>>> I can see all the viewpoints -- both that equality deletes enable some
>>> use cases and that they make others far more difficult.  What surprised me
>>> the most is that Iceberg does not provide a way to distinguish these two
>>> table "types".
>>>
>>> At first, I thought the presence of an identifier-field (
>>> https://iceberg.apache.org/spec/#identifier-field-ids) indicated that
>>> the table was a target for equality deletes.  But then it turns out
>>> identifier-fields are also useful for changelog views even without equality
>>> deletes -- IIUC, they show that a delete + insert should actually be
>>> interpreted as an update in a changelog view.
>>>
>>> To be perfectly honest, I'm confused about all of these details -- from
>>> my read, the spec does not indicate this relationship between
>>> identifier-fields and equality_ids in equality delete files (
>>> https://iceberg.apache.org/spec/#equality-delete-files), but I think
>>> that is the way Flink works.  Flink itself seems to have even more
>>> limitations -- no partition transforms are allowed, and all partition
>>> columns must be a subset of the identifier fields.  Is that just a Flink
>>> limitation, or is that the intended behavior in the spec?  (Or maybe
>>> user-error on my part?)  Those seem like very reasonable limitations, from
>>> an implementation point-of-view.  But OTOH, as a user, this seems to be
>>> directly contrary to some of the promises of Iceberg.
>>>
>>> It's easy to see whether a table already has equality deletes in it by
>>> looking at the metadata.  But is there any way to indicate that a table (or
>>> branch of a table) _must not_ have equality deletes added to it?
>>>
>>> If that were possible, it seems like we could support both use cases.
>>> We could continue to optimize for the streaming ingestion use cases using
>>> equality deletes.  But we could also build more optimizations into the
>>> "non-streaming-ingestion" branches.  And we could document the tradeoff so
>>> it is much clearer to end users.
>>>
>>> To maintain compatibility, I suppose that the change would be that
>>> equality deletes continue to be allowed by default, but we'd add a new
>>> field to indicate that for some tables (or branches of a table), equality
>>> deletes would not be allowed.  And it would be an error for an engine to
>>> make an update which added an equality delete to such a table.
>>>
>>> Maybe that change would even be possible in V3.
>>>
>>> And if all the performance improvements to equality deletes make this a
>>> moot point, we could drop the field in v4.  But it seems like a mistake to
>>> both limit the non-streaming use-case AND have confusing limitations for
>>> the end-user in the meantime.
>>>
>>> I would happily be corrected about my understanding of all of the above.
>>>
>>> thanks!
>>> Imran
>>>
>>> On Tue, Nov 5, 2024 at 9:16 AM Bryan Keller <brya...@gmail.com> wrote:
>>>
>>>> I also feel we should keep equality deletes until we have an
>>>> alternative solution for streaming updates/deletes.
>>>>
>>>> -Bryan
>>>>
>>>> On Nov 4, 2024, at 8:33 AM, Péter Váry <peter.vary.apa...@gmail.com>
>>>> wrote:
>>>>
>>>> Well, it seems like I'm a little late, so most of the arguments have
>>>> already been voiced.
>>>>
>>>> I agree that we should not deprecate equality deletes until we have
>>>> a replacement feature.
>>>> I think one of the big advantages of Iceberg is that it supports both
>>>> batch processing and streaming ingestion.
>>>> For streaming ingestion we need a way to update existing data in a
>>>> performant way, but restricting deletes to the primary keys seems like
>>>> enough from the streaming perspective.
>>>>
>>>> Equality deletes allow a very wide range of applications, which we
>>>> might be able to narrow down a bit, but still keep useful. So if we want to
>>>> go down this road, we need to start collecting the requirements.
>>>>
>>>> Thanks,
>>>> Peter
>>>>
>>>> On Fri, Nov 1, 2024 at 19:22, Shani Elharrar <sh...@upsolver.com.invalid>
>>>> wrote:
>>>>
>>>>> I understand how it makes sense for batch jobs, but it damages streaming
>>>>> jobs. Using equality deletes works much better for streaming (which has
>>>>> strict SLAs for delays), and in order to decrease the performance penalty,
>>>>> systems can rewrite the equality deletes to positional deletes.
>>>>>
>>>>> Shani.
>>>>>
>>>>> On 1 Nov 2024, at 20:06, Steven Wu <stevenz...@gmail.com> wrote:
>>>>>
>>>>> 
>>>>> Fundamentally, it is very difficult to write position deletes with
>>>>> concurrent writers and conflicts, even for batch jobs, as the inverted
>>>>> index may become invalid/stale.
>>>>>
>>>>> The position deletes are created during the write phase. But conflicts
>>>>> are only detected at the commit stage. I assume the batch job should fail
>>>>> in this case.
>>>>>
>>>>> On Fri, Nov 1, 2024 at 10:57 AM Steven Wu <stevenz...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Shani,
>>>>>>
>>>>>> That is a good point. It is certainly a limitation for the Flink job
>>>>>> to track the inverted index internally (which is what I had in mind).
>>>>>> It can't be shared/synchronized with other Flink jobs or other engines
>>>>>> writing to the same table.
>>>>>>
>>>>>> Thanks,
>>>>>> Steven
>>>>>>
>>>>>> On Fri, Nov 1, 2024 at 10:50 AM Shani Elharrar
>>>>>> <sh...@upsolver.com.invalid> wrote:
>>>>>>
>>>>>>> Even if Flink can create this state, it would have to be maintained
>>>>>>> against the Iceberg table; we wouldn't want duplicate keys if other
>>>>>>> systems / users update the table (e.g. manual inserts / updates using
>>>>>>> DML).
>>>>>>>
>>>>>>> Shani.
>>>>>>>
>>>>>>> On 1 Nov 2024, at 18:32, Steven Wu <stevenz...@gmail.com> wrote:
>>>>>>>
>>>>>>> 
>>>>>>> > Add support for inverted indexes to reduce the cost of position
>>>>>>> > lookup. This is fairly tricky to implement for streaming use cases
>>>>>>> > without an external system.
>>>>>>>
>>>>>>> Anton, that is also what I was saying earlier. In Flink, the
>>>>>>> inverted index of (key, committed data files) can be tracked in
>>>>>>> Flink state.
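>>>>>>>
>>>>>>> As a rough sketch of that idea (plain Java with hypothetical types,
>>>>>>> not the actual Flink/Iceberg sink code): a keyed operator keeps, per
>>>>>>> key, the location of the last committed row, so an upsert can emit a
>>>>>>> position delete for the superseded row instead of an equality delete.
>>>>>>> It assumes the writer learns each row's committed file and offset,
>>>>>>> which is exactly the hard part being discussed here.
>>>>>>>
>>>>>>>     import org.apache.flink.api.common.state.ValueState;
>>>>>>>     import org.apache.flink.api.common.state.ValueStateDescriptor;
>>>>>>>     import org.apache.flink.configuration.Configuration;
>>>>>>>     import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
>>>>>>>     import org.apache.flink.util.Collector;
>>>>>>>
>>>>>>>     // Hypothetical event/output types for the sketch.
>>>>>>>     record RowEvent(String key, String filePath, long rowPosition) {}
>>>>>>>     record PosDelete(String filePath, long rowPosition) {}
>>>>>>>
>>>>>>>     class UpsertToPositionDelete
>>>>>>>         extends KeyedProcessFunction<String, RowEvent, PosDelete> {
>>>>>>>
>>>>>>>       // Checkpointed per-key state: "filePath,rowPosition" of the
>>>>>>>       // last committed row for the current key.
>>>>>>>       private transient ValueState<String> lastLocation;
>>>>>>>
>>>>>>>       @Override
>>>>>>>       public void open(Configuration parameters) {
>>>>>>>         lastLocation = getRuntimeContext().getState(
>>>>>>>             new ValueStateDescriptor<>("last-row-location", String.class));
>>>>>>>       }
>>>>>>>
>>>>>>>       @Override
>>>>>>>       public void processElement(RowEvent row, Context ctx, Collector<PosDelete> out)
>>>>>>>           throws Exception {
>>>>>>>         String previous = lastLocation.value();
>>>>>>>         if (previous != null) {
>>>>>>>           int comma = previous.lastIndexOf(',');
>>>>>>>           // Emit a position delete for the superseded row.
>>>>>>>           out.collect(new PosDelete(
>>>>>>>               previous.substring(0, comma),
>>>>>>>               Long.parseLong(previous.substring(comma + 1))));
>>>>>>>         }
>>>>>>>         lastLocation.update(row.filePath() + "," + row.rowPosition());
>>>>>>>       }
>>>>>>>     }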
>>>>>>>
>>>>>>> On Fri, Nov 1, 2024 at 2:16 AM Anton Okolnychyi <
>>>>>>> aokolnyc...@gmail.com> wrote:
>>>>>>>
>>>>>>>> I was a bit skeptical when we were adding equality deletes, but
>>>>>>>> nothing beats their performance during writes. We have to find an
>>>>>>>> alternative before deprecating them.
>>>>>>>>
>>>>>>>> We are doing a lot of work to improve streaming, like reducing the
>>>>>>>> cost of commits, enabling a large (potentially infinite) number of
>>>>>>>> snapshots, changelog reads, and so on. It is a project goal to excel in
>>>>>>>> streaming.
>>>>>>>>
>>>>>>>> I was going to focus on equality deletes after completing the DV
>>>>>>>> work. I believe we have these options:
>>>>>>>>
>>>>>>>> - Revisit the existing design of equality deletes (e.g. add more
>>>>>>>> restrictions, improve compaction, offer new writers).
>>>>>>>> - Standardize on the view-based approach [1] to handle streaming
>>>>>>>> upserts and CDC use cases, potentially making this part of the spec.
>>>>>>>> - Add support for inverted indexes to reduce the cost of position
>>>>>>>> lookup. This is fairly tricky to implement for streaming use cases
>>>>>>>> without an external system. Our runtime filtering in Spark today is
>>>>>>>> equivalent to looking up positions in an inverted index represented
>>>>>>>> by another Iceberg table. That may still not be enough for some
>>>>>>>> streaming use cases.
>>>>>>>>
>>>>>>>> [1] - https://www.tabular.io/blog/hello-world-of-cdc/
>>>>>>>>
>>>>>>>> - Anton
>>>>>>>>
>>>>>>>> On Thu, Oct 31, 2024 at 21:31, Micah Kornfield <emkornfi...@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> I agree that equality deletes have their place in streaming.  I
>>>>>>>>> think the ultimate decision here is how opinionated Iceberg wants
>>>>>>>>> to be on its use-cases.  If it really wants to stick to its origins
>>>>>>>>> of "slow moving data", then removing equality deletes would be in
>>>>>>>>> line with this.  I think the other high-level question is how much
>>>>>>>>> we allow for partially compatible features (the row lineage
>>>>>>>>> use-case feature was explicitly approved excluding equality
>>>>>>>>> deletes, and people seemed OK with it at the time.  If all features
>>>>>>>>> need to work together, then maybe we need to rethink the design
>>>>>>>>> here so it can be forward compatible with equality deletes).
>>>>>>>>>
>>>>>>>>> I think one issue with equality deletes as stated in the spec is
>>>>>>>>> that they are overly broad.  I'd be interested if people have any
>>>>>>>>> use cases that differ, but I think one way of narrowing the
>>>>>>>>> specification's scope on equality deletes (and probably a necessary
>>>>>>>>> building block for building something better) is to focus on
>>>>>>>>> upsert/streaming deletes.  Two proposals in this regard are:
>>>>>>>>>
>>>>>>>>> 1.  Require that equality deletes can only correspond to unique
>>>>>>>>> identifiers for the table.
>>>>>>>>> 2.  Consider requiring that, for equality deletes on partitioned
>>>>>>>>> tables, the primary key must contain a partition column (I believe
>>>>>>>>> Flink at least already does this).  It is less clear to me that
>>>>>>>>> this would meet all existing use-cases.  But having this would
>>>>>>>>> allow for better incremental data structures, which could then be
>>>>>>>>> partition-based.
>>>>>>>>>
>>>>>>>>> Narrowing the scope to unique identifiers would allow for the
>>>>>>>>> further building blocks already mentioned, like a secondary index
>>>>>>>>> (possibly via an LSM tree), which would allow for better
>>>>>>>>> performance overall.
>>>>>>>>>
>>>>>>>>> I generally agree with the sentiment that we shouldn't deprecate
>>>>>>>>> them until there is a viable replacement.  With all due respect to my
>>>>>>>>> employer, let's not fall into the Google trap [1] :)
>>>>>>>>>
>>>>>>>>> Cheers,
>>>>>>>>> Micah
>>>>>>>>>
>>>>>>>>> [1] https://goomics.net/50/
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Thu, Oct 31, 2024 at 12:35 PM Alexander Jo <
>>>>>>>>> alex...@starburstdata.com> wrote:
>>>>>>>>>
>>>>>>>>>> Hey all,
>>>>>>>>>>
>>>>>>>>>> Just to throw my 2 cents in, I agree with Steven and others that
>>>>>>>>>> we do need some kind of replacement before deprecating equality 
>>>>>>>>>> deletes.
>>>>>>>>>> They certainly have their problems, and do significantly increase
>>>>>>>>>> complexity as they are now, but the writing of position deletes is
>>>>>>>>>> too expensive for certain pipelines.
>>>>>>>>>>
>>>>>>>>>> We've been investigating using equality deletes for some of our
>>>>>>>>>> workloads at Starburst; the key advantage we were hoping to
>>>>>>>>>> leverage is cheap, effectively random-access lookup deletes.
>>>>>>>>>> Say you have a UUID column that's unique in a table and want to
>>>>>>>>>> delete a row by UUID. With position deletes each delete is
>>>>>>>>>> expensive without an index on that UUID.
>>>>>>>>>> With equality deletes each delete is cheap while reads/compaction
>>>>>>>>>> are expensive, but when updates are frequent and reads are
>>>>>>>>>> sporadic that's a reasonable tradeoff.
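>>>>>>>>>>
>>>>>>>>>> A toy sketch of that write-side cost difference (plain Java with
>>>>>>>>>> hypothetical types, not Iceberg's writer API): an equality delete
>>>>>>>>>> only records the key, while a position delete without an index
>>>>>>>>>> first has to find the row.
>>>>>>>>>>
>>>>>>>>>>     import java.util.List;
>>>>>>>>>>     import java.util.Map;
>>>>>>>>>>
>>>>>>>>>>     record RowRef(String dataFile, long position) {}
>>>>>>>>>>
>>>>>>>>>>     class DeleteWriters {
>>>>>>>>>>       // Equality delete: O(1) per delete -- the delete file just
>>>>>>>>>>       // stores the deleted key value.
>>>>>>>>>>       static String equalityDelete(String uuid) {
>>>>>>>>>>         return uuid;
>>>>>>>>>>       }
>>>>>>>>>>
>>>>>>>>>>       // Position delete without an index: O(table size) per delete,
>>>>>>>>>>       // because we must scan data files to locate the row first.
>>>>>>>>>>       static RowRef positionDelete(String uuid, Map<String, List<String>> fileToKeys) {
>>>>>>>>>>         for (Map.Entry<String, List<String>> file : fileToKeys.entrySet()) {
>>>>>>>>>>           int pos = file.getValue().indexOf(uuid);
>>>>>>>>>>           if (pos >= 0) {
>>>>>>>>>>             return new RowRef(file.getKey(), pos);
>>>>>>>>>>           }
>>>>>>>>>>         }
>>>>>>>>>>         return null; // key not present
>>>>>>>>>>       }
>>>>>>>>>>     }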
>>>>>>>>>>
>>>>>>>>>> Pretty much what Jason and Steven have already said.
>>>>>>>>>>
>>>>>>>>>> Maybe there are some incremental improvements on equality deletes
>>>>>>>>>> or tips from similar systems that might alleviate some of their 
>>>>>>>>>> problems?
>>>>>>>>>>
>>>>>>>>>> - Alex Jo
>>>>>>>>>>
>>>>>>>>>> On Thu, Oct 31, 2024 at 10:58 AM Steven Wu <stevenz...@gmail.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> We probably all agree on the downside of equality deletes: they
>>>>>>>>>>> postpone all the work to the read path.
>>>>>>>>>>>
>>>>>>>>>>> In theory, we can implement a position-deletes-only Flink
>>>>>>>>>>> streaming writer. It would require tracking the last committed
>>>>>>>>>>> data files per key, which can be stored in Flink state
>>>>>>>>>>> (checkpointed). This is obviously quite expensive/challenging,
>>>>>>>>>>> but possible.
>>>>>>>>>>>
>>>>>>>>>>> I'd like to echo one benefit of equality deletes that Russell
>>>>>>>>>>> called out in the original email. Equality deletes would never
>>>>>>>>>>> have conflicts. That is important for streaming writers (Flink,
>>>>>>>>>>> Kafka Connect, ...) that commit frequently (every few minutes or
>>>>>>>>>>> less). Assume Flink can write position deletes only and commits
>>>>>>>>>>> every 2 minutes. The long-running nature of streaming jobs can
>>>>>>>>>>> cause frequent commit conflicts with background delete compaction
>>>>>>>>>>> jobs.
>>>>>>>>>>>
>>>>>>>>>>> Overall, the streaming upsert write is not a well-solved problem
>>>>>>>>>>> in Iceberg. This probably affects all streaming engines (Flink,
>>>>>>>>>>> Kafka Connect, Spark Streaming, ...). We need to come up with
>>>>>>>>>>> some better alternatives before we can deprecate equality
>>>>>>>>>>> deletes.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Thu, Oct 31, 2024 at 8:38 AM Russell Spitzer <
>>>>>>>>>>> russell.spit...@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> For users of Equality Deletes: what are the key benefits of
>>>>>>>>>>>> Equality Deletes that you would like to preserve? Could you
>>>>>>>>>>>> please share some concrete examples of the queries you want to
>>>>>>>>>>>> run (and the schemas and data sizes you would like to run them
>>>>>>>>>>>> against) and the latencies that would be acceptable?
>>>>>>>>>>>>
>>>>>>>>>>>> On Thu, Oct 31, 2024 at 10:05 AM Jason Fine
>>>>>>>>>>>> <ja...@upsolver.com.invalid> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Representing Upsolver here, we also make use of Equality
>>>>>>>>>>>>> Deletes to deliver high-frequency, low-latency updates to our
>>>>>>>>>>>>> clients at scale. We have customers using them at scale,
>>>>>>>>>>>>> demonstrating the need and viability. We automate the process
>>>>>>>>>>>>> of converting them into positional deletes (or fully applying
>>>>>>>>>>>>> them) in the background for more efficient engine queries,
>>>>>>>>>>>>> giving our users both low latency and good query performance.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Equality Deletes were added since there isn't a good way to
>>>>>>>>>>>>> solve frequent updates otherwise. It would require some sort of
>>>>>>>>>>>>> index keeping track of every record in the table (by a
>>>>>>>>>>>>> predetermined PK), and maintaining such an index is a huge task
>>>>>>>>>>>>> that every tool interested in this would need to re-implement.
>>>>>>>>>>>>> It also becomes a bottleneck limiting table sizes.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I don't think they should be removed without providing an
>>>>>>>>>>>>> alternative. Positional Deletes inherently have a different
>>>>>>>>>>>>> performance profile, requiring more upfront work proportional
>>>>>>>>>>>>> to the table size.
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Thu, Oct 31, 2024 at 2:45 PM Jean-Baptiste Onofré <
>>>>>>>>>>>>> j...@nanthrax.net> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi Russell
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks for the nice writeup and the proposal.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I agree with your analysis, and I have the same feeling.
>>>>>>>>>>>>>> However, I think there are more engines than Flink that write
>>>>>>>>>>>>>> equality delete files. So, I agree to deprecate in V3, but
>>>>>>>>>>>>>> maybe be more "flexible" about removal in V4 in order to give
>>>>>>>>>>>>>> engines time to update.
>>>>>>>>>>>>>> I think that by deprecating equality deletes, we are clearly
>>>>>>>>>>>>>> focusing on read performance and "consistency" (more than
>>>>>>>>>>>>>> write). It's not necessarily a bad thing, but streaming
>>>>>>>>>>>>>> platforms and data ingestion platforms will probably be
>>>>>>>>>>>>>> concerned about that (by using positional deletes, they will
>>>>>>>>>>>>>> have to scan/read all data files to find the positions, which
>>>>>>>>>>>>>> is painful).
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> So, to summarize:
>>>>>>>>>>>>>> 1. Agree to deprecate equality deletes, but -1 to committing
>>>>>>>>>>>>>> to any target for removal before having a clear path for
>>>>>>>>>>>>>> streaming platforms (Flink, Beam, ...).
>>>>>>>>>>>>>> 2. In the meantime (during the deprecation period), I propose
>>>>>>>>>>>>>> to explore possible improvements for streaming platforms
>>>>>>>>>>>>>> (maybe finding a way to avoid full data file scans, ...).
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks !
>>>>>>>>>>>>>> Regards
>>>>>>>>>>>>>> JB
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Wed, Oct 30, 2024 at 10:06 PM Russell Spitzer
>>>>>>>>>>>>>> <russell.spit...@gmail.com> wrote:
>>>>>>>>>>>>>> >
>>>>>>>>>>>>>> > Background:
>>>>>>>>>>>>>> >
>>>>>>>>>>>>>> > 1) Position Deletes
>>>>>>>>>>>>>> >
>>>>>>>>>>>>>> >
>>>>>>>>>>>>>> > Writers determine what rows are deleted and mark them in a
>>>>>>>>>>>>>> > 1-for-1 representation. With delete vectors this means every
>>>>>>>>>>>>>> > data file has at most one delete vector that is read in
>>>>>>>>>>>>>> > conjunction with it to excise deleted rows. Reader overhead
>>>>>>>>>>>>>> > is more or less constant and is very predictable.
>>>>>>>>>>>>>> >
>>>>>>>>>>>>>> >
>>>>>>>>>>>>>> > The main cost of this mode is that deletes must be
>>>>>>>>>>>>>> > determined at write time, which is expensive and can make
>>>>>>>>>>>>>> > conflict resolution more difficult.
>>>>>>>>>>>>>> >
>>>>>>>>>>>>>> > 2) Equality Deletes
>>>>>>>>>>>>>> >
>>>>>>>>>>>>>> > Writers write out references to what values are deleted (in
>>>>>>>>>>>>>> > a partition or globally). There can be an unlimited number
>>>>>>>>>>>>>> > of equality deletes, and they all must be checked for every
>>>>>>>>>>>>>> > data file that is read. The cost of determining deleted rows
>>>>>>>>>>>>>> > is essentially given to the reader.
>>>>>>>>>>>>>> >
>>>>>>>>>>>>>> > Conflicts almost never happen since data files are not
>>>>>>>>>>>>>> > actually changed, and there is almost no cost to the writer
>>>>>>>>>>>>>> > to generate these. Almost all costs related to equality
>>>>>>>>>>>>>> > deletes are passed on to the reader.
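>>>>>>>>>>>>>> >
>>>>>>>>>>>>>> > A minimal reader-side sketch of the contrast (hypothetical
>>>>>>>>>>>>>> > types, not Iceberg internals): a delete vector is one
>>>>>>>>>>>>>> > positional bitmap per data file, while equality deletes are
>>>>>>>>>>>>>> > a predicate set evaluated against every row of every data
>>>>>>>>>>>>>> > file read.
>>>>>>>>>>>>>> >
>>>>>>>>>>>>>> >     import java.util.BitSet;
>>>>>>>>>>>>>> >     import java.util.Set;
>>>>>>>>>>>>>> >
>>>>>>>>>>>>>> >     class ReadPaths {
>>>>>>>>>>>>>> >       // Position deletes / delete vectors: O(1) bitmap check per row.
>>>>>>>>>>>>>> >       static boolean isLiveByPosition(long rowPosition, BitSet deleteVector) {
>>>>>>>>>>>>>> >         return !deleteVector.get((int) rowPosition);
>>>>>>>>>>>>>> >       }
>>>>>>>>>>>>>> >
>>>>>>>>>>>>>> >       // Equality deletes: every row is checked against the accumulated
>>>>>>>>>>>>>> >       // set of deleted key values, which can grow without bound between
>>>>>>>>>>>>>> >       // compactions.
>>>>>>>>>>>>>> >       static boolean isLiveByEquality(String rowKey, Set<String> deletedKeys) {
>>>>>>>>>>>>>> >         return !deletedKeys.contains(rowKey);
>>>>>>>>>>>>>> >       }
>>>>>>>>>>>>>> >     }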
>>>>>>>>>>>>>> >
>>>>>>>>>>>>>> > Proposal:
>>>>>>>>>>>>>> >
>>>>>>>>>>>>>> > Equality deletes are, in my opinion, unsustainable and we
>>>>>>>>>>>>>> > should work on deprecating and removing them from the
>>>>>>>>>>>>>> > specification. At this time, I know of only one engine
>>>>>>>>>>>>>> > (Apache Flink) which produces these deletes, but almost all
>>>>>>>>>>>>>> > engines have implementations to read them. The cost of
>>>>>>>>>>>>>> > implementing equality deletes on the read path is difficult
>>>>>>>>>>>>>> > and unpredictable in terms of memory usage and compute
>>>>>>>>>>>>>> > complexity. We've had suggestions of implementing RocksDB in
>>>>>>>>>>>>>> > order to handle ever-growing sets of equality deletes, which
>>>>>>>>>>>>>> > in my opinion shows that we are going down the wrong path.
>>>>>>>>>>>>>> >
>>>>>>>>>>>>>> > Outside of performance, equality deletes are also difficult
>>>>>>>>>>>>>> > to use in conjunction with many other features. For example,
>>>>>>>>>>>>>> > any features requiring CDC or row lineage are basically
>>>>>>>>>>>>>> > impossible when equality deletes are in use. When equality
>>>>>>>>>>>>>> > deletes are present, the state of the table can only be
>>>>>>>>>>>>>> > determined with a full scan, making it difficult to update
>>>>>>>>>>>>>> > differential structures. This means materialized views or
>>>>>>>>>>>>>> > indexes need to essentially be fully rebuilt whenever an
>>>>>>>>>>>>>> > equality delete is added to the table.
>>>>>>>>>>>>>> >
>>>>>>>>>>>>>> > Equality deletes essentially remove complexity from the
>>>>>>>>>>>>>> > write side but then add what I believe is an unacceptable
>>>>>>>>>>>>>> > level of complexity to the read side.
>>>>>>>>>>>>>> >
>>>>>>>>>>>>>> > Because of this, I suggest we deprecate Equality Deletes in
>>>>>>>>>>>>>> > V3 and slate them for full removal from the Iceberg spec in
>>>>>>>>>>>>>> > V4.
>>>>>>>>>>>>>> >
>>>>>>>>>>>>>> > I know this is a big change and a compatibility breakage, so
>>>>>>>>>>>>>> > I would like to introduce this idea to the community and
>>>>>>>>>>>>>> > solicit feedback from all stakeholders. I am very flexible
>>>>>>>>>>>>>> > on this issue and would like to hear the best arguments both
>>>>>>>>>>>>>> > for and against removal of Equality Deletes.
>>>>>>>>>>>>>> >
>>>>>>>>>>>>>> > Thanks everyone for your time,
>>>>>>>>>>>>>> >
>>>>>>>>>>>>>> > Russ Spitzer
>>>>>>>>>>>>>> >
>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> --
>>>>>>>>>>>>>
>>>>>>>>>>>>> *Jason Fine*
>>>>>>>>>>>>> Chief Software Architect
>>>>>>>>>>>>> ja...@upsolver.com  | www.upsolver.com
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>
