Thanks Peter / Szehon, thank you for your interest!

> What about the conversion cost? Based on your experience, what is the cost difference between a conversion and a full rewrite? When is it worth doing a delete conversion, and when is it worth doing a full data rewrite?

Here is one interesting thread on the same question [1], discussing when each option makes sense. I don't have crisp numbers for this, to be honest, as it depends heavily on the table layout and on how the deletes are spread across the table; I can work on getting them. I think the motivation mostly comes from the fact that the rewrite operation itself sometimes fails with OOM when there are tons of equality deletes [2], and so can simple reads [3]. There have been attempts in the past to make this more reliable by using RocksDB [4] [5], but I think the issue still persists.

> We still would need to solve the problem for the rows updated by the "not yet converted" deletes. Do you have a proposal for this?

Unless we convert the equality deletes to positional deletes, I don't think we can support row_lineage at all (based on my understanding). Once we have materialized the equality deletes as positional deletes, the existing proposal should work with some minor tweaks (I need to check this one as well).

> If/when we move forward with this conversion, it would be nice to choose an implementation which would allow the Flink table maintenance to reuse as much as possible.

I agree. I have not taken a deeper look at Flink table maintenance and its requirements, but I think in general, if we just make sure the action for this is solid, we should be good, right?

[1] https://github.com/apache/iceberg/pull/2372#issuecomment-845514501
[2] https://github.com/apache/iceberg/issues/10054
[3] https://github.com/apache/iceberg/issues/6307
[4] https://github.com/apache/iceberg/pull/2680
[5] https://github.com/apache/iceberg/pull/10667

Regards,
Prashant Singh

On Fri, Sep 13, 2024 at 11:19 PM Péter Váry <peter.vary.apa...@gmail.com> wrote:

> Hi Prashant,
>
> Interesting stuff!
>
> I have a few questions:
> 1. I understand that it is easier to apply positional deletes than
> equality deletes. What about the conversion cost? Based on your experience,
> what is the cost difference between a conversion and a full rewrite? When
> is it worth doing a delete conversion, and when is it worth doing a
> full data rewrite?
> 2. If writers still write equality deletes, how would that work with the
> row lineage? We still would need to solve the problem for the rows updated
> by the "not yet converted" deletes. Do you have a proposal for this? We
> might want to add that to the linked document.
> 3. If/when we move forward with this conversion, it would be nice to
> choose an implementation which would allow the Flink table maintenance
> to reuse as much as possible.
>
> Thanks,
> Peter
>
>
> On Sat, Sep 14, 2024, 02:08 Szehon Ho <szehon.apa...@gmail.com> wrote:
>
>> +1, I'd be happy to see this feature.
>>
>> Thanks
>> Szehon
>>
>> On Fri, Sep 13, 2024 at 10:33 AM Prashant Singh <prashant010...@gmail.com>
>> wrote:
>>
>>> Hi All,
>>>
>>> Starting this thread to revive the discussion on converting Equality
>>> Deletes to Position Deletes and to see if this is something the community
>>> wants now (happy to contribute to this), considering:
>>> 1/ It is no longer just Flink: other writers such as Kafka-Connect [1] and
>>> Debezium Server for Iceberg [2] have emerged which emit equality deletes,
>>> and equality deletes don't go well with the readers in general.
>>> 2/ We have some features like Row lineage [3] for which we have
>>> decided we will not support equality deletes.
>>> 3/ We are doing considerable enhancements to Position deletes in v3 [4].
>>>
>>> I see some past work has been done in this context [5] and we have an
>>> action already in our repo.
>>>
>>> Let me know your thoughts.
>>>
>>> Regards,
>>> Prashant
>>>
>>> [1] https://github.com/tabular-io/iceberg-kafka-connect/tree/main
>>> [2] https://github.com/memiiso/debezium-server-iceberg
>>> [3] https://docs.google.com/document/d/146YuAnU17prnIhyuvbCtCtVSavyd5N7hKryyVRaFDTE/edit#heading=h.f2e8ffw3fu7n
>>> [4] https://docs.google.com/document/d/18Bqhr-vnzFfQk1S4AgRISkA_5_m5m32Nnc2Cw0zn2XM/edit
>>> [5] https://github.com/apache/iceberg/blob/main/api/src/main/java/org/apache/iceberg/actions/ConvertEqualityDeleteFiles.java
>>>
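
For readers following the thread, the conversion being discussed can be illustrated with a minimal Python sketch. This is not Iceberg's implementation and not the ConvertEqualityDeleteFiles action linked above; the names DataFile, EqualityDelete and convert_to_position_deletes are hypothetical, and data files are modelled as in-memory lists of rows. The one rule taken from the Iceberg table spec is that an equality delete applies only to data files whose data sequence number is strictly lower than the delete's.

```python
from dataclasses import dataclass


@dataclass
class DataFile:
    # Hypothetical in-memory stand-in for a data file.
    path: str
    sequence_number: int
    rows: list  # list of dicts, one per row, in file order


@dataclass
class EqualityDelete:
    # Hypothetical stand-in for an equality delete file.
    sequence_number: int
    key_columns: tuple  # column names forming the delete key
    keys: set  # set of tuples of key values to delete


def convert_to_position_deletes(data_files, equality_deletes):
    """Materialize equality deletes as (file path, row position) pairs."""
    position_deletes = []
    for data_file in data_files:
        for position, row in enumerate(data_file.rows):
            for delete in equality_deletes:
                # An equality delete only affects data written before it.
                if data_file.sequence_number >= delete.sequence_number:
                    continue
                key = tuple(row[column] for column in delete.key_columns)
                if key in delete.keys:
                    position_deletes.append((data_file.path, position))
                    break  # one position delete per row is enough
    return position_deletes


files = [
    DataFile("a.parquet", 1, [{"id": 1}, {"id": 2}, {"id": 3}]),
    DataFile("b.parquet", 3, [{"id": 2}]),
]
deletes = [EqualityDelete(2, ("id",), {(2,), (3,)})]
result = convert_to_position_deletes(files, deletes)
```

In this sketch every candidate row has to be read and matched against the delete keys, which is one way to see why the conversion cost depends on the table layout and on how the deletes are spread across files.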