Thank you for the proposal, Xiaoxuan! I agree with Zheng and Steven's point that it'll probably be more helpful to start with a more specific "what" and "why" (known areas of improvement for Iceberg, driven by concrete use cases) before we get too deep into the "how".

In my mind, the specific known area of improvement for Iceberg related to this proposal is streaming upsert behavior. One place this improvement would help is providing better data freshness for Iceberg CDC mirror tables without the heavy read and maintenance costs that currently exist with Flink upserts. As you mentioned, equality deletes have the benefit of being very cheap to write but can come at a high and unpredictable cost at read time. Challenges with equality deletes have been discussed in the past [1].

I'll also add that if one of the goals is to improve streaming upserts (e.g. for applying CDC change streams to Iceberg mirror tables), then there are alternatives that I think we should compare against to make the tradeoffs clear. These alternatives include leveraging the known changelog view or merge patterns [2], or improving the existing maintenance procedures.

I think the potential for using an inverted index in upsert cases to identify positions in a file and write DVs directly is very exciting. Before getting too far into the weeds, though, I think it'd be helpful to first make sure we agree on the specific problem we're trying to solve when we talk about performance improvements, along with any use cases, and then compare against the known alternatives (ideally with numbers that demonstrate the read/write/storage/cost tradeoffs for the proposed inverted index).

[1] https://lists.apache.org/thread/z0gvco6hn2bpgngvk4h6xqrnw8b32sw6
[2] https://www.tabular.io/blog/hello-world-of-cdc/

Thanks,
Amogh Jahagirdar