Hi everyone,

Based on the conversation in the last community sync and in the Iceberg Slack channel, multiple parties seem interested in continuing the work on secondary indexes in Iceberg, so I would like to restart this thread to continue the discussion.
So far most people refer to the document authored by Miao Wang <https://docs.google.com/document/d/1E1ofBQoKRnX04bWT3utgyHQGaHZoelgXosk_UNsTUuQ/edit>, which has a lot of useful information about the design and implementation. However, the document is also quite old (over a year now) and a lot has changed in Iceberg since then. I think the document leaves the following open topics that we still need to address:

1. *scope of native index support*: what types of index should Iceberg support natively, and how should developers split their effort between adding native Iceberg indexes and adding Iceberg support to holistic indexing projects such as HyperSpace <https://microsoft.github.io/hyperspace/>?

2. *index levels*: we have talked about partition-level indexing and file-level indexing. We need more clarity on what each level means, and on the level of interest in and support needed for each.

3. *index storage*: we had unsettled debates about whether to store indexes as separate files or to embed them in the existing Iceberg file structure. We need to agree on criteria, such as index size and ease of generation during writes, to settle the discussion.

4. *indexing process*: as stated in Miao's document, indexes could be created synchronously during the data writing process, or built asynchronously through an index service. We need to discuss which of these the Iceberg index functionality should focus on.

5. *index invalidation*: depending on the scope and level, certain indexes need to be invalidated during operations like RewriteFiles. We need clarity here, including whether we need another sequence number to track such invalidation.

I suggest we iterate a bit on this list of open questions, then have a meeting to discuss them and produce an updated document that addresses them, so that developers interested in adding features in this domain have a clear path forward.

Any thoughts?

Best,
Jack Ye