Hi all,

In recent secondary index sync meetings, the discussion converged on the need to define what an index is from first principles before settling on physical layout.
To address that, Peter and I have drafted a requirements document for a key lookup index (renamed from "primary key index" to avoid implying uniqueness enforcement). The goal is to nail down one well-scoped index type first.

Doc: Key Lookup Index Requirements
<https://docs.google.com/document/d/1e0zxK-jA0LBDq8YQlQgFipTHelDFiga8lCkgDTmYub8/edit?tab=t.0#heading=h.8shrgabvl19>

It covers requirements, three design options (manifest + sorted Parquet, hash + sorted Parquet, hash + MPHF), and open questions. We will add preliminary benchmark results shortly.

Feedback welcome: inline in the doc, on this thread, or at the next index sync.

Thanks,
Huaxin

On Mon, Apr 13, 2026 at 7:22 AM Steven Wu <[email protected]> wrote:

> Do we need the special index identifier that was originally proposed? A
> generic CatalogObjectIdentifier (with namespace and name) would be
> consistent with all object types in the catalog. I have a discussion thread
> on the generic identifier topic: [DISCUSS] REST Spec: generic
> CatalogObjectIdentifier.
>
> Should we add an indexes array field to table metadata? It would only
> contain a list of index object identifiers. It wouldn't contain any index
> metadata, which should live in the index objects. Yufei was trying to bring
> this up at the end of the first sync, but we didn't get enough time to
> really discuss it. It would be great to discuss this as the first agenda
> item today.
>
> On Mon, Apr 13, 2026 at 3:17 AM Péter Váry <[email protected]>
> wrote:
>
>> Hi everyone,
>>
>> We had several engaging discussions at the Iceberg Summit, and it was
>> great to finally catch up with many of you in person. We truly missed those
>> who couldn’t attend; hopefully we’ll all meet again at the next summit.
>>
>> To keep the conversation going, Huaxin and I have put together the agenda
>> for our next meeting. As a reminder, we’ll meet on *April 13th,
>> 9:00–10:00 AM* PDT (6:00–7:00 PM CEST).
>>
>> Proposed agenda:
>>
>>    - Continue first-principles index design discussion from Mar 30
>>    - *Index Ownership and Write Responsibility*
>>       - Should writers be allowed to update indexes, or
>>       - Should all index writes be handled exclusively by the Index
>>         Maintenance process?
>>       - If writers can update indexes, then we need to define what
>>         guarantees are required (compaction, file splitting, layout
>>         expectations).
>>       - If only Index Maintenance updates indexes, then we only need
>>         to define what observable properties should be exposed to
>>         consumers, such as:
>>          - Expected max files for a single key
>>          - Current max files for a single key
>>          - Deletes allowed/present
>>          - Sorted by
>>          - Partitioned by
>>    - *Specification Scope: What Belongs in the Spec?*
>>       - Related to the ownership question above
>>       - Light spec: Just define that the index table should be
>>         optimized for retrieval by key columns and the index columns
>>         should be contained in the table. This could give us more
>>         flexibility if better organization methods come up, or
>>       - Detailed spec: We could define the max number of files per
>>         index to read for a single key, or even the partitioning and the
>>         exact sort order. This could allow more use-cases for a given
>>         index, like joins or cardinality estimations.
>>       - I would go for light spec for the main types (PK, Containing),
>>         and only the Index Maintenance processes should update the
>>         indexes, as for many use-cases the details are not important, and
>>         writers will very rarely update the indexes themselves.
>>    - *Logical Placement of Indexes*
>>       - Index as a child object of an Iceberg Table, or
>>       - Index as a first‑class entity under /namespace/indexes/{index}
>>          - Based on the discussions at the summit we are leaning in
>>            this direction. This means the index id should be unique in
>>            the namespace, but it helps the catalog implementations quite
>>            a bit.
>>    - *Physical Placement of Index Data*
>>       - I don’t think we should specify this. We should have a base
>>         location for the index, but can rely on the catalog
>>         implementations to decide on their own, like they do with
>>         tables, views, and UDFs.
>>    - *Iceberg Reader Based indexes* (Containing indexes and
>>      potentially PK indexes). These are the indexes which could be read
>>      by the existing Iceberg readers. We might decide to store the PK
>>      index similarly to an Iceberg Table and treat it as a reader based
>>      index.
>>       - What are the table properties/features exposed to the readers?
>>          - Maybe just some behavioral descriptors for the optimizer
>>            to decide if the index could be used or should be skipped,
>>            like:
>>             - Expected max files for a single key
>>             - max files for a single key
>>             - Deletes allowed/present
>>             - Sorted by
>>             - Partitioned by
>>          - The Tasks when reading the index based on the filters and
>>            projection
>>       - What are the table properties/features exposed to the Index
>>         Maintenance? I think this could be internal to the Index
>>         Maintenance process and might not be exposed through the spec.
>>         The Index Maintenance process could handle this as a standard
>>         Iceberg Table and could be based on the Table Maintenance
>>         process, but there might be some totally different processes.
>>       - It should be possible to add properties to an index defined by
>>         the Index Maintenance process which could be used and updated in
>>         the next Index Maintenance run.
>>    - *PK index storage format benchmark results*
>>       - Flat Parquet (baseline)
>>       - BTree with Parquet leaves
>>       - Vortex
>>    - *Open items / next steps*
>>
>> Thanks,
>> Peter
>>
>> huaxin gao <[email protected]> wrote (on Mon, Mar 23, 2026, 3:03):
>>
>>> Hi everyone, I wanted to share an update on the primary key index work.
>>> Since there are still open questions on whether bloom filter indexes fit
>>> in the secondary index framework or should be treated as extended stats,
>>> I've shifted focus to the primary key index because it's a clearer fit for
>>> the framework.
>>> I've put together a proposal for a primary key reverse-lookup index that
>>> maps each key to its physical location (file_path, row_position). It
>>> enables:
>>>
>>>    - Scan-time file pruning for point lookups
>>>    - Converting key-based deletes into position deletes (eliminating
>>>      equality deletes for Flink CDC)
>>>    - Accelerating Spark MERGE INTO by replacing full-table joins with
>>>      direct file lookups
>>>
>>> Proposal:
>>> https://docs.google.com/document/d/1HuhCZ0n2FqDh8yqQb9oEj1CPM5yXpEsMPGZno2aSf8E/edit?tab=t.0#heading=h.tbevg4q0m9
>>> Feedback welcome!
>>> Thanks,
>>> Huaxin
>>>
>>> On Wed, Mar 18, 2026 at 11:42 PM Péter Váry <[email protected]>
>>> wrote:
>>>
>>>> Key takeaways from the general index discussion at the Mar 16 meeting.
>>>> Thanks to everyone who participated! The recording is available here:
>>>> https://www.youtube.com/watch?v=btmjhtRWUCE
>>>>
>>>>    - Q: Do we need to tie index types to the algorithms used to access
>>>>      them?
>>>>    - A: From a specification perspective, the goal is to define the
>>>>      storage-level data layout so it can be shared across engines.
>>>>      Engines are free to interpret and use the data as they see fit,
>>>>      but the on-disk data layout itself must be strictly defined and
>>>>      interoperable.
>>>>
>>>>    - Q: Should we introduce an additional abstraction layer (e.g.,
>>>>      Vector Index) with sub-types such as IVF and DiskANN?
>>>>    - A: This is possible if we decide it is beneficial. I explored
>>>>      potential naming, but it is not yet clear how such a layer would
>>>>      be used in practice.
>>>>      *Question to Yingyi Bu*: could you provide examples where this
>>>>      additional layer would be useful? Should this abstraction be
>>>>      defined at the spec level, or is it better handled at the engine
>>>>      level?
>>>>      My initial idea was that users would create a generic Vector
>>>>      Index and let the engine choose the concrete implementation.
>>>>      However, this would limit user control, and users likely need to
>>>>      specify the exact index representation, which implies they must
>>>>      be aware of the available representations.
>>>>
>>>>    - Q: Do we want to allow extensibility for index types?
>>>>    - A: Yes. The intent is to support a small set of well-defined
>>>>      index types while allowing experimentation with new ones. If a new
>>>>      index type proves broadly useful, a follow-up proposal can
>>>>      standardize it and incorporate it into the spec.
>>>>
>>>>    - Q: Do we allow multiple versions of an index for the same table
>>>>      snapshot?
>>>>    - A: Yes. Older index versions must be retained for readers that
>>>>      have already started using them, while new readers should
>>>>      automatically use the latest available version.
>>>>
>>>>    - Q: Do we need to use materialized views for these indexes?
>>>>    - A: No. These indexes are primarily examples, and different types
>>>>      may require different storage methods. However, the Primary Key,
>>>>      Containing, and parts of the IVF indexes can be structured as
>>>>      Iceberg tables. This allows engines to read them natively; in some
>>>>      cases, Iceberg planners can automatically redirect queries to the
>>>>      index table without engine modifications. Furthermore, index
>>>>      maintenance for these tables can leverage existing materialized
>>>>      view maintenance workflows. Other index types may instead rely on
>>>>      Puffin files or alternative storage approaches.
>>>>
>>>>    - Q: How should index metadata be accessed? Should we add explicit
>>>>      pointers for the indexes in the table metadata?
>>>>    - A: We did not have sufficient time to fully explore and conclude
>>>>      this topic.
>>>>      *Question for Yufei Gu*: Did I understand correctly that your main
>>>>      concern stems from endpoint resolution from a REST Catalog
>>>>      perspective? Specifically, if indexes are exposed under a URI such as
>>>>      v1/{prefix}/namespaces/{namespace}/tables/{table}/indexes/{index},
>>>>      would this make it more difficult for the REST Catalog to resolve
>>>>      and route requests to the appropriate endpoint?
>>>>
>>>>
>>>> Suhas Jayaram Subramanya via dev <[email protected]> wrote
>>>> (on Fri, Mar 13, 2026, 23:32):
>>>>
>>>>> Hi everyone,
>>>>>
>>>>> Here's a proposal for native Vector Index support in Iceberg tables --
>>>>> https://docs.google.com/document/d/1KL4qLOwdqnhOcqTc0EjO1O16NV3M3c-gZCEINDWw4lA/edit?usp=sharing
>>>>>
>>>>> We've been working on this proposal with Peter internally at Microsoft
>>>>> and he suggested we post it here to bring this to the community's
>>>>> attention, ahead of the next Secondary Index Sync.
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Suhas
>>>>>
>>>>> On 2026/02/19 04:34:34 huaxin gao wrote:
>>>>> > Hi Everyone,
>>>>> >
>>>>> > Here are the recording and notes from the Iceberg Index Support Sync
>>>>> > on 2/11.
>>>>> >
>>>>> > Recording: https://www.youtube.com/watch?v=3sFfQ0A50yk
>>>>> >
>>>>> > Notes:
>>>>> > https://docs.google.com/document/d/1N6a2IOzC6Qsqv7NBqHKesees4N6WF49YUSIX2FrF7S0/edit?tab=t.8041k7j2n7y3
>>>>> >
>>>>> > The meeting will move to biweekly, Mondays 9–10am PST, starting
>>>>> > March 2.
>>>>> >
>>>>> > Since the sync, I updated the Bloom skipping index proposal
>>>>> > <https://docs.google.com/document/d/1x-0KT43aTrt8u6EV7EgSietIFQSkGsocqwnBTHPebRU/edit?tab=t.0#heading=h.5r5kl6k3fqwu>
>>>>> > to address the discussion questions, specifically:
>>>>> >
>>>>> > - Performance justification: when this helps (high-cardinality = / IN,
>>>>> >   many data files, high object-store latency) and how it differs from
>>>>> >   Parquet row-group Bloom filters (which still require opening the
>>>>> >   data file).
>>>>> > - Cost / scalability: rough sizing (Bloom blob size per file, Puffin
>>>>> >   file size), the planning cost trade-off (driver index reads vs
>>>>> >   executor file opens), and mitigations via caching.
>>>>> > - Lifecycle / maintenance: incremental production as new data files
>>>>> >   arrive, behavior when the index is missing/behind, and
>>>>> >   sharding/compaction plus cleanup to avoid accumulating too many
>>>>> >   small Puffin files over time.
>>>>> > - Writer expectations: inline (optional) vs asynchronous (primary)
>>>>> >   index creation.
>>>>> >
>>>>> > I also implemented a Spark 4.1 POC
>>>>> > <https://github.com/apache/iceberg/pull/15311> and a local benchmark to
>>>>> > quantify both the pruning impact (plannedFiles → afterBloom) and the
>>>>> > index read overhead (statsFiles, statsBytes, bloomPayloadBytes) for
>>>>> > point predicates on high-cardinality columns. Please take a look and
>>>>> > let me know if you have any questions or feedback.
>>>>> >
>>>>> > Thanks,
>>>>> >
>>>>> > Huaxin
>>>>> >
>>>>> > On Tue, Feb 10, 2026 at 1:43 PM huaxin gao <[email protected]> wrote:
>>>>> >
>>>>> > > Reminder for tomorrow's sync on Iceberg Index Support.
>>>>> > >
>>>>> > > Wednesday: Feb. 11 9:00 – 10:00am
>>>>> > > Time zone: America/Los_Angeles
>>>>> > > Google Meet joining info
>>>>> > > Video call link: meet.google.com/nsp-ctyr-khk
>>>>> > > Design doc:
>>>>> > >
>>>>> > > https://docs.google.com/document/d/1N6a2IOzC6Qsqv7NBqHKesees4N6WF49YUSIX2FrF7S0/edit?tab=t.0#heading=h.hs6r9d26w1y2
>>>>> > >
>>>>> > > https://docs.google.com/document/d/1x-0KT43aTrt8u6EV7EgSietIFQSkGsocqwnBTHPebRU/edit?tab=t.0#heading=h.qouk73o4jxx7
>>>>> > >
>>>>> > > Thanks,
>>>>> > > Huaxin
>>>>> > >
>>>>> > >
>>>>> > > On Tue, Feb 3, 2026 at 10:52 PM Péter Váry <[email protected]>
>>>>> > > wrote:
>>>>> > >
>>>>> > >> Thanks Huaxin and Steven for organizing this. Looking forward to
>>>>> > >> meeting you all next week!
>>>>> > >>
>>>>> > >> On Wed, Feb 4, 2026, 02:48 Steven Wu <[email protected]> wrote:
>>>>> > >>
>>>>> > >>> We set up the dev calendar event with a new Google Meet link.
>>>>> > >>> Please ignore the link from Huaxin's original email.
>>>>> > >>>
>>>>> > >>> The dev calendar has the correct info (including the new meeting
>>>>> > >>> link).
>>>>> > >>>
>>>>> > >>> Iceberg Index Support Sync
>>>>> > >>> Wednesday, February 11 · 9:00 – 10:00am
>>>>> > >>> Time zone: America/Los_Angeles
>>>>> > >>> Google Meet joining info
>>>>> > >>> Video call link: https://meet.google.com/nsp-ctyr-khk
>>>>> > >>>
>>>>> > >>> On Tue, Feb 3, 2026 at 5:08 PM huaxin gao <[email protected]>
>>>>> > >>> wrote:
>>>>> > >>>
>>>>> > >>>> Sorry, I meant PST (not EST) :)
>>>>> > >>>> Looking forward to the discussion!
>>>>> > >>>>
>>>>> > >>>> On Tue, Feb 3, 2026 at 4:58 PM Shawn Chang <[email protected]>
>>>>> > >>>> wrote:
>>>>> > >>>>
>>>>> > >>>>> Hi Huaxin,
>>>>> > >>>>>
>>>>> > >>>>> Thanks for starting the sync!
>>>>> > >>>>>
>>>>> > >>>>> The meeting seems to be 9-10AM PST on the dev events calendar
>>>>> > >>>>> <https://calendar.google.com/calendar/u/0?cid=MzkwNWQ0OTJmMWI0NTBiYTA3MTJmMmFlNmFmYTc2ZWI3NTdmMTNkODUyMjBjYzAzYWE0NTI3ODg1YWRjNTYyOUBncm91cC5jYWxlbmRhci5nb29nbGUuY29t>,
>>>>> > >>>>> not EST. Maybe it's a typo?
>>>>> > >>>>> Otherwise, looking forward to the discussion!
>>>>> > >>>>>
>>>>> > >>>>> Best,
>>>>> > >>>>> Shawn
>>>>> > >>>>>
>>>>> > >>>>> On Tue, Feb 3, 2026 at 9:18 AM huaxin gao <[email protected]>
>>>>> > >>>>> wrote:
>>>>> > >>>>>
>>>>> > >>>>>> Hi all,
>>>>> > >>>>>> I'd like to start a dedicated sync to discuss Iceberg Index
>>>>> > >>>>>> support. Here is the existing discussion thread:
>>>>> > >>>>>> https://lists.apache.org/thread/fzqk3jjf0xpj5m4cfqb3v4c65p0t04ty.
>>>>> > >>>>>>
>>>>> > >>>>>> To ground the discussion, here are the two proposals:
>>>>> > >>>>>>
>>>>> > >>>>>>    - Peter's proposal
>>>>> > >>>>>>      <https://docs.google.com/document/d/1N6a2IOzC6Qsqv7NBqHKesees4N6WF49YUSIX2FrF7S0/edit?tab=t.0#heading=h.hs6r9d26w1y2>
>>>>> > >>>>>>      (overall index support)
>>>>> > >>>>>>    - My proposal
>>>>> > >>>>>>      <https://docs.google.com/document/d/1x-0KT43aTrt8u6EV7EgSietIFQSkGsocqwnBTHPebRU/edit?tab=t.0#heading=h.qouk73o4jxx7>
>>>>> > >>>>>>      (bloom filter skipping index)
>>>>> > >>>>>>
>>>>> > >>>>>> Time slot: Every 3 weeks, Wednesdays at 9 AM to 10 AM EST,
>>>>> > >>>>>> starting next Wednesday (2/11). After FileFormat sync finishes,
>>>>> > >>>>>> we plan to use that slot and switch to every other Monday,
>>>>> > >>>>>> 9 AM to 10 AM EST.
>>>>> > >>>>>>
>>>>> > >>>>>> Meet link: https://meet.google.com/fjn-tyze-mko
>>>>> > >>>>>>
>>>>> > >>>>>> Thanks,
>>>>> > >>>>>> Huaxin
