Hi all,

In recent secondary index sync meetings, the discussion converged on the
need to define what an index is from first principles before settling on
physical layout.

To address that, Peter and I have drafted a requirements document for a key
lookup index (renamed from "primary key index" to avoid implying uniqueness
enforcement). The goal is to nail down one well-scoped index type first.

Doc: Key Lookup Index Requirements
<https://docs.google.com/document/d/1e0zxK-jA0LBDq8YQlQgFipTHelDFiga8lCkgDTmYub8/edit?tab=t.0#heading=h.8shrgabvl19>

It covers requirements, three design options (manifest + sorted Parquet,
hash + sorted Parquet, hash + MPHF) and open questions. We will add
preliminary benchmark results shortly.
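Here is a toy Python sketch to make the "hash + sorted" shape concrete for anyone skimming the thread. It is not the design in the doc. It assumes that the index maps each key to (file_path, row_position), as in the earlier proposal quoted below. It hash-partitions keys into buckets and keeps each bucket sorted, so a lookup touches one bucket and binary-searches inside it. Python lists stand in for sorted Parquet files. ToyKeyLookupIndex, files_to_scan, and to_position_deletes are hypothetical names used only for illustration.

```python
import bisect
import hashlib
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# (key, file_path, row_position). In this sketch each bucket's list stands in
# for one sorted index file; this is an assumption, not the proposed layout.
Entry = Tuple[str, str, int]


class ToyKeyLookupIndex:
    """Hypothetical in-memory analog of a "hash + sorted" key lookup index."""

    def __init__(self, num_buckets: int = 4) -> None:
        self.num_buckets = num_buckets
        self.buckets: Dict[int, List[Entry]] = defaultdict(list)

    def _bucket(self, key: str) -> int:
        # Stable hash (Python's built-in hash() for str is salted per process).
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % self.num_buckets

    def add(self, key: str, file_path: str, row_position: int) -> None:
        # Keep each bucket sorted by (key, file_path, row_position).
        bisect.insort(self.buckets[self._bucket(key)], (key, file_path, row_position))

    def lookup(self, key: str) -> List[Tuple[str, int]]:
        # Only one bucket is consulted. Inside it, binary-search to the first
        # entry for `key`; ("", -1) sorts before any real path/position.
        # Uniqueness is not enforced, so a key may have several locations.
        entries = self.buckets.get(self._bucket(key), [])
        i = bisect.bisect_left(entries, (key, "", -1))
        locations: List[Tuple[str, int]] = []
        while i < len(entries) and entries[i][0] == key:
            locations.append((entries[i][1], entries[i][2]))
            i += 1
        return locations


def files_to_scan(index: ToyKeyLookupIndex, keys: Iterable[str]) -> Set[str]:
    """Point-lookup pruning: keep only data files that contain a requested key."""
    return {path for key in keys for path, _ in index.lookup(key)}


def to_position_deletes(index: ToyKeyLookupIndex, keys: Iterable[str]) -> Dict[str, List[int]]:
    """Turn key-based deletes into file_path -> sorted row positions."""
    deletes: Dict[str, List[int]] = defaultdict(list)
    for key in keys:
        for path, pos in index.lookup(key):
            deletes[path].append(pos)
    return {path: sorted(positions) for path, positions in deletes.items()}


idx = ToyKeyLookupIndex()
idx.add("k1", "data/a.parquet", 0)
idx.add("k2", "data/a.parquet", 7)
idx.add("k2", "data/b.parquet", 3)  # duplicate key: uniqueness is not enforced
idx.add("k3", "data/c.parquet", 5)

print(idx.lookup("k2"))                          # [('data/a.parquet', 7), ('data/b.parquet', 3)]
print(sorted(files_to_scan(idx, ["k1", "k3"])))  # ['data/a.parquet', 'data/c.parquet']
print(to_position_deletes(idx, ["k1", "k2"]))    # {'data/a.parquet': [0, 7], 'data/b.parquet': [3]}
```

Because uniqueness is not enforced, lookup returns every location for a key. The two helpers show how those locations could drive file pruning for point lookups and the conversion of key-based deletes into position deletes.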

Feedback welcome — inline in the doc, on this thread, or at the next index
sync.

Thanks,

Huaxin

On Mon, Apr 13, 2026 at 7:22 AM Steven Wu <[email protected]> wrote:

> Do we need the special index identifier that was originally proposed? A
> generic CatalogObjectIdentifier (with namespace and name) would be
> consistent with all object types in the catalog. I have a discussion thread
> on the generic identifier topic: [DISCUSS] REST Spec: generic
> CatalogObjectIdentifier.
>
> Should we add an indexes array field to table metadata? It would only
> contain a list of index object identifiers, not any index metadata, which
> should live in the index objects themselves. Yufei tried to bring this up
> at the end of the first sync, but we didn't have enough time to really
> discuss it. It would be great to discuss this as the first agenda item today.
>
> On Mon, Apr 13, 2026 at 3:17 AM Péter Váry <[email protected]>
> wrote:
>
>> Hi everyone,
>>
>> We had several engaging discussions at the Iceberg Summit, and it was
>> great to finally catch up with many of you in person. We truly missed those
>> who couldn’t attend; hopefully we’ll all meet again at the next summit.
>>
>> To keep the conversation going, Huaxin and I have put together the agenda
>> for our next meeting. As a reminder, we’ll meet on *April 13th,
>> 9:00–10:00 AM PDT* (6:00–7:00 PM CEST).
>>
>> Proposed agenda:
>>
>>    - Continue first-principles index design discussion from Mar 30
>>       - *Index Ownership and Write Responsibility*
>>          - Should writers be allowed to update indexes, or
>>          - Should all index writes be handled exclusively by the Index
>>          Maintenance process?
>>          - If writers can update indexes, then we need to define which
>>          guarantees are required (compaction, file splitting, layout
>>          expectations).
>>          - If only Index Maintenance updates indexes, then we only need
>>          to define which observable properties should be exposed to
>>          consumers, such as:
>>             - Expected max files for a single key
>>             - Current max files for a single key
>>             - Deletes allowed/present
>>             - Sorted by
>>             - Partitioned by
>>       - *Specification Scope: What Belongs in the Spec?*
>>          - Related to the ownership question above
>>          - Light spec: Just define that the index table should be
>>          optimized for retrieval by key columns and that the index columns
>>          should be contained in the table. This could give us more
>>          flexibility if better organization methods come up, or
>>          - Detailed spec: We could define the max number of files per
>>          index to read for a single key, or even the partitioning and the
>>          exact sort order. This could enable more use cases for a given
>>          index, like joins or cardinality estimation.
>>          - I would go for a light spec for the main types (PK, Containing),
>>          with only the Index Maintenance processes updating the indexes: for
>>          many use cases the details are not important, and writers will
>>          very rarely update the indexes themselves.
>>       - *Logical Placement of Indexes*
>>          - Index as a child object of an Iceberg Table, or
>>          - Index as a first‑class entity under /namespace/indexes/{index}
>>          - Based on the discussions at the summit, we are leaning in this
>>          direction. This means the index ID should be unique within the
>>          namespace, but it helps catalog implementations quite a bit.
>>       - *Physical Placement of Index Data*
>>          - I don’t think we should specify this. We should have a base
>>          location for the index, but we can rely on catalog implementations
>>          to decide on their own, as they do for tables, views, and UDFs.
>>       - *Iceberg Reader Based indexes* (Containing indexes and
>>       potentially PK indexes). These are the indexes that could be read by
>>       the existing Iceberg readers. We might decide to store the PK index
>>       similarly to an Iceberg table and treat it as a reader-based index.
>>          - What table properties/features are exposed to the readers?
>>             - Maybe just some behavioral descriptors for the optimizer
>>             to decide whether the index could be used or should be skipped,
>>             like:
>>                - Expected max files for a single key
>>                - Current max files for a single key
>>                - Deletes allowed/present
>>                - Sorted by
>>                - Partitioned by
>>             - The tasks for reading the index, based on the filters and
>>             projection
>>          - What table properties/features are exposed to Index
>>          Maintenance? I think this could be internal to the Index
>>          Maintenance process and need not be exposed through the spec. The
>>          Index Maintenance process could handle the index as a standard
>>          Iceberg table and could build on the Table Maintenance process,
>>          but there might also be entirely different processes.
>>       - It should be possible for the Index Maintenance process to add
>>       properties to an index, which could then be used and updated in the
>>       next Index Maintenance run.
>>    - *PK index storage format benchmark results*
>>       - Flat Parquet (baseline)
>>       - BTree with Parquet leaves
>>       - Vortex
>>    - *Open items / next steps*
>>
>> Thanks,
>> Peter
>>
>> On Mon, Mar 23, 2026 at 3:03 AM huaxin gao <[email protected]>
>> wrote:
>>
>>> Hi everyone, I wanted to share an update on the primary key index work.
>>> Since there are still open questions on whether bloom filter indexes fit
>>> in the secondary index framework or should be treated as extended stats,
>>> I've shifted focus to the primary key index since it's a clearer fit for
>>> the framework.
>>> I've put together a proposal for a primary key reverse-lookup index that
>>> maps each key to its physical location (file_path, row_position). It
>>> enables:
>>>
>>>    - Scan-time file pruning for point lookups
>>>    - Converting key-based deletes into position deletes (eliminating
>>>    equality deletes for Flink CDC)
>>>    - Accelerating Spark MERGE INTO by replacing full-table joins with
>>>    direct file lookups
>>>
>>> Proposal:
>>> https://docs.google.com/document/d/1HuhCZ0n2FqDh8yqQb9oEj1CPM5yXpEsMPGZno2aSf8E/edit?tab=t.0#heading=h.tbevg4q0m9
>>> Feedback welcome!
>>> Thanks,
>>> Huaxin
>>>
>>> On Wed, Mar 18, 2026 at 11:42 PM Péter Váry <[email protected]>
>>> wrote:
>>>
>>>> Key takeaways from the general index discussion at the March 16 meeting.
>>>> Thanks to everyone who participated! The recording is available here:
>>>> https://www.youtube.com/watch?v=btmjhtRWUCE
>>>>
>>>>    - Q: Do we need to tie index types to the algorithms used to access
>>>>    them?
>>>>    - A: From a specification perspective, the goal is to define the
>>>>    storage-level data layout so it can be shared across engines. Engines
>>>>    are free to interpret and use the data as they see fit, but the on-disk
>>>>    data layout itself must be strictly defined and interoperable.
>>>>
>>>>    - Q: Should we introduce an additional abstraction layer (e.g.,
>>>>    Vector Index) with sub-types such as IVF and DiskANN?
>>>>    - A: This is possible if we decide it is beneficial. I explored
>>>>    potential naming, but it is not yet clear how such a layer would be
>>>>    used in practice.
>>>>    *Question to Yingyi Bu*: could you provide examples where this
>>>>    additional layer would be useful? Should this abstraction be defined
>>>>    at the spec level, or is it better handled at the engine level?
>>>>    My initial idea was that users would create a generic Vector Index
>>>>    and let the engine choose the concrete implementation. However, this
>>>>    would limit user control; users likely need to specify the exact index
>>>>    representation, which implies they must be aware of the available
>>>>    representations.
>>>>
>>>>
>>>>
>>>>    - Q: Do we want to allow extensibility for index types?
>>>>    - A: Yes. The intent is to support a small set of well-defined
>>>>    index types while allowing experimentation with new ones. If a new index
>>>>    type proves broadly useful, a follow-up proposal can standardize it and
>>>>    incorporate it into the spec.
>>>>
>>>>
>>>>
>>>>    - Q: Do we allow multiple versions of an index for the same table
>>>>    snapshot?
>>>>    - A: Yes. Older index versions must be retained for readers that
>>>>    have already started using them, while new readers should automatically
>>>>    use the latest available version.
>>>>
>>>>
>>>>
>>>>    - Q: Do we need to use materialized views for these indexes?
>>>>    - A: No. These indexes are primarily examples, and different types
>>>>    may require different storage methods. However, the Primary Key,
>>>>    Containing, and parts of the IVF indexes can be structured as Iceberg
>>>>    tables. This allows engines to read them natively; in some cases,
>>>>    Iceberg planners can automatically redirect queries to the index table
>>>>    without engine modifications. Furthermore, index maintenance for these
>>>>    tables can leverage existing materialized view maintenance workflows.
>>>>    Other index types may instead rely on Puffin files or alternative
>>>>    storage approaches.
>>>>
>>>>
>>>>
>>>>    - Q: How should index metadata be accessed? Should we add explicit
>>>>    pointers for the indexes in the table metadata?
>>>>    - A: We did not have sufficient time to fully explore and conclude
>>>>    this topic.
>>>>    *Question for Yufei Gu*: Did I understand correctly that your main
>>>>    concern stems from endpoint resolution from a REST Catalog perspective?
>>>>    Specifically, if indexes are exposed under a URI such as
>>>>    v1/{prefix}/namespaces/{namespace}/tables/{table}/indexes/{index}, would
>>>>    this make it more difficult for the REST Catalog to resolve and route
>>>>    requests to the appropriate endpoint?
>>>>
>>>>
>>>> On Fri, Mar 13, 2026 at 11:32 PM Suhas Jayaram Subramanya via dev
>>>> <[email protected]> wrote:
>>>>
>>>>> Hi everyone,
>>>>>
>>>>> Here's a proposal for native Vector Index support in Iceberg tables --
>>>>> https://docs.google.com/document/d/1KL4qLOwdqnhOcqTc0EjO1O16NV3M3c-gZCEINDWw4lA/edit?usp=sharing
>>>>>
>>>>> We've been working on this proposal with Peter internally at Microsoft
>>>>> and he suggested we post it here to bring this to the community's
>>>>> attention, ahead of the next Secondary Index Sync.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Suhas
>>>>>
>>>>> On 2026/02/19 04:34:34 huaxin gao wrote:
>>>>> > Hi Everyone,
>>>>> >
>>>>> > Here are the recording and notes from the Iceberg Index Support Sync
>>>>> > on 2/11.
>>>>> >
>>>>> > Recording: https://www.youtube.com/watch?v=3sFfQ0A50yk
>>>>> >
>>>>> > Notes:
>>>>> >
>>>>> > https://docs.google.com/document/d/1N6a2IOzC6Qsqv7NBqHKesees4N6WF49YUSIX2FrF7S0/edit?tab=t.8041k7j2n7y3
>>>>> >
>>>>> > The meeting will move to biweekly, Mondays 9–10am PST, starting
>>>>> > March 2.
>>>>> >
>>>>> > Since the sync, I updated the Bloom skipping index proposal
>>>>> > <https://docs.google.com/document/d/1x-0KT43aTrt8u6EV7EgSietIFQSkGsocqwnBTHPebRU/edit?tab=t.0#heading=h.5r5kl6k3fqwu>
>>>>> > to address the discussion questions, specifically:
>>>>> >
>>>>> > - Performance justification: when this helps (high-cardinality = /
>>>>> > IN predicates, many data files, high object-store latency) and how it
>>>>> > differs from Parquet row-group Bloom filters (which still require
>>>>> > opening the data file).
>>>>> > - Cost / scalability: rough sizing (Bloom blob size per file, Puffin
>>>>> > file size), the planning cost trade-off (driver index reads vs
>>>>> > executor file opens), and mitigations via caching.
>>>>> > - Lifecycle / maintenance: incremental production as new data files
>>>>> > arrive, behavior when the index is missing/behind, and
>>>>> > sharding/compaction plus cleanup to avoid accumulating too many small
>>>>> > Puffin files over time.
>>>>> > - Writer expectations: inline (optional) vs asynchronous (primary)
>>>>> > index creation.
>>>>> >
>>>>> > I also implemented a Spark 4.1 POC
>>>>> > <https://github.com/apache/iceberg/pull/15311> and a local benchmark
>>>>> > to quantify both the pruning impact (plannedFiles → afterBloom) and
>>>>> > the index read overhead (statsFiles, statsBytes, bloomPayloadBytes)
>>>>> > for point predicates on high-cardinality columns. Please take a look
>>>>> > and let me know if you have any questions or feedback.
>>>>> >
>>>>> > Thanks,
>>>>> >
>>>>> > Huaxin
>>>>> >
>>>>> > On Tue, Feb 10, 2026 at 1:43 PM huaxin gao <[email protected]> wrote:
>>>>> >
>>>>> > > Reminder for tomorrow's sync on Iceberg Index Support.
>>>>> > >
>>>>> > > Wednesday: Feb. 11 9:00 – 10:00am
>>>>> > > Time zone: America/Los_Angeles
>>>>> > > Google Meet joining info
>>>>> > > Video call link: meet.google.com/nsp-ctyr-khk
>>>>> > > Design docs:
>>>>> > > https://docs.google.com/document/d/1N6a2IOzC6Qsqv7NBqHKesees4N6WF49YUSIX2FrF7S0/edit?tab=t.0#heading=h.hs6r9d26w1y2
>>>>> > > https://docs.google.com/document/d/1x-0KT43aTrt8u6EV7EgSietIFQSkGsocqwnBTHPebRU/edit?tab=t.0#heading=h.qouk73o4jxx7
>>>>> > >
>>>>> > > Thanks,
>>>>> > > Huaxin
>>>>> > >
>>>>> > >
>>>>> > > On Tue, Feb 3, 2026 at 10:52 PM Péter Váry <[email protected]>
>>>>> > > wrote:
>>>>> > >
>>>>> > >> Thanks Huaxin and Steven for organizing this. Looking forward to
>>>>> > >> meeting you all next week!
>>>>> > >>
>>>>> > >> On Wed, Feb 4, 2026, 02:48 Steven Wu <[email protected]> wrote:
>>>>> > >>
>>>>> > >>> We set up the dev calendar event with a new Google Meet link.
>>>>> > >>> Please ignore the link from Huaxin's original email.
>>>>> > >>>
>>>>> > >>> The dev calendar has the correct info (including the new meeting
>>>>> > >>> link):
>>>>> > >>>
>>>>> > >>> Iceberg Index Support Sync
>>>>> > >>> Wednesday, February 11 · 9:00 – 10:00am
>>>>> > >>> Time zone: America/Los_Angeles
>>>>> > >>> Google Meet joining info
>>>>> > >>> Video call link: https://meet.google.com/nsp-ctyr-khk
>>>>> > >>>
>>>>> > >>> On Tue, Feb 3, 2026 at 5:08 PM huaxin gao <[email protected]>
>>>>> > >>> wrote:
>>>>> > >>>
>>>>> > >>>> Sorry, I meant PST (not EST) :)
>>>>> > >>>> Looking forward to the discussion!
>>>>> > >>>>
>>>>> > >>>> On Tue, Feb 3, 2026 at 4:58 PM Shawn Chang <[email protected]>
>>>>> > >>>> wrote:
>>>>> > >>>>
>>>>> > >>>>> Hi Huaxin,
>>>>> > >>>>>
>>>>> > >>>>> Thanks for starting the sync!
>>>>> > >>>>>
>>>>> > >>>>> The meeting seems to be 9-10AM PST on the dev events calendar
>>>>> > >>>>> <https://calendar.google.com/calendar/u/0?cid=MzkwNWQ0OTJmMWI0NTBiYTA3MTJmMmFlNmFmYTc2ZWI3NTdmMTNkODUyMjBjYzAzYWE0NTI3ODg1YWRjNTYyOUBncm91cC5jYWxlbmRhci5nb29nbGUuY29t>,
>>>>> > >>>>> not EST. Maybe it's a typo?
>>>>> > >>>>> Otherwise, looking forward to the discussion!
>>>>> > >>>>>
>>>>> > >>>>> Best,
>>>>> > >>>>> Shawn
>>>>> > >>>>>
>>>>> > >>>>> On Tue, Feb 3, 2026 at 9:18 AM huaxin gao <[email protected]>
>>>>> > >>>>> wrote:
>>>>> > >>>>>
>>>>> > >>>>>> Hi all,
>>>>> > >>>>>> I'd like to start a dedicated sync to discuss Iceberg Index
>>>>> > >>>>>> support. Here is the existing discussion thread:
>>>>> > >>>>>> https://lists.apache.org/thread/fzqk3jjf0xpj5m4cfqb3v4c65p0t04ty
>>>>> > >>>>>> To ground the discussion, here are the two proposals:
>>>>> > >>>>>>
>>>>> > >>>>>> - Peter's proposal
>>>>> > >>>>>> <https://docs.google.com/document/d/1N6a2IOzC6Qsqv7NBqHKesees4N6WF49YUSIX2FrF7S0/edit?tab=t.0#heading=h.hs6r9d26w1y2>
>>>>> > >>>>>> (overall index support)
>>>>> > >>>>>> - My proposal
>>>>> > >>>>>> <https://docs.google.com/document/d/1x-0KT43aTrt8u6EV7EgSietIFQSkGsocqwnBTHPebRU/edit?tab=t.0#heading=h.qouk73o4jxx7>
>>>>> > >>>>>> (bloom filter skipping index)
>>>>> > >>>>>>
>>>>> > >>>>>> Time slot: Every 3 weeks, Wednesdays at 9 AM to 10 AM EST,
>>>>> > >>>>>> starting next Wednesday (2/11). After the FileFormat sync
>>>>> > >>>>>> finishes, we plan to use that slot and switch to every other
>>>>> > >>>>>> Monday, 9 AM to 10 AM EST.
>>>>> > >>>>>>
>>>>> > >>>>>> Meet link: https://meet.google.com/fjn-tyze-mko
>>>>> > >>>>>>
>>>>> > >>>>>> Thanks,
>>>>> > >>>>>> Huaxin
>>>>> > >>>>>>
>>>>> > >>>>>
>>>>> >
>>>>>
>>>>>
>>>>>
>>>>
