Hi All,

TL;DR: We still need to validate against ADLS and S3, but based on the local tests, the MPHF approach looks more promising, provided we can tolerate larger index files and longer index maintenance times.
Details:

Here are the results from the local experiments on my Mac. I removed unnecessary statistics from the Parquet files and tested different row group sizes:

- For an index file with 1M records, a row group size of 5,000 appears to be the sweet spot.
- For 10M records, 10,000 rows per row group works best.

If you have additional ideas for optimizing Parquet-based indexes, I'd be very interested to hear them.

The test code is available on this branch: https://github.com/pvary/iceberg/tree/leaf_bench

Best results:

*1M records/file*

- Parquet - 5,000 rows/row group
   - Read: 1191 µs
      - 1 file open, 3 seeks, 123 KB read per lookup
   - Write: 1.7 s, 15 MB
- MPHF
   - Read: 202 µs
      - 1 file open, 1 seek, 282 KB read per lookup
   - Write: 0.8 s, 34 MB

*10M records/file*

- Parquet - 10,000 rows/row group
   - Read: 4168 µs
      - 1 file open, 3 seeks, 395 KB read per lookup
   - Write: 19.5 s, 144 MB
- MPHF
   - Read: 1086 µs
      - 1 file open, 1 seek, 2.8 MB (2812 KB) read per lookup
   - Write: 6.5 s, 353 MB

Below are the full results.
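A note on reading the raw output that follows: the secondary counters (bytesRead, seeks, openStreams, indexFileBytes) appear to be summed over all measured invocations (the Cnt column), since dividing them by Cnt reproduces the per-lookup and per-file figures quoted above. Here is a minimal Python sketch of that arithmetic, assuming this interpretation is right. The names `per_op` and `summarize` are illustrative only and are not part of the benchmark code.

```python
# Raw counters copied from the JMH output in this message.
# Assumption: each counter is a total over the Cnt measured invocations.
LOOKUP_CNT = 10_000  # Cnt column of the lookup benchmarks
WRITE_CNT = 3        # Cnt column of the write benchmarks

RAW = {
    # (indexType, numRows): (bytesRead, seeks, openStreams, indexFileBytes)
    ("PARQUET_5000", 1_000_000): (1_230_877_229, 30_000, 10_000, 44_845_788),
    ("MPHF", 1_000_000): (2_821_570_000, 10_000, 10_000, 102_846_369),
    ("PARQUET_10000", 10_000_000): (3_946_341_532, 30_000, 10_000, 433_189_623),
    ("MPHF", 10_000_000): (28_119_460_000, 10_000, 10_000, 1_058_435_733),
}


def per_op(total, cnt):
    """Average a summed counter over the number of invocations."""
    return total / cnt


def summarize(key):
    """Return per-lookup I/O and per-file size for one benchmark variant."""
    bytes_read, seeks, opens, file_bytes = RAW[key]
    return {
        "kb_read_per_lookup": round(per_op(bytes_read, LOOKUP_CNT) / 1000),
        "seeks_per_lookup": round(per_op(seeks, LOOKUP_CNT)),
        "opens_per_lookup": round(per_op(opens, LOOKUP_CNT)),
        "file_mb": round(per_op(file_bytes, WRITE_CNT) / 1_000_000),
    }


if __name__ == "__main__":
    for key in RAW:
        print(key, summarize(key))
```

Under that assumption, the 10M-record MPHF file comes out at about 353 MB and the 1M-record one at about 34 MB, which matches the summary.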
Benchmark                                    (indexType)    (keyType)  (numRows)  Mode    Cnt            Score  Error           Units
InvertedIndexBenchmark.lookup                PARQUET_1000   LONG         1000000  ss    10000         3285.284  ± 5.138         us/op
InvertedIndexBenchmark.lookup:bytesRead      PARQUET_1000   LONG         1000000  ss    10000   2522168989.000                  #
InvertedIndexBenchmark.lookup:openStreams    PARQUET_1000   LONG         1000000  ss    10000        10000.000                  #
InvertedIndexBenchmark.lookup:seeks          PARQUET_1000   LONG         1000000  ss    10000        30000.000                  #
InvertedIndexBenchmark.lookup                PARQUET_1000   LONG        10000000  ss    10000        35449.614  ± 34.673        us/op
InvertedIndexBenchmark.lookup:bytesRead      PARQUET_1000   LONG        10000000  ss    10000  24302649201.000                  #
InvertedIndexBenchmark.lookup:openStreams    PARQUET_1000   LONG        10000000  ss    10000        10000.000                  #
InvertedIndexBenchmark.lookup:seeks          PARQUET_1000   LONG        10000000  ss    10000        30000.000                  #
InvertedIndexBenchmark.lookup                PARQUET_5000   LONG         1000000  ss    10000         1191.959  ± 4.169         us/op
InvertedIndexBenchmark.lookup:bytesRead      PARQUET_5000   LONG         1000000  ss    10000   1230877229.000                  #
InvertedIndexBenchmark.lookup:openStreams    PARQUET_5000   LONG         1000000  ss    10000        10000.000                  #
InvertedIndexBenchmark.lookup:seeks          PARQUET_5000   LONG         1000000  ss    10000        30000.000                  #
InvertedIndexBenchmark.lookup                PARQUET_5000   LONG        10000000  ss    10000         7236.447  ± 10.374        us/op
InvertedIndexBenchmark.lookup:bytesRead      PARQUET_5000   LONG        10000000  ss    10000   5650715973.000                  #
InvertedIndexBenchmark.lookup:openStreams    PARQUET_5000   LONG        10000000  ss    10000        10000.000                  #
InvertedIndexBenchmark.lookup:seeks          PARQUET_5000   LONG        10000000  ss    10000        30000.000                  #
InvertedIndexBenchmark.lookup                PARQUET_10000  LONG         1000000  ss    10000         1349.946  ± 7.834         us/op
InvertedIndexBenchmark.lookup:bytesRead      PARQUET_10000  LONG         1000000  ss    10000   1730219377.000                  #
InvertedIndexBenchmark.lookup:openStreams    PARQUET_10000  LONG         1000000  ss    10000        10000.000                  #
InvertedIndexBenchmark.lookup:seeks          PARQUET_10000  LONG         1000000  ss    10000        30000.000                  #
InvertedIndexBenchmark.lookup                PARQUET_10000  LONG        10000000  ss    10000         4168.635  ± 11.051        us/op
InvertedIndexBenchmark.lookup:bytesRead      PARQUET_10000  LONG        10000000  ss    10000   3946341532.000                  #
InvertedIndexBenchmark.lookup:openStreams    PARQUET_10000  LONG        10000000  ss    10000        10000.000                  #
InvertedIndexBenchmark.lookup:seeks          PARQUET_10000  LONG        10000000  ss    10000        30000.000                  #
InvertedIndexBenchmark.lookup                PARQUET_50000  LONG         1000000  ss    10000         4736.466  ± 38.179        us/op
InvertedIndexBenchmark.lookup:bytesRead      PARQUET_50000  LONG         1000000  ss    10000   7427413541.000                  #
InvertedIndexBenchmark.lookup:openStreams    PARQUET_50000  LONG         1000000  ss    10000        10000.000                  #
InvertedIndexBenchmark.lookup:seeks          PARQUET_50000  LONG         1000000  ss    10000        30000.000                  #
InvertedIndexBenchmark.lookup                PARQUET_50000  LONG        10000000  ss    10000         4979.031  ± 34.708        us/op
InvertedIndexBenchmark.lookup:bytesRead      PARQUET_50000  LONG        10000000  ss    10000   7694887636.000                  #
InvertedIndexBenchmark.lookup:openStreams    PARQUET_50000  LONG        10000000  ss    10000        10000.000                  #
InvertedIndexBenchmark.lookup:seeks          PARQUET_50000  LONG        10000000  ss    10000        30000.000                  #
InvertedIndexBenchmark.lookup                MPHF           LONG         1000000  ss    10000          202.571  ± 2.336         us/op
InvertedIndexBenchmark.lookup:bytesRead      MPHF           LONG         1000000  ss    10000   2821570000.000                  #
InvertedIndexBenchmark.lookup:openStreams    MPHF           LONG         1000000  ss    10000        10000.000                  #
InvertedIndexBenchmark.lookup:seeks          MPHF           LONG         1000000  ss    10000        10000.000                  #
InvertedIndexBenchmark.lookup                MPHF           LONG        10000000  ss    10000         1086.957  ± 4.524         us/op
InvertedIndexBenchmark.lookup:bytesRead      MPHF           LONG        10000000  ss    10000  28119460000.000                  #
InvertedIndexBenchmark.lookup:openStreams    MPHF           LONG        10000000  ss    10000        10000.000                  #
InvertedIndexBenchmark.lookup:seeks          MPHF           LONG        10000000  ss    10000        10000.000                  #
InvertedIndexBenchmark.write                 PARQUET_1000   LONG         1000000  ss        3      1720731.014  ± 876636.004    us/op
InvertedIndexBenchmark.write:indexFileBytes  PARQUET_1000   LONG         1000000  ss        3     46453317.000                  #
InvertedIndexBenchmark.write                 PARQUET_1000   LONG        10000000  ss        3     18547947.876  ± 12258125.307  us/op
InvertedIndexBenchmark.write:indexFileBytes  PARQUET_1000   LONG        10000000  ss        3    452655675.000                  #
InvertedIndexBenchmark.write                 PARQUET_5000   LONG         1000000  ss        3      1718345.583  ± 1103928.016   us/op
InvertedIndexBenchmark.write:indexFileBytes  PARQUET_5000   LONG         1000000  ss        3     44845788.000                  #
InvertedIndexBenchmark.write                 PARQUET_5000   LONG        10000000  ss        3     18604229.931  ± 2668361.915   us/op
InvertedIndexBenchmark.write:indexFileBytes  PARQUET_5000   LONG        10000000  ss        3    435388818.000                  #
InvertedIndexBenchmark.write                 PARQUET_10000  LONG         1000000  ss        3      1761555.389  ± 535857.675    us/op
InvertedIndexBenchmark.write:indexFileBytes  PARQUET_10000  LONG         1000000  ss        3     44536635.000                  #
InvertedIndexBenchmark.write                 PARQUET_10000  LONG        10000000  ss        3     19501588.264  ± 2130054.558   us/op
InvertedIndexBenchmark.write:indexFileBytes  PARQUET_10000  LONG        10000000  ss        3    433189623.000                  #
InvertedIndexBenchmark.write                 PARQUET_50000  LONG         1000000  ss        3      1936624.889  ± 6601363.985   us/op
InvertedIndexBenchmark.write:indexFileBytes  PARQUET_50000  LONG         1000000  ss        3     44264655.000                  #
InvertedIndexBenchmark.write                 PARQUET_50000  LONG        10000000  ss        3     20471742.278  ± 10705206.310  us/op
InvertedIndexBenchmark.write:indexFileBytes  PARQUET_50000  LONG        10000000  ss        3    431311305.000                  #
InvertedIndexBenchmark.write                 MPHF           LONG         1000000  ss        3       896573.958  ± 1408024.851   us/op
InvertedIndexBenchmark.write:indexFileBytes  MPHF           LONG         1000000  ss        3    102846369.000                  #
InvertedIndexBenchmark.write                 MPHF           LONG        10000000  ss        3      6509348.875  ± 15519975.479  us/op
InvertedIndexBenchmark.write:indexFileBytes  MPHF           LONG        10000000  ss        3   1058435733.000                  #

huaxin gao <[email protected]> wrote (on Tue, Apr 21, 2026, 20:53):

> Hi all,
>
> In recent secondary index sync meetings, the discussion converged on the
> need to define what an index is from first principles before settling on
> physical layout.
>
> To address that, Peter and I have drafted a requirements document for a
> key lookup index (renamed from "primary key index" to avoid implying
> uniqueness enforcement). The goal is to nail down one well-scoped index
> type first.
>
> Doc: Key Lookup Index Requirements
> <https://docs.google.com/document/d/1e0zxK-jA0LBDq8YQlQgFipTHelDFiga8lCkgDTmYub8/edit?tab=t.0#heading=h.8shrgabvl19>
>
> It covers requirements, three design options (manifest + sorted Parquet,
> hash + sorted Parquet, hash + MPHF) and open questions. We will add
> preliminary benchmark results shortly.
>
> Feedback welcome: inline in the doc, on this thread, or at the next index
> sync.
>
> Thanks,
>
> Huaxin
>
> On Mon, Apr 13, 2026 at 7:22 AM Steven Wu <[email protected]> wrote:
>
>> Do we need the special index identifier that was originally proposed? A
>> generic CatalogObjectIdentifier (with namespace and name) would be
>> consistent with all object types in the catalog. I have a discussion thread
>> on the generic identifier topic: [DISCUSS] REST Spec: generic
>> CatalogObjectIdentifier.
>>
>> Should we add an indexes array field to table metadata? It only contains
>> a list of index object identifiers. It doesn't contain any index metadata
>> which should live in the index objects. Yufei was trying to bring this up
>> at the end of the first sync. But we didn't get enough time to really
>> discuss it. It will be great to discuss this as the first agenda item today.
>>
>> On Mon, Apr 13, 2026 at 3:17 AM Péter Váry <[email protected]> wrote:
>>
>>> Hi everyone,
>>>
>>> We had several engaging discussions at the Iceberg Summit, and it was
>>> great to finally catch up with many of you in person. We truly missed those
>>> who couldn't attend; hopefully we'll all meet again at the next summit.
>>>
>>> To keep the conversation going, Huaxin and I have put together the
>>> agenda for our next meeting. As a reminder, we'll meet on *April 13th,
>>> 9:00–10:00 AM* PDT (6:00–7:00 PM CEST).
>>>
>>> Proposed agenda:
>>>
>>>    - Continue first-principles index design discussion from Mar 30
>>>    - *Index Ownership and Write Responsibility*
>>>       - Should writers be allowed to update indexes, or
>>>       - Should all index writes be handled exclusively by the Index
>>>       Maintenance process?
>>>       - If writers can update indexes, then we need to define what
>>>       guarantees are required (compaction, file splitting, layout
>>>       expectations).
>>>       - If only Index Maintenance updates indexes, then we only need
>>>       to define what observable properties should be exposed to
>>>       consumers, like:
>>>          - Expected max files for a single key
>>>          - Current max files for a single key
>>>          - Deletes allowed/present
>>>          - Sorted by
>>>          - Partitioned by
>>>    - *Specification Scope: What Belongs in the Spec?*
>>>       - Related to the ownership question above
>>>       - Light spec: Just define that the index table should be
>>>       optimized for retrieval by key columns and the index columns
>>>       should be contained in the table. This could give us more
>>>       flexibility if better organization methods come up, or
>>>       - Detailed spec: We could define the max number of files per
>>>       index to read for a single key, or even the partitioning and the
>>>       exact sort order. This could allow more use-cases for a given
>>>       index, like joins or cardinality estimations.
>>>       - I would go for light spec for the main types (PK,
>>>       Containing) and only the Index Maintenance processes should
>>>       update the Indexes, as for many use-cases the details are not
>>>       important, and writers will very rarely update the Indexes
>>>       themselves.
>>>    - *Logical Placement of Indexes*
>>>       - Index as a child object of an Iceberg Table, or
>>>       - Index as a first‑class entity under
>>>       /namespace/indexes/{index}
>>>          - Based on the discussions at the summit we are leaning in
>>>          this direction.
>>>          This means the index id should be unique in the namespace,
>>>          but it helps the catalog implementations quite a bit.
>>>    - *Physical Placement of Index Data*
>>>       - I don't think we should specify this. We should have a base
>>>       location for the index, but can rely on the catalog
>>>       implementations to decide on their own, like they do with the
>>>       tables, views, udfs.
>>>    - *Iceberg Reader Based indexes* (Containing indexes and
>>>    potentially PK indexes). These are the indexes which could be read
>>>    by the existing Iceberg readers. We might decide to store the PK
>>>    index similarly to an Iceberg Table and treat it as a reader based
>>>    index.
>>>       - What are the table properties/features exposed to the readers
>>>          - Maybe just some behavioral descriptors for the optimizer
>>>          to decide if the index could be used or should be skipped,
>>>          like:
>>>             - Expected max files for a single key
>>>             - max files for a single key
>>>             - Deletes allowed/present
>>>             - Sorted by
>>>             - Partitioned by
>>>          - The Tasks when reading the index based on the filters and
>>>          projection
>>>       - What are the table properties/features exposed to the Index
>>>       Maintenance. I think this could be internal to the Index
>>>       Maintenance process and might not be exposed through the spec.
>>>       The Index Maintenance process could handle this as a standard
>>>       Iceberg Table and could be based on the Table Maintenance
>>>       process, but there might be some totally different processes.
>>>       - It should be possible to add properties to an index defined by
>>>       the Index Maintenance process which could be used and updated in
>>>       the next Index Maintenance run.
>>>    - *PK index storage format benchmark results*
>>>       - Flat Parquet (baseline)
>>>       - BTree with Parquet leaves
>>>       - Vortex
>>>    - *Open items / next steps*
>>>
>>> Thanks,
>>> Peter
>>>
>>> huaxin gao <[email protected]> wrote (on Mon, Mar 23, 2026, 3:03):
>>>
>>>> Hi everyone, I wanted to share an update on the primary key index work.
>>>> Since there are still open questions on whether bloom filter indexes
>>>> fit in the secondary index framework or should be treated as extended
>>>> stats, I've shifted focus to the primary key index since it's a clearer
>>>> fit for the framework.
>>>> I've put together a proposal for a primary key reverse-lookup index
>>>> that maps each key to its physical location (file_path, row_position).
>>>> It enables:
>>>>
>>>>    - Scan-time file pruning for point lookups
>>>>    - Converting key-based deletes into position deletes (eliminating
>>>>    equality deletes for Flink CDC)
>>>>    - Accelerating Spark MERGE INTO by replacing full-table joins with
>>>>    direct file lookups
>>>>
>>>> Proposal:
>>>> https://docs.google.com/document/d/1HuhCZ0n2FqDh8yqQb9oEj1CPM5yXpEsMPGZno2aSf8E/edit?tab=t.0#heading=h.tbevg4q0m9
>>>> Feedback welcome!
>>>> Thanks,
>>>> Huaxin
>>>>
>>>> On Wed, Mar 18, 2026 at 11:42 PM Péter Váry <[email protected]> wrote:
>>>>
>>>>> Key takeaways from the general index discussion at the March 16 meeting.
>>>>> Thanks to everyone who participated! The recording is available here:
>>>>> https://www.youtube.com/watch?v=btmjhtRWUCE
>>>>>
>>>>>    - Q: Do we need to tie index types to the algorithms used to
>>>>>    access them?
>>>>>    - A: From a specification perspective, the goal is to define the
>>>>>    storage-level data layout so it can be shared across engines.
>>>>>    Engines are free to interpret and use the data as they see fit, but
>>>>>    the on-disk data layout itself must be strictly defined and
>>>>>    interoperable.
>>>>>
>>>>>    - Q: Should we introduce an additional abstraction layer (e.g.,
>>>>>    Vector Index) with sub-types such as IVF and DiskANN?
>>>>>    - A: This is possible if we decide it is beneficial. I explored
>>>>>    potential naming, but it is not yet clear how such a layer would be
>>>>>    used in practice.
>>>>>    *Question to Yingyi Bu*: could you provide examples where this
>>>>>    additional layer would be useful? Should this abstraction be defined
>>>>>    at the spec level, or is it better handled at the engine level?
>>>>>    My initial idea was that users would create a generic Vector Index
>>>>>    and let the engine choose the concrete implementation. However, this
>>>>>    would limit user control and users likely need to specify the exact
>>>>>    index representation, which implies they must be aware of the
>>>>>    available representations.
>>>>>
>>>>>    - Q: Do we want to allow extensibility for index types?
>>>>>    - A: Yes. The intent is to support a small set of well-defined
>>>>>    index types while allowing experimentation with new ones. If a new
>>>>>    index type proves broadly useful, a follow-up proposal can
>>>>>    standardize it and incorporate it into the spec.
>>>>>
>>>>>    - Q: Do we allow multiple versions of an index for the same table
>>>>>    snapshot?
>>>>>    - A: Yes. Older index versions must be retained for readers that
>>>>>    have already started using them, while new readers should
>>>>>    automatically use the latest available version.
>>>>>
>>>>>    - Q: Do we need to use materialized views for these indexes?
>>>>>    - A: No. These indexes are primarily examples, and different types
>>>>>    may require different storage methods. However, the Primary Key,
>>>>>    Containing, and parts of the IVF indexes can be structured as
>>>>>    Iceberg tables. This allows engines to read them natively; in some
>>>>>    cases, Iceberg planners can automatically redirect queries to the
>>>>>    index table without engine modifications. Furthermore, index
>>>>>    maintenance for these tables can leverage existing materialized
>>>>>    view maintenance workflows. Other index types may instead rely on
>>>>>    Puffin files or alternative storage approaches.
>>>>>
>>>>>    - Q: How should index metadata be accessed?
>>>>>    Should we add explicit pointers for the indexes in the table
>>>>>    metadata?
>>>>>    - A: We did not have sufficient time to fully explore and conclude
>>>>>    this topic.
>>>>>    *Question for Yufei Gu*: Did I understand correctly that your main
>>>>>    concern stems from endpoint resolution from a REST Catalog
>>>>>    perspective? Specifically, if indexes are exposed under a URI such as
>>>>>    v1/{prefix}/namespaces/{namespace}/tables/{table}/indexes/{index},
>>>>>    would this make it more difficult for the REST Catalog to resolve
>>>>>    and route requests to the appropriate endpoint?
>>>>>
>>>>> Suhas Jayaram Subramanya via dev <[email protected]> wrote
>>>>> (on Fri, Mar 13, 2026, 23:32):
>>>>>
>>>>>> Hi everyone,
>>>>>>
>>>>>> Here's a proposal for native Vector Index support in Iceberg tables:
>>>>>> https://docs.google.com/document/d/1KL4qLOwdqnhOcqTc0EjO1O16NV3M3c-gZCEINDWw4lA/edit?usp=sharing
>>>>>>
>>>>>> We've been working on this proposal with Peter internally at
>>>>>> Microsoft and he suggested we post it here to bring this to the
>>>>>> community's attention, ahead of the next Secondary Index Sync.
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Suhas
>>>>>>
>>>>>> On 2026/02/19 04:34:34 huaxin gao wrote:
>>>>>> > Hi Everyone,
>>>>>> >
>>>>>> > Here are the recording and notes from the Iceberg Index Support
>>>>>> > Sync on 2/11.
>>>>>> >
>>>>>> > Recording: https://www.youtube.com/watch?v=3sFfQ0A50yk
>>>>>> >
>>>>>> > Notes:
>>>>>> > https://docs.google.com/document/d/1N6a2IOzC6Qsqv7NBqHKesees4N6WF49YUSIX2FrF7S0/edit?tab=t.8041k7j2n7y3
>>>>>> >
>>>>>> > The meeting will move to biweekly, Mondays 9–10am PST, starting
>>>>>> > March 2.
>>>>>> >
>>>>>> > Since the sync, I updated the Bloom skipping index proposal
>>>>>> > <https://docs.google.com/document/d/1x-0KT43aTrt8u6EV7EgSietIFQSkGsocqwnBTHPebRU/edit?tab=t.0#heading=h.5r5kl6k3fqwu>
>>>>>> > to address the discussion questions, specifically:
>>>>>> >
>>>>>> >    - Performance justification: when this helps (high-cardinality
>>>>>> >    = / IN, many data files, high object-store latency) and how it
>>>>>> >    differs from Parquet row-group Bloom filters (which still require
>>>>>> >    opening the data file).
>>>>>> >    - Cost / scalability: rough sizing (Bloom blob size per file,
>>>>>> >    Puffin file size), the planning cost trade-off (driver index
>>>>>> >    reads vs executor file opens), and mitigations via caching.
>>>>>> >    - Lifecycle / maintenance: incremental production as new data
>>>>>> >    files arrive, behavior when the index is missing/behind, and
>>>>>> >    sharding/compaction plus cleanup to avoid accumulating too many
>>>>>> >    small Puffin files over time.
>>>>>> >    - Writer expectations: inline (optional) vs asynchronous
>>>>>> >    (primary) index creation.
>>>>>> >
>>>>>> > I also implemented a Spark 4.1 POC
>>>>>> > <https://github.com/apache/iceberg/pull/15311> and a local
>>>>>> > benchmark to quantify both the pruning impact (plannedFiles →
>>>>>> > afterBloom) and the index read overhead (statsFiles, statsBytes,
>>>>>> > bloomPayloadBytes) for point predicates on high-cardinality
>>>>>> > columns. Please take a look and let me know if you have any
>>>>>> > questions or feedback.
>>>>>> >
>>>>>> > Thanks,
>>>>>> >
>>>>>> > Huaxin
>>>>>> >
>>>>>> > On Tue, Feb 10, 2026 at 1:43 PM huaxin gao <[email protected]> wrote:
>>>>>> >
>>>>>> > > Reminder for tomorrow's sync on Iceberg Index Support.
>>>>>> > >
>>>>>> > > Wednesday: Feb. 11 9:00 – 10:00am
>>>>>> > > Time zone: America/Los_Angeles
>>>>>> > > Google Meet joining info
>>>>>> > > Video call link: meet.google.com/nsp-ctyr-khk
>>>>>> > > Design doc:
>>>>>> > > https://docs.google.com/document/d/1N6a2IOzC6Qsqv7NBqHKesees4N6WF49YUSIX2FrF7S0/edit?tab=t.0#heading=h.hs6r9d26w1y2
>>>>>> > > https://docs.google.com/document/d/1x-0KT43aTrt8u6EV7EgSietIFQSkGsocqwnBTHPebRU/edit?tab=t.0#heading=h.qouk73o4jxx7
>>>>>> > >
>>>>>> > > Thanks,
>>>>>> > > Huaxin
>>>>>> > >
>>>>>> > > On Tue, Feb 3, 2026 at 10:52 PM Péter Váry <[email protected]> wrote:
>>>>>> > >
>>>>>> > >> Thanks Huaxin and Steven for organizing this. Looking forward to
>>>>>> > >> meeting you all next week!
>>>>>> > >>
>>>>>> > >> On Wed, Feb 4, 2026, 02:48 Steven Wu <[email protected]> wrote:
>>>>>> > >>
>>>>>> > >>> We set up the dev calendar event with a new google meet link.
>>>>>> > >>> Please ignore the link from Huaxin's original email.
>>>>>> > >>>
>>>>>> > >>> The dev calendar has the correct info (including the new
>>>>>> > >>> meeting link)
>>>>>> > >>>
>>>>>> > >>> Iceberg Index Support Sync
>>>>>> > >>> Wednesday, February 11 · 9:00 – 10:00am
>>>>>> > >>> Time zone: America/Los_Angeles
>>>>>> > >>> Google Meet joining info
>>>>>> > >>> Video call link: https://meet.google.com/nsp-ctyr-khk
>>>>>> > >>>
>>>>>> > >>> On Tue, Feb 3, 2026 at 5:08 PM huaxin gao <[email protected]> wrote:
>>>>>> > >>>
>>>>>> > >>>> Sorry, I meant PST (not EST) :)
>>>>>> > >>>> Looking forward to the discussion!
>>>>>> > >>>>
>>>>>> > >>>> On Tue, Feb 3, 2026 at 4:58 PM Shawn Chang <[email protected]> wrote:
>>>>>> > >>>>
>>>>>> > >>>>> Hi Huaxin,
>>>>>> > >>>>>
>>>>>> > >>>>> Thanks for starting the sync!
>>>>>> > >>>>>
>>>>>> > >>>>> The meeting seems to be 9-10AM PST on the dev events calendar
>>>>>> > >>>>> <https://calendar.google.com/calendar/u/0?cid=MzkwNWQ0OTJmMWI0NTBiYTA3MTJmMmFlNmFmYTc2ZWI3NTdmMTNkODUyMjBjYzAzYWE0NTI3ODg1YWRjNTYyOUBncm91cC5jYWxlbmRhci5nb29nbGUuY29t>,
>>>>>> > >>>>> not EST. Maybe it's a typo?
>>>>>> > >>>>> Otherwise, looking forward to the discussion!
>>>>>> > >>>>>
>>>>>> > >>>>> Best,
>>>>>> > >>>>> Shawn
>>>>>> > >>>>>
>>>>>> > >>>>> On Tue, Feb 3, 2026 at 9:18 AM huaxin gao <[email protected]> wrote:
>>>>>> > >>>>>
>>>>>> > >>>>>> Hi all,
>>>>>> > >>>>>> I'd like to start a dedicated sync to discuss Iceberg Index
>>>>>> > >>>>>> support. Here is the existing discussion thread:
>>>>>> > >>>>>> https://lists.apache.org/thread/fzqk3jjf0xpj5m4cfqb3v4c65p0t04ty.
>>>>>> > >>>>>>
>>>>>> > >>>>>> To ground the discussion, here are the two proposals:
>>>>>> > >>>>>>
>>>>>> > >>>>>>    - Peter's proposal
>>>>>> > >>>>>>    <https://docs.google.com/document/d/1N6a2IOzC6Qsqv7NBqHKesees4N6WF49YUSIX2FrF7S0/edit?tab=t.0#heading=h.hs6r9d26w1y2>
>>>>>> > >>>>>>    (overall index support)
>>>>>> > >>>>>>    - My proposal
>>>>>> > >>>>>>    <https://docs.google.com/document/d/1x-0KT43aTrt8u6EV7EgSietIFQSkGsocqwnBTHPebRU/edit?tab=t.0#heading=h.qouk73o4jxx7>
>>>>>> > >>>>>>    (bloom filter skipping index)
>>>>>> > >>>>>>
>>>>>> > >>>>>> Time slot: Every 3 weeks, Wednesdays at 9 AM to 10 AM EST,
>>>>>> > >>>>>> starting next Wednesday (2/11). After FileFormat sync finishes,
>>>>>> > >>>>>> we plan to use that slot and switch to every other Monday,
>>>>>> > >>>>>> 9 AM to 10 AM EST.
>>>>>> > >>>>>>
>>>>>> > >>>>>> Meet link: https://meet.google.com/fjn-tyze-mko
>>>>>> > >>>>>>
>>>>>> > >>>>>> Thanks,
>>>>>> > >>>>>> Huaxin
