Hi,

We at Starburst are looking into adding number of distinct values (NDV) statistics to Iceberg tables, so that, for example, the Trino cost-based query optimizer can produce better plans when working with Iceberg tables.
The initial approach covers table-level statistics and may be improved in the future. I would appreciate feedback on the design doc:
https://docs.google.com/document/d/1we0BuQbbdqiJS2eUFC_-6TPSuO57GXivzKmcTzApivY

This statistics topic is related to Secondary Indexes, but the two need slightly different terminology and mechanics. For example, indexes need to be exact and properly invalidated, whereas statistics may be outdated and still be useful. So the two things need to be coherent but separate.

Best
PF