Iceberg NDV stats

Piotr Findeisen Wed, 16 Mar 2022 03:04:53 -0700

Hi,

We at Starburst are looking into adding number of distinct values (NDV)
statistics to Iceberg tables, to let e.g. the Trino cost-based query
optimizer produce better plans when working with Iceberg tables.


The initial approach is for table-level statistics, and may be improved in
the future.
I would appreciate feedback on the design doc:
https://docs.google.com/document/d/1we0BuQbbdqiJS2eUFC_-6TPSuO57GXivzKmcTzApivY


This stats topic is related to Secondary Indexes, but the two need slightly
different terminology and mechanics. For example, indexes need to be exact
and properly invalidated, whereas statistics may be outdated and still
useful. The two therefore need to be coherent but separate.

Best,
PF
