Hi Gabor,

Thanks for your feedback!

> In that use case however, we'd lose the stats we got previously from HMS

For Iceberg tables, Hive computes the same stats object that it previously persisted to HMS and now stores it in a Puffin file. So there shouldn't be any changes for Impala other than switching the stats source.

> We could gather all the column stats needed by different engines, standardize
> them into the Iceberg repo

That is the option I mentioned above, where I provided the Hive schema currently used to store column statistics. I can create a Google doc to continue the discussion in that direction.

> Aren't partition status just a more granular way of column stats.

In Iceberg 1.7, Ajantha added a helper method to compute the basic partition stats for a given snapshot:

    Collection<PartitionStats> computeStats(Table table, Snapshot snapshot)

Hopefully, we'll get reader and writer support in 1.8: https://github.com/apache/iceberg/pull/11216

Similar functionality is needed for column stats. For a partitioned table, we need to create one ColumnStatistics object per partition and store each as a separate blob in a Puffin file. During query planning, we'll compute and use aggregated stats based on the pruned partition list.

Regards,
Denys
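
P.S. To make the last point concrete, here is a minimal sketch of the planning-time aggregation, under some loud assumptions: the `ColumnStats` class, its three fields, the `aggregate` helper, and the string partition keys are all hypothetical and are not Iceberg or Hive APIs. It only merges values that combine trivially (null counts, min, max); a distinct-value count can't be summed across partitions and would most likely need a mergeable sketch instead.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

class PartitionColumnStatsSketch {

  // Hypothetical per-partition stats for one long-typed column.
  // Illustration only; this is not an Iceberg or Hive class.
  static final class ColumnStats {
    final long nullCount;
    final long min;
    final long max;

    ColumnStats(long nullCount, long min, long max) {
      this.nullCount = nullCount;
      this.min = min;
      this.max = max;
    }

    @Override
    public String toString() {
      return "ColumnStats{nullCount=" + nullCount + ", min=" + min + ", max=" + max + "}";
    }
  }

  // Merge the stats of the partitions that survived pruning.
  // Returns empty if none of the pruned partitions has stats.
  static Optional<ColumnStats> aggregate(
      Map<String, ColumnStats> statsByPartition, Collection<String> prunedPartitions) {
    ColumnStats merged = null;
    for (String partition : prunedPartitions) {
      ColumnStats s = statsByPartition.get(partition);
      if (s == null) {
        continue; // no stats blob was written for this partition
      }
      merged = (merged == null)
          ? s
          : new ColumnStats(
              merged.nullCount + s.nullCount,
              Math.min(merged.min, s.min),
              Math.max(merged.max, s.max));
    }
    return Optional.ofNullable(merged);
  }

  public static void main(String[] args) {
    // One stats object per partition, standing in for one blob each.
    Map<String, ColumnStats> stats = new HashMap<>();
    stats.put("day=1", new ColumnStats(2, 10, 50));
    stats.put("day=2", new ColumnStats(0, 5, 20));
    stats.put("day=3", new ColumnStats(1, 30, 90));

    // Assume the planner pruned day=2.
    System.out.println(aggregate(stats, List.of("day=1", "day=3")));
  }
}
```

In this example only day=1 and day=3 contribute, so the merged stats have a null count of 3, a min of 10, and a max of 90; the smaller min from the pruned day=2 partition is correctly ignored.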