Thanks all for the reactions.

I wanted to emphasize that Hive's StatsObject was shared as an example, with the
suggestion to adapt it for Iceberg as `PartitionColumnStats` (i.e., use column
IDs and drop name/type, etc.).

As Rayan mentioned, column lower/upper bounds, value counts, null value counts,
and NaN counts are tracked at the file level.
For partition-level aggregates, however, we still need a compound object that
provides all of these values at once, precomputed, similar to the basic stats in
https://github.com/apache/iceberg/blob/main/core/src/main/java/org/apache/iceberg/PartitionStats.java
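To make the idea concrete, here is a minimal sketch of what such a per-column
partition aggregate might look like. `PartitionColumnStats` and `mergeFile` are
hypothetical names, not an existing Iceberg API. The object is keyed by field ID
(no name/type), and it folds file-level stats into one precomputed record. As a
simplifying assumption, bounds are plain `Double` values; Iceberg itself stores
file bounds as serialized `ByteBuffer`s per type.

```java
// Hypothetical sketch, not an Iceberg API: per-partition aggregate of
// file-level stats for one column, keyed by field ID instead of name/type.
final class PartitionColumnStats {
  private final int fieldId;
  private long valueCount;
  private long nullValueCount;
  private long nanValueCount;
  // Assumption: bounds simplified to Double; Iceberg keeps serialized ByteBuffers.
  private Double lowerBound;
  private Double upperBound;

  PartitionColumnStats(int fieldId) {
    this.fieldId = fieldId;
  }

  // Folds one data file's stats for this column into the partition aggregate.
  // Counts are summed; the min of lower bounds and max of upper bounds stay valid bounds.
  void mergeFile(long values, long nulls, long nans, Double lower, Double upper) {
    valueCount += values;
    nullValueCount += nulls;
    nanValueCount += nans;
    if (lower != null && (lowerBound == null || lower < lowerBound)) {
      lowerBound = lower;
    }
    if (upper != null && (upperBound == null || upper > upperBound)) {
      upperBound = upper;
    }
  }

  int fieldId() { return fieldId; }
  long valueCount() { return valueCount; }
  long nullValueCount() { return nullValueCount; }
  long nanValueCount() { return nanValueCount; }
  Double lowerBound() { return lowerBound; }
  Double upperBound() { return upperBound; }

  @Override
  public String toString() {
    return "fieldId=" + fieldId + " values=" + valueCount + " nulls=" + nullValueCount
        + " nans=" + nanValueCount + " bounds=[" + lowerBound + ", " + upperBound + "]";
  }

  public static void main(String[] args) {
    // Two files in the same partition contributing stats for field ID 3.
    PartitionColumnStats stats = new PartitionColumnStats(3);
    stats.mergeFile(10, 2, 1, 1.5, 9.0);
    stats.mergeFile(5, 0, 0, -2.0, 4.0);
    System.out.println(stats);
  }
}
```

The point of the design is that a reader of partition stats gets counts and
bounds for a column in a single lookup, instead of re-scanning every file's
stats in the partition.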

WDYT?
