Hi Team,

We have a Flink job where we would like to use the Iceberg file statistics (lowerBounds, upperBounds) during the planning phase.
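For example, min/max bounds let a planner skip files whose value range cannot match a predicate. The sketch below is a simplified, self-contained model of that idea, not Iceberg code. It assumes the bounds of a single long column are already decoded into a hypothetical FileBounds record. In Iceberg, lowerBounds()/upperBounds() return a Map<Integer, ByteBuffer> keyed by field ID, and each value would have to be decoded according to the column type.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of min/max pruning at planning time (not Iceberg code).
// Assumption: bounds for one long column have already been decoded.
public class BoundsPruning {

    // Hypothetical per-file summary: file path plus min/max of a single column.
    record FileBounds(String path, long lower, long upper) {}

    // A file can only hold rows with col == value if value lies in [lower, upper].
    static boolean mightContain(FileBounds file, long value) {
        return file.lower() <= value && value <= file.upper();
    }

    // Keep only the files whose bounds do not rule out the predicate col == value.
    static List<FileBounds> planFiles(List<FileBounds> files, long value) {
        List<FileBounds> selected = new ArrayList<>();
        for (FileBounds file : files) {
            if (mightContain(file, value)) {
                selected.add(file);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        List<FileBounds> files = List.of(
            new FileBounds("data-0.parquet", 0, 99),
            new FileBounds("data-1.parquet", 100, 199),
            new FileBounds("data-2.parquet", 150, 299));
        // Prints data-1.parquet and data-2.parquet, whose ranges include 160.
        for (FileBounds f : planFiles(files, 160)) {
            System.out.println(f.path());
        }
    }
}
```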
Currently it is possible to parameterize the Scan to include the statistics using includeColumnStats() [1]. However, this is an on/off switch, and there is currently no way to configure it at a finer granularity.

Unfortunately, our table has many columns, and requesting statistics for every column results in GenericDataFile objects with a retained heap of ~100 KB each. With a few thousand data files, that adds up to several hundred MB, which would put serious extra memory pressure on our job.

I was considering adding a new method to the Scan class like this:
---------
ThisT includeColumnStats(Collection<String> columns);
---------

Would the community consider this a valuable addition to the Scan API?

Thanks,
Peter

[1] https://github.com/apache/iceberg/blob/f536c840350bd5628d7c514d2a4719404c9b8ed1/api/src/main/java/org/apache/iceberg/Scan.java#L71-L78
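To illustrate the effect of the proposed includeColumnStats(Collection<String> columns), here is a hypothetical, self-contained sketch of the trimming step such a method might perform: resolve the requested column names to field IDs, then keep only those entries in each per-file statistics map. The helper names resolveFieldIds and retainColumns are invented for illustration. In Iceberg the name-to-ID lookup would presumably go through the table Schema (e.g. Schema.findField(name).fieldId()), and the trimming would apply to every metrics map of a data file, not just lowerBounds.

```java
import java.nio.ByteBuffer;
import java.util.Collection;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: keep statistics only for the requested columns.
public class ColumnStatsSubset {

    // Resolve column names to field IDs; fails fast on unknown names.
    static Set<Integer> resolveFieldIds(Map<String, Integer> nameToId, Collection<String> columns) {
        Set<Integer> ids = new HashSet<>();
        for (String column : columns) {
            Integer id = nameToId.get(column);
            if (id == null) {
                throw new IllegalArgumentException("Unknown column: " + column);
            }
            ids.add(id);
        }
        return ids;
    }

    // Return a copy of a stats map restricted to the given field IDs.
    static <V> Map<Integer, V> retainColumns(Map<Integer, V> stats, Set<Integer> fieldIds) {
        if (stats == null) {
            return null; // no stats were collected for this file
        }
        Map<Integer, V> result = new HashMap<>();
        for (Integer id : fieldIds) {
            V value = stats.get(id);
            if (value != null) {
                result.put(id, value);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Simulated wide table: columns col1..col500 with field IDs 1..500.
        Map<String, Integer> nameToId = new HashMap<>();
        Map<Integer, ByteBuffer> lowerBounds = new HashMap<>();
        for (int id = 1; id <= 500; id++) {
            nameToId.put("col" + id, id);
            lowerBounds.put(id, ByteBuffer.allocate(8).putLong(0, id));
        }
        // Only two columns are needed during planning.
        Set<Integer> wanted = resolveFieldIds(nameToId, List.of("col3", "col42"));
        Map<Integer, ByteBuffer> trimmed = retainColumns(lowerBounds, wanted);
        System.out.println(lowerBounds.size() + " -> " + trimmed.size()); // 500 -> 2
    }
}
```

The retained memory then scales with the number of requested columns rather than the table width.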