Hi
I'm just wondering, would one solution be to put these stats in Puffin files?
There's already ComputeTableStatsSparkAction (and probably similar actions
in other engines), and I can imagine a quick metadata aggregation job to
compute min/max/null_values, etc. Also, how accurate would the stats need
to be? T
Thanks, Anton, for the review and feedback. I've shared more context below.
Good to learn about the potential manifest structure change in V4. I
guess this proposal is more helpful in terms of stating the problem of
large-scale manifest processing. I think we can think of ways to improve
that
Does the doc suggest it is too expensive to aggregate min/max stats after
planning files (i.e. after loading matching files in memory)? Do we have
any benchmarks to refer to? We will have to read manifests for planning
anyway, right?
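
To make the question concrete, here is a rough sketch of what aggregating after planning could look like once the matching files are in memory. It is plain Python rather than Iceberg code: FileStats, ColumnSummary and aggregate_column_stats are hypothetical names, and the bounds are assumed to be already decoded into comparable values, keyed by column id.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Iterable, Optional


@dataclass
class FileStats:
    """Hypothetical per-file stats, keyed by column id (values already decoded)."""
    lower_bounds: Dict[int, Any] = field(default_factory=dict)
    upper_bounds: Dict[int, Any] = field(default_factory=dict)
    null_value_counts: Dict[int, int] = field(default_factory=dict)


@dataclass
class ColumnSummary:
    """Hypothetical table-level summary for one column."""
    min_value: Optional[Any] = None
    max_value: Optional[Any] = None
    null_count: int = 0


def aggregate_column_stats(files: Iterable[FileStats]) -> Dict[int, ColumnSummary]:
    """Fold per-file bounds and null counts into one summary per column.

    Files that carry no bound for a column are simply skipped here; a real
    job would probably have to treat the aggregate as unknown in that case.
    """
    summaries: Dict[int, ColumnSummary] = {}
    for f in files:
        for col_id, lower in f.lower_bounds.items():
            s = summaries.setdefault(col_id, ColumnSummary())
            if s.min_value is None or lower < s.min_value:
                s.min_value = lower
        for col_id, upper in f.upper_bounds.items():
            s = summaries.setdefault(col_id, ColumnSummary())
            if s.max_value is None or upper > s.max_value:
                s.max_value = upper
        for col_id, nulls in f.null_value_counts.items():
            s = summaries.setdefault(col_id, ColumnSummary())
            s.null_count += nulls
    return summaries
```

This is a single pass over the planned files with one comparison per column per file. Since per-file bounds may be truncated, the result would be a bound on the column values rather than an exact min/max, which ties back to the accuracy question.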
Also, the doc proposes to add column level stats to the manifest