alamb commented on PR #15503: URL: https://github.com/apache/datafusion/pull/15503#issuecomment-2787539880
I think the current API in this PR (returning `PartitionedStatistics`) is nice from the point of view that it provides the finest granularity access to statistics (provides per partition information that can be summarized when desired). I do agree it is unfortunate that `ExecutionPlan::statistics` and `ExecutionPlan::partitioned_statistics` may end up being redundant. However, I think changing the signature of `ExecutionPlan::statistics` will just be that much more disruptive. So what I personally suggest is add `ExecutionPlan::partitioned_statistics` as in this PR, but work to reduce the duplication. I think we could add a default implementation of `statistics` that just called through to `partitioned-statistics` which would avoid the duplication. Something like ```rust pub trait ExecutionPlan { ... fn statistics(&self) -> Result<Statistics> { self.partitioned_statistics().combine() } ... ```` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org