alamb commented on PR #15503:
URL: https://github.com/apache/datafusion/pull/15503#issuecomment-2787539880

   I think the current API in this PR (returning `PartitionedStatistics`) is 
nice from the point of view that it provides the finest granularity access to 
statistics (provides per partition information that can be summarized when 
desired). 
   
   I do agree it is unfortunate that `ExecutionPlan::statistics` and 
`ExecutionPlan::partitioned_statistics` may end up being redundant. 
   
   However, I think changing the signature of `ExecutionPlan::statistics` will 
just be that much more disruptive. 
   
   So what I personally suggest is add `ExecutionPlan::partitioned_statistics` 
as in this PR, but work to reduce the duplication. I think we could add a 
default implementation of `statistics` that just called through to 
`partitioned-statistics` which would avoid the duplication. Something like
   
   ```rust
   pub trait ExecutionPlan { 
   ...
      fn statistics(&self) -> Result<Statistics> {
         self.partitioned_statistics().combine()
       }
   ...
   ````
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org

Reply via email to