asolimando opened a new issue, #22958:
URL: https://github.com/apache/datafusion/issues/22958

   ### Is your feature request related to a problem or challenge?
   
   The current `statistics_with_args` / `StatisticsArgs` design (#21815) embeds cache lookup and child traversal directly inside each operator's `statistics_with_args` override via `args.compute_child_statistics(...)`. This means:
   
   - Each operator must be aware of caching mechanics, which couples local propagation logic to the traversal strategy
   - Evolving the traversal or cache model requires touching every operator implementation
   
   ### Describe the solution you'd like
   
   Introduce a stateless `statistics_from_inputs` method on `ExecutionPlan` (defaulting to `Statistics::new_unknown`) that expresses only local propagation logic from pre-computed child statistics:
   
   ```rust
   fn statistics_from_inputs(
       &self,
       input_stats: &[Arc<Statistics>],
       partition: Option<usize>,
   ) -> Result<Arc<Statistics>> {
       Ok(Arc::new(Statistics::new_unknown(self.schema().as_ref())))
   }
   ```
   
   The external `StatisticsContext` owns traversal and cache management, and calls `statistics_from_inputs` once the child statistics have been resolved. `statistics_with_args` remains the public API and is unchanged.
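
To illustrate the split, here is a minimal sketch of a context that owns traversal and caching while operators supply only local logic. Everything in it is a simplified stand-in and an assumption, not DataFusion's actual API: `Stats`, `Plan`, `Scan`, `Limit`, and `StatsContext` are hypothetical toy types, the `partition` argument is omitted, and the cache is keyed by node pointer purely for demonstration.

```rust
use std::collections::HashMap;
use std::sync::Arc;

// Toy stand-in for `Statistics`: only a row count (assumption).
#[derive(Debug, Clone, PartialEq)]
struct Stats {
    num_rows: Option<usize>,
}

// Toy stand-in for `ExecutionPlan` (assumption).
trait Plan {
    fn children(&self) -> Vec<Arc<dyn Plan>>;

    // Local propagation only; the default mirrors "unknown" statistics.
    fn statistics_from_inputs(&self, _input_stats: &[Arc<Stats>]) -> Arc<Stats> {
        Arc::new(Stats { num_rows: None })
    }
}

// Hypothetical leaf operator with a known row count.
struct Scan {
    rows: usize,
}

impl Plan for Scan {
    fn children(&self) -> Vec<Arc<dyn Plan>> {
        vec![]
    }
    fn statistics_from_inputs(&self, _input_stats: &[Arc<Stats>]) -> Arc<Stats> {
        Arc::new(Stats { num_rows: Some(self.rows) })
    }
}

// Hypothetical operator that caps its input at `fetch` rows.
struct Limit {
    input: Arc<dyn Plan>,
    fetch: usize,
}

impl Plan for Limit {
    fn children(&self) -> Vec<Arc<dyn Plan>> {
        vec![Arc::clone(&self.input)]
    }
    fn statistics_from_inputs(&self, input_stats: &[Arc<Stats>]) -> Arc<Stats> {
        let num_rows = input_stats[0].num_rows.map(|n| n.min(self.fetch));
        Arc::new(Stats { num_rows })
    }
}

// The context owns traversal and caching; operators never see either.
#[derive(Default)]
struct StatsContext {
    cache: HashMap<usize, Arc<Stats>>,
    computed: usize, // number of cache misses, for demonstration
}

impl StatsContext {
    fn compute(&mut self, plan: &Arc<dyn Plan>) -> Arc<Stats> {
        // Key the cache by node address (a demo-only choice).
        let key = Arc::as_ptr(plan) as *const () as usize;
        if let Some(hit) = self.cache.get(&key) {
            return Arc::clone(hit);
        }
        // Resolve children first, then hand their stats to the operator.
        let inputs: Vec<Arc<Stats>> =
            plan.children().iter().map(|c| self.compute(c)).collect();
        let stats = plan.statistics_from_inputs(&inputs);
        self.computed += 1;
        self.cache.insert(key, Arc::clone(&stats));
        stats
    }
}

fn main() {
    let scan: Arc<dyn Plan> = Arc::new(Scan { rows: 100 });
    let limit: Arc<dyn Plan> = Arc::new(Limit { input: scan, fetch: 10 });
    let mut ctx = StatsContext::default();
    println!("{:?}", ctx.compute(&limit));
}
```

In this sketch, changing the cache key or the traversal order would touch only `StatsContext::compute`; `Scan` and `Limit` would stay as they are.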
   
   Benefits:
   
   - Non-breaking: `statistics_from_inputs` has a safe default; `statistics_with_args` is unchanged
   - Operators that override `statistics_from_inputs` automatically benefit from any future improvements to the traversal/caching strategy without code changes
   - Operators become easier to test in isolation (no need to construct `StatisticsArgs` or a plan tree)
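
The testing benefit can be sketched as follows: with a stateless method, a test only needs a slice of input statistics. The types here are hypothetical toy stand-ins (`Stats`, `UnionOp`), not DataFusion's real `Statistics` or `UnionExec`, and the `partition` argument is again left out.

```rust
use std::sync::Arc;

// Toy stand-in for `Statistics`: only a row count (assumption).
#[derive(Debug, Clone, PartialEq)]
struct Stats {
    num_rows: Option<usize>,
}

// Hypothetical union-like operator; it holds no children here because
// the local logic does not need them.
struct UnionOp;

impl UnionOp {
    // The output row count is the sum of the inputs, and is known
    // only if every input row count is known.
    fn statistics_from_inputs(&self, input_stats: &[Arc<Stats>]) -> Arc<Stats> {
        let num_rows = input_stats
            .iter()
            .try_fold(0usize, |acc, s| s.num_rows.map(|n| acc + n));
        Arc::new(Stats { num_rows })
    }
}

fn main() {
    // No plan tree, no cache, no args struct: just input statistics.
    let inputs = vec![
        Arc::new(Stats { num_rows: Some(3) }),
        Arc::new(Stats { num_rows: Some(4) }),
    ];
    let out = UnionOp.statistics_from_inputs(&inputs);
    println!("{:?}", out);
}
```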
   
   ### Describe alternatives you've considered
   
   Keep the current `statistics_with_args` design as-is, with each operator handling caching via `args.compute_child_statistics(...)`. This works correctly but tightly couples propagation logic to traversal mechanics, making the cache model hard to evolve.
   
   ### Additional context
   
   Suggested by @2010YOUY01 in this [comment](https://github.com/apache/datafusion/pull/21815#issuecomment-4697341372) as a follow-up for #21815.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

