alamb commented on PR #15981:
URL: https://github.com/apache/datafusion/pull/15981#issuecomment-2867285850

   > I think something like that is done already in the "convert to state" 
logic - it will dynamically decide to skip aggregating once it sees that the 
group vs input rows ratio is small.
   
   I agree
   
   Specifically, see
https://docs.rs/datafusion/latest/datafusion/logical_expr/trait.GroupsAccumulator.html#method.convert_to_state
 and similar functions.
   
   These [config value thresholds](https://datafusion.apache.org/user-guide/configs.html) control the behavior:
   
   
   | key | default | description |
   | -- | -- | -- |
   | datafusion.execution.skip_partial_aggregation_probe_ratio_threshold | 0.8 | Aggregation ratio (number of distinct groups / number of input rows) threshold for skipping partial aggregation. If the ratio exceeds this threshold, partial aggregation will skip aggregation for further input. |
   | datafusion.execution.skip_partial_aggregation_probe_rows_threshold | 100000 | Number of input rows a partial aggregation partition should process before checking the aggregation ratio and trying to switch to skipping aggregation mode. |
   | datafusion.execution.use_row_number_estimates_to_optimize_partitioning | false | Should DataFusion use row number estimates at the input to decide whether increasing parallelism is beneficial or not. By default, only exact row numbers (not estimates) are used for this decision. Setting this flag to true will likely produce better plans if the source of statistics is accurate. We plan to make this the default in the future. |
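
As a rough illustration of how the two `skip_partial_aggregation_probe_*` settings interact, here is a minimal sketch of the decision they describe. The `SkipProbe` type and its fields are hypothetical names for this example, not DataFusion's actual implementation. It also assumes the decision is made once, when the row threshold is first reached, and uses a strict "greater than" comparison as the config description words it.

```rust
/// Hypothetical, simplified model of the partial-aggregation skip probe.
/// Not DataFusion's real type; names and structure are assumptions.
#[derive(Debug)]
pub struct SkipProbe {
    /// Mirrors `skip_partial_aggregation_probe_ratio_threshold` (default 0.8).
    ratio_threshold: f64,
    /// Mirrors `skip_partial_aggregation_probe_rows_threshold` (default 100000).
    rows_threshold: usize,
    /// Input rows seen so far by this partition.
    input_rows: usize,
    /// Result of the probe; only meaningful once `decided` is true.
    pub should_skip: bool,
    /// Assumption: the decision is made once and then kept.
    pub decided: bool,
}

impl SkipProbe {
    pub fn new(ratio_threshold: f64, rows_threshold: usize) -> Self {
        Self {
            ratio_threshold,
            rows_threshold,
            input_rows: 0,
            should_skip: false,
            decided: false,
        }
    }

    /// Record one processed batch.
    /// `batch_rows` is the number of input rows in the batch and
    /// `total_groups` is the number of distinct groups seen so far.
    pub fn update(&mut self, batch_rows: usize, total_groups: usize) {
        if self.decided {
            return;
        }
        self.input_rows += batch_rows;
        // Only check the ratio after enough rows have been observed.
        if self.input_rows >= self.rows_threshold {
            let ratio = total_groups as f64 / self.input_rows as f64;
            // A high ratio means grouping barely reduces the row count,
            // so partial aggregation is unlikely to pay off.
            self.should_skip = ratio > self.ratio_threshold;
            self.decided = true;
        }
    }
}
```

With the default values, a partition that has seen 100000 rows and 95000 distinct groups has a ratio of 0.95, so this sketch would switch to skipping. A partition with 1000 groups over the same number of rows would keep aggregating.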
    
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
For additional commands, e-mail: github-h...@datafusion.apache.org
