Dandandan commented on code in PR #11943:
URL: https://github.com/apache/datafusion/pull/11943#discussion_r2036098003

##########
datafusion/common/src/config.rs:
##########
@@ -338,6 +338,19 @@ config_namespace! {
         /// if the source of statistics is accurate.
         /// We plan to make this the default in the future.
         pub use_row_number_estimates_to_optimize_partitioning: bool, default = false
+
+        /// Should DataFusion use the the blocked approach to manage the groups
+        /// values and their related states in accumulators. By default, the single
+        /// approach will be used, values are managed within a single large block
+        /// (can think of it as a Vec). As this block grows, it often triggers
+        /// numerous copies, resulting in poor performance.
+        /// If setting this flag to `true`, the blocked approach will be used.
+        /// And the blocked approach allocates capacity for the block
+        /// based on a predefined block size firstly. When the block reaches its limit,
+        /// we allocate a new block (also with the same predefined block size based capacity)
+        // instead of expanding the current one and copying the data.
+        /// We plan to make this the default in the future when tests are enough.
+        pub enable_aggregation_intermediate_states_blocked_approach: bool, default = false

Review Comment:
   Perfect! Let's close this one

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
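
Illustration of the doc comment quoted above: a minimal Rust sketch of the "blocked" idea, where values go into fixed-capacity blocks and a full block is followed by a freshly allocated one instead of being grown and copied. `BlockedStore` and its methods are hypothetical names made up for this sketch; they are not DataFusion types and this is not the PR's implementation.

```rust
/// Hypothetical illustration only; not a DataFusion type.
/// Values are kept in fixed-capacity blocks rather than one growing Vec.
struct BlockedStore<T> {
    block_size: usize,
    blocks: Vec<Vec<T>>,
}

impl<T> BlockedStore<T> {
    fn new(block_size: usize) -> Self {
        assert!(block_size > 0, "block_size must be non-zero");
        Self { block_size, blocks: Vec::new() }
    }

    /// Appends a value and returns its flat index.
    /// When the last block is full, a new block is allocated up front
    /// with `block_size` capacity; existing blocks are left untouched.
    fn push(&mut self, value: T) -> usize {
        let need_new_block = match self.blocks.last() {
            Some(block) => block.len() == self.block_size,
            None => true,
        };
        if need_new_block {
            self.blocks.push(Vec::with_capacity(self.block_size));
        }
        let block_idx = self.blocks.len() - 1;
        let block = &mut self.blocks[block_idx];
        block.push(value);
        block_idx * self.block_size + block.len() - 1
    }

    /// Looks up a value by flat index: block number, then offset in block.
    fn get(&self, index: usize) -> Option<&T> {
        self.blocks
            .get(index / self.block_size)?
            .get(index % self.block_size)
    }

    fn num_blocks(&self) -> usize {
        self.blocks.len()
    }
}

fn main() {
    let mut store = BlockedStore::new(4);
    for v in 0..10u64 {
        let idx = store.push(v * 10);
        assert_eq!(idx as u64, v);
    }
    println!("blocks = {}, value at 9 = {:?}", store.num_blocks(), store.get(9));
}
```

In this sketch the outer `Vec` of blocks can still reallocate, but that only moves the small per-block handles (pointer, length, capacity); the stored values themselves are never copied once written, which is the property the doc comment contrasts with the single-block approach.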