Re: [PR] Sketch for aggregation intermediate results blocked management [datafusion]

via GitHub Wed, 09 Apr 2025 13:29:24 -0700


Dandandan commented on code in PR #11943:
URL: https://github.com/apache/datafusion/pull/11943#discussion_r2036098003



##########
datafusion/common/src/config.rs:
##########
@@ -338,6 +338,19 @@ config_namespace! {
         /// if the source of statistics is accurate.
         /// We plan to make this the default in the future.
         pub use_row_number_estimates_to_optimize_partitioning: bool, default = false
+
+        /// Should DataFusion use the blocked approach to manage group values
+        /// and their related states in accumulators? By default, the single
+        /// approach is used: values are managed within a single large block
+        /// (think of it as a Vec). As this block grows, it often triggers
+        /// numerous copies, resulting in poor performance.
+        /// If this flag is set to `true`, the blocked approach is used instead.
+        /// The blocked approach first allocates capacity for a block based on a
+        /// predefined block size. When the block reaches its limit, we allocate
+        /// a new block (with the same predefined block size as its capacity)
+        /// instead of expanding the current one and copying the data.
+        /// We plan to make this the default in the future once it is sufficiently tested.
+        pub enable_aggregation_intermediate_states_blocked_approach: bool, default = false

Review Comment:
   Perfect! Let's close this one
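
For readers skimming the thread, the doc comment above contrasts one growing `Vec` with fixed-capacity blocks. The following is a minimal sketch of that idea using only the standard library. The type `Blocks` and its methods are hypothetical names chosen for illustration and are not taken from the PR; the actual DataFusion implementation may differ substantially.

```rust
/// Hypothetical blocked storage (illustration only, not the PR's code).
/// Values live in blocks that each hold at most `block_size` elements.
struct Blocks<T> {
    block_size: usize,
    blocks: Vec<Vec<T>>,
}

impl<T> Blocks<T> {
    fn new(block_size: usize) -> Self {
        assert!(block_size > 0, "block size must be non-zero");
        Self { block_size, blocks: Vec::new() }
    }

    /// Appends a value and returns its (block index, offset) position.
    /// When the last block is full, a new block is allocated instead of
    /// growing the existing one, so stored values are never copied.
    fn push(&mut self, value: T) -> (usize, usize) {
        let needs_new_block = self
            .blocks
            .last()
            .map_or(true, |b| b.len() == self.block_size);
        if needs_new_block {
            self.blocks.push(Vec::with_capacity(self.block_size));
        }
        let block_idx = self.blocks.len() - 1;
        let block = &mut self.blocks[block_idx];
        block.push(value);
        (block_idx, block.len() - 1)
    }

    /// Looks up a value by its (block index, offset) position.
    fn get(&self, block_idx: usize, offset: usize) -> Option<&T> {
        self.blocks.get(block_idx)?.get(offset)
    }

    /// Total number of stored values across all blocks.
    fn len(&self) -> usize {
        self.blocks.iter().map(Vec::len).sum()
    }
}
```

In this sketch the outer `Vec` of blocks can still reallocate as it grows, but that only moves the small per-block handles, not the stored values themselves.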



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


