2010YOUY01 commented on code in PR #15610:
URL: https://github.com/apache/datafusion/pull/15610#discussion_r2041368520
##########
datafusion/common/src/config.rs:
##########

@@ -337,6 +337,13 @@ config_namespace! {
         /// batches and merged.
         pub sort_in_place_threshold_bytes: usize, default = 1024 * 1024

+        /// When doing external sorting, the maximum number of spilled files to
+        /// read back at once. Those read files in the same merge step will be sort-
+        /// preserving-merged and re-spilled, and the step will be repeated to reduce
+        /// the number of spilled files in multiple passes, until a final sorted run
+        /// can be produced.
+        pub sort_max_spill_merge_degree: usize, default = 16

Review Comment:

   > The reason why I'm picky about this is that it is a new configuration that will be hard to deprecate or change

   This is a solid point. This option is intended to be set manually, and it has to ensure `(max_batch_size * per_partition_merge_degree * partition_count) < total_memory_limit`. If it is set correctly for a query, the query should succeed.

   The problem is the ever-growing number of configurations in DataFusion; it seems impossible to set them all correctly. Enabling the parallel-merging optimization would require introducing yet another configuration, which I'm also trying to avoid (though the too-many-configs problem may be a harsh reality we have to accept).
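   For readers sizing this option, here is a minimal Rust sketch of the arithmetic behind the constraint above. It assumes the option lands under `datafusion.execution.sort_max_spill_merge_degree` as proposed in this diff; the `safe_merge_degree` and `merge_passes` helpers and the byte figures are illustrative, not DataFusion APIs.

   ```rust
   // Back-of-the-envelope sketch (not part of this PR): pick a merge degree that keeps
   // `max_batch_size * merge_degree * partition_count` under the memory budget, and
   // estimate how many merge-and-respill passes a given number of spill files needs.
   use datafusion::prelude::{SessionConfig, SessionContext};

   /// Largest merge degree that keeps `batch_bytes * degree * partitions` under the budget
   /// (clamped to at least 2 so a merge step can still make progress).
   fn safe_merge_degree(memory_limit: usize, batch_bytes: usize, partitions: usize) -> usize {
       (memory_limit / (batch_bytes * partitions)).max(2)
   }

   /// Number of sort-preserving-merge passes needed to reduce `spill_files` runs to one,
   /// merging at most `degree` files per step.
   fn merge_passes(mut spill_files: usize, degree: usize) -> usize {
       let mut passes = 0;
       while spill_files > 1 {
           spill_files = spill_files.div_ceil(degree);
           passes += 1;
       }
       passes
   }

   fn main() {
       // Example: 512 MiB budget, ~8 MiB batches, 8 partitions => merge degree 8.
       let degree = safe_merge_degree(512 * 1024 * 1024, 8 * 1024 * 1024, 8);
       assert_eq!(merge_passes(100, degree), 3); // 100 -> 13 -> 2 -> 1 sorted runs

       // Apply the derived value; the key is the one proposed in this PR and only
       // takes effect if the option actually exists in the running DataFusion build.
       let config = SessionConfig::new()
           .set_usize("datafusion.execution.sort_max_spill_merge_degree", degree);
       let _ctx = SessionContext::new_with_config(config);
   }
   ```

   The same arithmetic could also be done once per memory-limit/partition-count combination and baked into a deployment's default config, rather than per query.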