kosiew opened a new pull request, #1216: URL: https://github.com/apache/datafusion-python/pull/1216
## Which issue does this PR close?

* Closes part of #1206

## Rationale for this change

This change provides users with practical guidance and examples for tuning DataFusion’s parallelism to maximize CPU utilization. By documenting configuration options and including a benchmark script, users can better understand how to configure partitions and repartitioning to improve query performance.

## What changes are included in this PR?

* Added a new **benchmark script** `benchmarks/max_cpu_usage.py` showing how to configure DataFusion for optimal parallelism and measure performance impact.
* Updated **README.md** with a reference to the new documentation section.
* Expanded **user guide** (`docs/source/user-guide/configuration.rst`) with a new section **Maximizing CPU Usage**, including:
  * Examples of tuning `SessionConfig` for higher partition counts.
  * Enabling automatic repartitioning for joins, aggregations, and window functions.
  * Manual repartitioning examples.
  * Benchmark usage instructions and performance comparison examples.

## Are these changes tested?

The new `benchmarks/max_cpu_usage.py` script serves as a functional test and demonstration of configuration options. It generates synthetic data and measures query performance, showcasing partitioning impacts. While not a formal unit test, it validates correct behavior of partitioning and parallelism features.

## Are there any user-facing changes?

Yes:

* New documentation in the configuration guide explaining CPU usage optimization.
* A new benchmark script available under `benchmarks/` for users to run and test parallelism configuration.

No breaking API changes are introduced.

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
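For readers skimming this notification, the kind of tuning the PR description lists (a higher partition count plus automatic repartitioning for joins, aggregations, and window functions) might look roughly like the sketch below. It is not taken from the PR's documentation or from `benchmarks/max_cpu_usage.py`: the helper names `parallelism_settings` and `make_context` are hypothetical, and defaulting the partition count to the number of CPU cores is an assumption. The configuration keys are standard DataFusion settings.

```python
import os


def parallelism_settings(target_partitions=None):
    """Build DataFusion config key/value pairs aimed at using all CPU cores.

    Hypothetical helper for illustration; not part of the PR.
    """
    # Assumption: one partition per logical core is a reasonable starting point.
    n = target_partitions or os.cpu_count() or 1
    return {
        "datafusion.execution.target_partitions": str(n),
        "datafusion.optimizer.repartition_joins": "true",
        "datafusion.optimizer.repartition_aggregations": "true",
        "datafusion.optimizer.repartition_windows": "true",
    }


def make_context(target_partitions=None):
    """Create a SessionContext with the settings above applied."""
    # Imported lazily so parallelism_settings() works without datafusion installed.
    from datafusion import SessionConfig, SessionContext

    config = SessionConfig()
    for key, value in parallelism_settings(target_partitions).items():
        config = config.set(key, value)
    return SessionContext(config)
```

Calling `make_context(16)` would return a context targeting 16 partitions; whether a higher count actually helps depends on the data and query, which is what the benchmark script in the PR is meant to measure.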
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org