kosiew opened a new pull request, #1216:
URL: https://github.com/apache/datafusion-python/pull/1216

   
   ## Which issue does this PR close?
   
   * Closes part of #1206
   
   ## Rationale for this change
   
   This change gives users practical guidance and examples for tuning DataFusion’s parallelism so that queries can make full use of the available CPU cores. By documenting the relevant configuration options and adding a benchmark script, it helps users understand how partition counts and repartitioning affect query performance.
   
   ## What changes are included in this PR?
   
   * Added a new **benchmark script** `benchmarks/max_cpu_usage.py` that shows how to configure DataFusion for higher parallelism and measures the resulting performance impact.
   * Updated **README.md** with a reference to the new documentation section.
   * Expanded the **user guide** (`docs/source/user-guide/configuration.rst`) with a new section, **Maximizing CPU Usage**, which includes:
   
     * Examples of tuning `SessionConfig` for higher partition counts.
     * Enabling automatic repartitioning for joins, aggregations, and window functions.
     * Manual repartitioning examples.
     * Benchmark usage instructions and performance comparison examples.
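   As a rough illustration of the kind of tuning the new section covers, the sketch below builds a dict of DataFusion configuration keys sized to the local CPU count and passes it to `SessionConfig`. The helper names (`cpu_tuning_options`, `build_context`, `repartition_manually`) are hypothetical and not part of this PR; the option keys are standard DataFusion settings, but the values the guide actually recommends may differ.

```python
import os
from typing import Dict, Optional


def cpu_tuning_options(partitions: Optional[int] = None) -> Dict[str, str]:
    """Hypothetical helper: DataFusion options that favor parallel execution.

    Defaults the partition count to the number of CPUs reported by the OS.
    """
    n = partitions if partitions is not None else (os.cpu_count() or 1)
    if n < 1:
        raise ValueError("partitions must be >= 1")
    return {
        # Number of partitions DataFusion aims for when planning a query.
        "datafusion.execution.target_partitions": str(n),
        # Let the optimizer insert repartitioning for joins, aggregations and windows.
        "datafusion.optimizer.repartition_joins": "true",
        "datafusion.optimizer.repartition_aggregations": "true",
        "datafusion.optimizer.repartition_windows": "true",
    }


def build_context(partitions: Optional[int] = None):
    """Hypothetical helper: create a SessionContext using the options above."""
    # Imported lazily so cpu_tuning_options() works without datafusion installed.
    from datafusion import SessionConfig, SessionContext

    return SessionContext(SessionConfig(cpu_tuning_options(partitions)))


def repartition_manually(df, partitions: int):
    """Hypothetical helper: explicitly repartition an existing DataFrame."""
    # DataFrame.repartition spreads record batches round-robin across partitions.
    return df.repartition(partitions)
```

   With `datafusion` installed, `build_context()` returns a context that plans queries with one target partition per CPU, while `repartition_manually` shows the manual alternative for a single DataFrame. Whether more partitions actually help depends on data size and query shape, which is what the benchmark script is meant to measure.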
   
   ## Are these changes tested?
   
   The new `benchmarks/max_cpu_usage.py` script serves as a functional test and demonstration of the configuration options: it generates synthetic data and measures query performance to show the impact of partitioning. It is not a formal unit test, but running it exercises the partitioning and parallelism features end to end.
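   The measurement pattern described above, timing the same query under different partition counts, can be sketched in plain Python. This is not the code in `benchmarks/max_cpu_usage.py`; `best_of` and `compare` are hypothetical names, and `run_with` stands for whatever callable runs the query for a given partition count.

```python
import time
from typing import Callable, Dict, Iterable


def best_of(fn: Callable[[], object], repeats: int = 3) -> float:
    """Return the fastest wall-clock time (seconds) over `repeats` runs of fn."""
    if repeats < 1:
        raise ValueError("repeats must be >= 1")
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return min(timings)


def compare(
    run_with: Callable[[int], object],
    partition_counts: Iterable[int],
    repeats: int = 3,
) -> Dict[int, float]:
    """Time run_with(n) for each partition count n; keys are partition counts."""
    # Bind n as a default argument so each lambda captures its own value.
    return {n: best_of(lambda n=n: run_with(n), repeats) for n in partition_counts}
```

   Taking the minimum rather than the mean reduces noise from other processes on the machine. In practice `run_with` would create a `SessionContext` configured with `SessionConfig().with_target_partitions(n)`, register the synthetic data, and call `.collect()` on the query.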
   
   ## Are there any user-facing changes?
   
   Yes:
   
   * New documentation in the configuration guide explaining CPU usage optimization.
   * A new benchmark script under `benchmarks/` that users can run to test parallelism configuration.
   
   No breaking API changes are introduced.
   