UBarney commented on issue #15601: URL: https://github.com/apache/datafusion/issues/15601#issuecomment-2782098214
> `tpch_sf1` And `tpch_sf10` by default already partition the input data, so AFAIK the plans should not be any different (they don't introduce round-robin repartition)

That's not entirely accurate. At least for `tpch_sf1`, the plan includes a pattern like `hash_partition -> round_robin`, as seen in [q2_run_with_sf1](https://gist.github.com/UBarney/1fa47ed8bf043a88b646c1bc466994cb#file-gistfile1-txt-L255-L275). In contrast, `tpch_sf10` may avoid inserting a `RoundRobinBatch` followed by a `HashPartitioning`.

To check this, I added the following line:

```patch
--- a/datafusion/physical-optimizer/src/enforce_distribution.rs
+++ b/datafusion/physical-optimizer/src/enforce_distribution.rs
@@ -1262,6 +1262,7 @@ pub fn ensure_distribution(
                 // Add round-robin repartitioning on top of the operator
                 // to increase parallelism.
                 child = add_roundrobin_on_top(child, target_partitions)?;
+                println!("add rr");
             }
             // When inserting hash is necessary to satisfy hash requirement, insert hash repartition.
             if hash_necessary {
```

[tpch_sf1_output](https://gist.githubusercontent.com/UBarney/1892acbec803e9b09230b6679524ddb3/raw/f62713979ee6ef1d37db1217d01a6cc108d804b4/gistfile1.txt) has some `add rr` lines in the output.

[tpch_sf10_output](https://gist.githubusercontent.com/UBarney/f97a0f30809fb10d484939f54f474926/raw/e94a3bba78012bf66c2e519a0b4ee222c0612926/gistfile1.txt) has no `add rr` lines in the output.
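
The same pattern can also be checked from the plan text itself, without patching the optimizer. Below is a minimal sketch, not part of DataFusion: `count_hash_over_round_robin` is a hypothetical helper, and it assumes the plan is given as plain indented text (one operator per line, no table borders) in which repartition operators start with `RepartitionExec: partitioning=Hash(` and `RepartitionExec: partitioning=RoundRobinBatch(`.

```rust
/// Counts how often a hash repartition sits directly above a round-robin
/// repartition in an indented plan listing.
///
/// Assumption: a single-child operator's child is printed on the very next
/// line, so checking adjacent lines is enough for this pattern.
fn count_hash_over_round_robin(plan: &str) -> usize {
    // Drop the indentation so only the operator text is compared.
    let lines: Vec<&str> = plan.lines().map(str::trim_start).collect();

    lines
        .windows(2)
        .filter(|pair| {
            pair[0].starts_with("RepartitionExec: partitioning=Hash(")
                && pair[1].starts_with("RepartitionExec: partitioning=RoundRobinBatch(")
        })
        .count()
}
```

A non-zero count for a query's plan would correspond to the `hash_partition -> round_robin` pattern described above. A count of zero only says that this exact two-line shape was not found.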