UBarney opened a new issue, #15601: URL: https://github.com/apache/datafusion/issues/15601
DataFusion's physical plan optimizer can generate plans that contain two consecutive `RepartitionExec` operators: a `RepartitionExec(Hash)` directly consuming the output of a `RepartitionExec(RoundRobinBatch)`. This pattern appears, for example, when preparing inputs for a `HashJoinExec`, as shown in the following `EXPLAIN` output.

I think the `RepartitionExec(partitioning_scheme=RoundRobinBatch)` can be removed, since `RepartitionExec(partitioning_scheme=Hash)` supports `output_partition_count > input_partition_count`:
https://github.com/apache/datafusion/blob/6afd5397045840d728de02a5f148b66896b397b3/datafusion/physical-plan/src/repartition/mod.rs#L1150-L1157

```
> explain SELECT a.value AS a_val, b.value AS b_val
FROM generate_series(1, 10) a(value)
JOIN generate_series(1, 10) b(value)
ON a.value = b.value;
+---------------+------------------------------------------------------------+
| plan_type     | plan                                                       |
+---------------+------------------------------------------------------------+
| physical_plan | ┌───────────────────────────┐                              |
|               | │       ProjectionExec      │                              |
|               | │    --------------------   │                              |
|               | │        a_val: value       │                              |
|               | │        b_val: value       │                              |
|               | │                           │                              |
|               | │     partition_cont: 24    │                              |
|               | └─────────────┬─────────────┘                              |
|               | ┌─────────────┴─────────────┐                              |
|               | │    CoalesceBatchesExec    │                              |
|               | │    --------------------   │                              |
|               | │     target_batch_size:    │                              |
|               | │            8192           │                              |
|               | └─────────────┬─────────────┘                              |
|               | ┌─────────────┴─────────────┐                              |
|               | │        HashJoinExec       │                              |
|               | │    --------------------   ├──────────────┐               |
|               | │    on: (value = value)    │              │               |
|               | └─────────────┬─────────────┘              │               |
|               | ┌─────────────┴─────────────┐┌─────────────┴─────────────┐ |
|               | │    CoalesceBatchesExec    ││    CoalesceBatchesExec    │ |
|               | │    --------------------   ││    --------------------   │ |
|               | │     target_batch_size:    ││     target_batch_size:    │ |
|               | │            8192           ││            8192           │ |
|               | └─────────────┬─────────────┘└─────────────┬─────────────┘ |
|               | ┌─────────────┴─────────────┐┌─────────────┴─────────────┐ |
|               | │      RepartitionExec      ││      RepartitionExec      │ |
|               | │    --------------------   ││    --------------------   │ |
|               | │  output_partition_count:  ││  output_partition_count:  │ |
|               | │             24            ││             24            │ |
|               | │                           ││                           │ |
|               | │    partitioning_scheme:   ││    partitioning_scheme:   │ |
|               | │    Hash([value@0], 24)    ││    Hash([value@0], 24)    │ |
|               | └─────────────┬─────────────┘└─────────────┬─────────────┘ |
|               | ┌─────────────┴─────────────┐┌─────────────┴─────────────┐ |
|               | │      RepartitionExec      ││      RepartitionExec      │ |
|               | │    --------------------   ││    --------------------   │ |
|               | │  output_partition_count:  ││  output_partition_count:  │ |
|               | │             1             ││             1             │ |
|               | │                           ││                           │ |
|               | │    partitioning_scheme:   ││    partitioning_scheme:   │ |
|               | │    RoundRobinBatch(24)    ││    RoundRobinBatch(24)    │ |
|               | └─────────────┬─────────────┘└─────────────┬─────────────┘ |
|               | ┌─────────────┴─────────────┐┌─────────────┴─────────────┐ |
|               | │       LazyMemoryExec      ││       LazyMemoryExec      │ |
|               | │    --------------------   ││    --------------------   │ |
|               | │     batch_generators:     ││     batch_generators:     │ |
|               | │ generate_series: start=1, ││ generate_series: start=1, │ |
|               | │  end=10, batch_size=8192  ││  end=10, batch_size=8192  │ |
|               | └───────────────────────────┘└───────────────────────────┘ |
|               |                                                            |
+---------------+------------------------------------------------------------+
1 row(s) fetched.
Elapsed 0.002 seconds.
```

Code that adds the `RepartitionExec`:
https://github.com/apache/datafusion/blob/6afd5397045840d728de02a5f148b66896b397b3/datafusion/physical-optimizer/src/enforce_distribution.rs#L1260-L1271
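To illustrate why the round-robin step does not affect where rows end up, here is a toy sketch using only the Rust standard library. It is not DataFusion code: `round_robin` and `hash_repartition` are hypothetical stand-ins for the two partitioning schemes, and `DefaultHasher` is assumed in place of whatever hash function DataFusion actually uses. It only compares row placement; it says nothing about whether the extra step helps or hurts performance.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Toy stand-in for RoundRobinBatch: deal whole batches out to `n` partitions in turn.
fn round_robin(batches: Vec<Vec<i64>>, n: usize) -> Vec<Vec<Vec<i64>>> {
    let mut out = vec![Vec::new(); n];
    for (i, batch) in batches.into_iter().enumerate() {
        out[i % n].push(batch);
    }
    out
}

/// Toy stand-in for Hash repartitioning: route each row to `hash(key) % n`,
/// however many input partitions there are. Each output partition is sorted
/// so that results can be compared independent of arrival order.
fn hash_repartition(inputs: Vec<Vec<Vec<i64>>>, n: usize) -> Vec<Vec<i64>> {
    let mut out = vec![Vec::new(); n];
    for partition in inputs {
        for batch in partition {
            for v in batch {
                let mut h = DefaultHasher::new();
                v.hash(&mut h);
                out[(h.finish() % n as u64) as usize].push(v);
            }
        }
    }
    for p in out.iter_mut() {
        p.sort();
    }
    out
}

fn main() {
    // One input partition holding a single batch with the values 1..=10,
    // loosely mirroring generate_series(1, 10) in the query above.
    let batches = vec![(1..=10).collect::<Vec<i64>>()];

    // Hash directly from 1 input partition to 24 output partitions.
    let direct = hash_repartition(vec![batches.clone()], 24);

    // Round-robin to 24 partitions first, then hash to 24 partitions.
    let via_rr = hash_repartition(round_robin(batches, 24), 24);

    assert_eq!(direct, via_rr);
    println!("same row placement: {}", direct == via_rr);
}
```

In this model the output partition of a row depends only on its key, so the intermediate round-robin distribution changes nothing about the final placement.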