UBarney opened a new issue, #15601:
URL: https://github.com/apache/datafusion/issues/15601

DataFusion's physical plan optimizer can generate plans containing two consecutive `RepartitionExec` operators: a `RepartitionExec(Hash)` directly consuming the output of a `RepartitionExec(RoundRobinBatch)`. This pattern appears, for example, when preparing the inputs of a `HashJoinExec`, as shown in the EXPLAIN output below.

I think the `RepartitionExec(partitioning_scheme=RoundRobinBatch)` can be removed, since `RepartitionExec(partitioning_scheme=Hash)` already supports `output_partition_count > input_partition_count`:
   
https://github.com/apache/datafusion/blob/6afd5397045840d728de02a5f148b66896b397b3/datafusion/physical-plan/src/repartition/mod.rs#L1150-L1157
```
> explain SELECT a.value AS a_val, b.value AS b_val
FROM generate_series(1, 10) a(value)
JOIN generate_series(1, 10) b(value) ON a.value = b.value;
+---------------+------------------------------------------------------------+
| plan_type     | plan                                                       |
+---------------+------------------------------------------------------------+
| physical_plan | ┌───────────────────────────┐                              |
|               | │       ProjectionExec      │                              |
|               | │    --------------------   │                              |
|               | │        a_val: value       │                              |
|               | │        b_val: value       │                              |
|               | │                           │                              |
|               | │     partition_cont: 24    │                              |
|               | └─────────────┬─────────────┘                              |
|               | ┌─────────────┴─────────────┐                              |
|               | │    CoalesceBatchesExec    │                              |
|               | │    --------------------   │                              |
|               | │     target_batch_size:    │                              |
|               | │            8192           │                              |
|               | └─────────────┬─────────────┘                              |
|               | ┌─────────────┴─────────────┐                              |
|               | │        HashJoinExec       │                              |
|               | │    --------------------   ├──────────────┐               |
|               | │    on: (value = value)    │              │               |
|               | └─────────────┬─────────────┘              │               |
|               | ┌─────────────┴─────────────┐┌─────────────┴─────────────┐ |
|               | │    CoalesceBatchesExec    ││    CoalesceBatchesExec    │ |
|               | │    --------------------   ││    --------------------   │ |
|               | │     target_batch_size:    ││     target_batch_size:    │ |
|               | │            8192           ││            8192           │ |
|               | └─────────────┬─────────────┘└─────────────┬─────────────┘ |
|               | ┌─────────────┴─────────────┐┌─────────────┴─────────────┐ |
|               | │      RepartitionExec      ││      RepartitionExec      │ |
|               | │    --------------------   ││    --------------------   │ |
|               | │  output_partition_count:  ││  output_partition_count:  │ |
|               | │             24            ││             24            │ |
|               | │                           ││                           │ |
|               | │    partitioning_scheme:   ││    partitioning_scheme:   │ |
|               | │    Hash([value@0], 24)    ││    Hash([value@0], 24)    │ |
|               | └─────────────┬─────────────┘└─────────────┬─────────────┘ |
|               | ┌─────────────┴─────────────┐┌─────────────┴─────────────┐ |
|               | │      RepartitionExec      ││      RepartitionExec      │ |
|               | │    --------------------   ││    --------------------   │ |
|               | │  output_partition_count:  ││  output_partition_count:  │ |
|               | │             1             ││             1             │ |
|               | │                           ││                           │ |
|               | │    partitioning_scheme:   ││    partitioning_scheme:   │ |
|               | │    RoundRobinBatch(24)    ││    RoundRobinBatch(24)    │ |
|               | └─────────────┬─────────────┘└─────────────┬─────────────┘ |
|               | ┌─────────────┴─────────────┐┌─────────────┴─────────────┐ |
|               | │       LazyMemoryExec      ││       LazyMemoryExec      │ |
|               | │    --------------------   ││    --------------------   │ |
|               | │     batch_generators:     ││     batch_generators:     │ |
|               | │ generate_series: start=1, ││ generate_series: start=1, │ |
|               | │   end=10, batch_size=8192 ││   end=10, batch_size=8192 │ |
|               | └───────────────────────────┘└───────────────────────────┘ |
|               |                                                            |
+---------------+------------------------------------------------------------+
1 row(s) fetched.
Elapsed 0.002 seconds.
```
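To illustrate the claim that a hash repartition can fan one input partition out to many partitions on its own, here is a minimal sketch. It assumes each join input arrives as a single partition; `hash_partition` is a hypothetical helper working on plain `i64` keys, and `std`'s `DefaultHasher` is only a stand-in for DataFusion's own hashing of Arrow arrays.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical helper: route every key of ONE input partition to one of `n`
/// output partitions by hashing it. DataFusion's real operator works on Arrow
/// `RecordBatch`es with its own hash functions; `DefaultHasher` is used here
/// only to keep the example dependency-free.
fn hash_partition(input: &[i64], n: usize) -> Vec<Vec<i64>> {
    assert!(n > 0, "need at least one output partition");
    let mut outputs: Vec<Vec<i64>> = vec![Vec::new(); n];
    for &key in input {
        let mut hasher = DefaultHasher::new();
        key.hash(&mut hasher);
        // Equal keys always map to the same output partition index.
        let idx = (hasher.finish() % n as u64) as usize;
        outputs[idx].push(key);
    }
    outputs
}

fn main() {
    // Assumption: both join inputs start as a single partition and are
    // fanned out directly to 24 partitions, with no round-robin step.
    let a: Vec<i64> = (1..=10).collect();
    let b: Vec<i64> = (1..=10).collect();
    let a_parts = hash_partition(&a, 24);
    let b_parts = hash_partition(&b, 24);

    // No key is lost, and equal keys land in partitions with the same index,
    // which is the co-partitioning a partitioned hash join relies on.
    assert_eq!(a_parts.len(), 24);
    assert_eq!(a_parts.iter().map(Vec::len).sum::<usize>(), a.len());
    assert_eq!(a_parts, b_parts);

    let non_empty = a_parts.iter().filter(|p| !p.is_empty()).count();
    println!("non-empty output partitions: {}", non_empty);
}
```

With only 10 keys, some of the 24 output partitions stay empty; the point is that the partition count grows from 1 to 24 without any intermediate `RoundRobinBatch` step.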
Code that adds these `RepartitionExec` operators:
https://github.com/apache/datafusion/blob/6afd5397045840d728de02a5f148b66896b397b3/datafusion/physical-optimizer/src/enforce_distribution.rs#L1260-L1271
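The proposed simplification can be sketched as a bottom-up plan rewrite. This is a minimal model under stated assumptions: the `Plan` enum and `remove_round_robin_under_hash` are hypothetical and only mimic the shape of one join input in the plan above; they are not DataFusion's `ExecutionPlan` API, `Partitioning` enum, or the `EnforceDistribution` rule.

```rust
/// Hypothetical, heavily simplified plan tree. These are NOT DataFusion's
/// `ExecutionPlan`, `RepartitionExec`, or `Partitioning` types.
#[derive(Debug, Clone, PartialEq)]
enum Plan {
    /// Leaf producing `partitions` streams (standing in for LazyMemoryExec).
    Scan { partitions: usize },
    /// Round-robin repartition into `n` partitions.
    RoundRobin { n: usize, input: Box<Plan> },
    /// Hash repartition on `keys` into `n` partitions.
    Hash { keys: Vec<String>, n: usize, input: Box<Plan> },
}

impl Plan {
    fn output_partitions(&self) -> usize {
        match self {
            Plan::Scan { partitions } => *partitions,
            Plan::RoundRobin { n, .. } | Plan::Hash { n, .. } => *n,
        }
    }
}

/// When a Hash repartition reads directly from a RoundRobin repartition,
/// connect the Hash node to the RoundRobin's input instead, relying on the
/// Hash node to produce `n` partitions from any input partition count.
fn remove_round_robin_under_hash(plan: Plan) -> Plan {
    match plan {
        Plan::Hash { keys, n, input } => {
            // Rewrite the child first so deeper occurrences are handled too.
            let child = match remove_round_robin_under_hash(*input) {
                Plan::RoundRobin { input: inner, .. } => *inner,
                other => other,
            };
            Plan::Hash { keys, n, input: Box::new(child) }
        }
        Plan::RoundRobin { n, input } => Plan::RoundRobin {
            n,
            input: Box::new(remove_round_robin_under_hash(*input)),
        },
        leaf @ Plan::Scan { .. } => leaf,
    }
}

fn main() {
    // Shape of one join input from the EXPLAIN output above.
    let before = Plan::Hash {
        keys: vec!["value".to_string()],
        n: 24,
        input: Box::new(Plan::RoundRobin {
            n: 24,
            input: Box::new(Plan::Scan { partitions: 1 }),
        }),
    };
    let after = remove_round_robin_under_hash(before.clone());

    // The rewritten plan still produces the same number of output partitions.
    assert_eq!(before.output_partitions(), after.output_partitions());
    println!("before: {:?}", before);
    println!("after:  {:?}", after);
}
```

The sketch only checks plan shape. It does not model execution, so it says nothing about whether removing the round-robin step changes how much of the hashing work runs in parallel.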
