milenkovicm opened a new issue, #17358:
URL: https://github.com/apache/datafusion/issues/17358

   ### Describe the bug
   
   `NestedLoopJoin` left side has to have single partition, there is a check 
for this:
   
   ```rust
      fn execute(
           &self,
           partition: usize,
           context: Arc<TaskContext>,
       ) -> Result<SendableRecordBatchStream> {
           if self.left.output_partitioning().partition_count() != 1 {
               return internal_err!(
                   "Invalid NestedLoopJoinExec, the output partition count of 
the left child must be 1,\
                    consider using CoalescePartitionsExec or the 
EnforceDistribution rule"
               );
           }
   ...
   }
   ```
   but no check has been performed when join is created with 
`NestedLoopJoinExec::try_new` nor this is checked in 
`NestedLoopJoin::swap_inputs`. 
   
   I'm not sure if it is expected that additional `PhysicalOptimizerRule` which 
will insert `CoalescePartitionsExec` automatically (like `EnforceDistribution`) 
or this is a bug in NLJ .
   
   ### To Reproduce
   
   _No response_
   
   ### Expected behavior
   
   I believe this check should be performed when NLJ is created. Delaying check 
until `execute(...)` may be a bit too late. 
   
   Also, `NLJ::swap_inputs` should fail to swap left and right input if it will 
break this rule
   
   Lastly, `JoinSelection` should not perform join re-order if right input has 
more than single partition. 
   
https://github.com/apache/datafusion/blob/fd7df66724f958a2d44ba1fda1b11dc6833f0296/datafusion/physical-optimizer/src/join_selection.rs#L305-L306
   
   
   ### Additional context
   
   This issue is experienced in ballista running TPC-DS query 54, 
https://github.com/apache/datafusion-ballista/pull/1302#issuecomment-3239033243
   
   Ballista performs join re-ordering based on shuffle file information, before 
stage runs. 
   At that point all stages has been created so partitioning can't really 
change. Swapping join inputs and inserting `CoalescePartitionsExec` wont work 
for ballista. 
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org

Reply via email to