milenkovicm opened a new issue, #17358: URL: https://github.com/apache/datafusion/issues/17358
### Describe the bug `NestedLoopJoin` left side has to have single partition, there is a check for this: ```rust fn execute( &self, partition: usize, context: Arc<TaskContext>, ) -> Result<SendableRecordBatchStream> { if self.left.output_partitioning().partition_count() != 1 { return internal_err!( "Invalid NestedLoopJoinExec, the output partition count of the left child must be 1,\ consider using CoalescePartitionsExec or the EnforceDistribution rule" ); } ... } ``` but no check has been performed when join is created with `NestedLoopJoinExec::try_new` nor this is checked in `NestedLoopJoin::swap_inputs`. I'm not sure if it is expected that additional `PhysicalOptimizerRule` which will insert `CoalescePartitionsExec` automatically (like `EnforceDistribution`) or this is a bug in NLJ . ### To Reproduce _No response_ ### Expected behavior I believe this check should be performed when NLJ is created. Delaying check until `execute(...)` may be a bit too late. Also, `NLJ::swap_inputs` should fail to swap left and right input if it will break this rule Lastly, `JoinSelection` should not perform join re-order if right input has more than single partition. https://github.com/apache/datafusion/blob/fd7df66724f958a2d44ba1fda1b11dc6833f0296/datafusion/physical-optimizer/src/join_selection.rs#L305-L306 ### Additional context This issue is experienced in ballista running TPC-DS query 54, https://github.com/apache/datafusion-ballista/pull/1302#issuecomment-3239033243 Ballista performs join re-ordering based on shuffle file information, before stage runs. At that point all stages has been created so partitioning can't really change. Swapping join inputs and inserting `CoalescePartitionsExec` wont work for ballista. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org