zhuqi-lucas commented on code in PR #16641:
URL: https://github.com/apache/datafusion/pull/16641#discussion_r2178834573


##########
datafusion/physical-optimizer/src/enforce_sorting/sort_pushdown.rs:
##########
@@ -668,6 +668,15 @@ fn handle_hash_join(
     plan: &HashJoinExec,
     parent_required: OrderingRequirements,
 ) -> Result<Option<Vec<Option<OrderingRequirements>>>> {
+    // Anti-joins (LeftAnti or RightAnti) do not preserve meaningful input order,
+    // so sorting beforehand cannot be relied on. Bail out early for both flavors:
+    match plan.join_type() {
+        JoinType::LeftAnti | JoinType::RightAnti => {
+            return Ok(None);
+        }
+        _ => {}
+    }

Review Comment:
   Thank you @adriangb for the review.
   
   Good point. At first glance it looks like RightAnti would simply fall out at the `!maintains_input_order()[1]` check below, but in reality we need to short-circuit before that check is reached.
   
   The `maintains_input_order` method only tells us "if you streamed all probe-side matches, you'd get them in input order," which is true of a RightAnti join (it does process probe rows in sequence). But an anti-join also drops rows, and those drops can interleave arbitrarily with the sort key, so a pre-sort followed by an anti-join does not guarantee globally sorted output.
   
   In other words:
   
   `maintains_input_order()[1] == true` means "if you output every matching probe row, you'd preserve order."
   
   Anti-joins, however, filter out some rows, so you cannot rely on that flag to still produce a correctly sorted subset.
   
   That's why we explicitly bail out on both LeftAnti and RightAnti before ever calling `maintains_input_order`: we need to prevent any sort pushdown for anti-joins, even though their raw probe phase might look ordered.
   
   And since `maintains_input_order` is used elsewhere for true order-preserving cases, we keep it unchanged here.
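
   To make the ordering of the two checks concrete, here is a minimal, self-contained sketch. Everything in it is a hypothetical stand-in, not the DataFusion API: the four-variant `JoinType`, the simplified `maintains_input_order`, and the `pushdown_target` helper exist only for illustration, and the flag values are an assumption of this sketch. It shows only that the explicit anti-join check runs before the flag is consulted, so RightAnti is rejected even though its probe-side flag is `true`.

```rust
// Hypothetical, simplified stand-ins for illustration only;
// these are NOT the actual DataFusion types or functions.
#[derive(Clone, Copy, Debug, PartialEq)]
enum JoinType {
    Inner,
    Right,
    LeftAnti,
    RightAnti,
}

/// Simplified model of the per-input order flags of a hash join:
/// index 0 is the build (left) side, index 1 is the probe (right) side.
/// The values below are an assumption of this sketch.
fn maintains_input_order(join_type: JoinType) -> [bool; 2] {
    [
        false,
        matches!(
            join_type,
            JoinType::Inner | JoinType::Right | JoinType::RightAnti
        ),
    ]
}

/// Returns `None` when the sort requirement must not be pushed below the
/// join, otherwise the index of the child that would receive it.
fn pushdown_target(join_type: JoinType) -> Option<usize> {
    // Early bail-out: anti-joins are rejected before the flags are read.
    if matches!(join_type, JoinType::LeftAnti | JoinType::RightAnti) {
        return None;
    }
    // The flag check on its own would let RightAnti through.
    if !maintains_input_order(join_type)[1] {
        return None;
    }
    Some(1)
}
```

   In this sketch, `pushdown_target(JoinType::RightAnti)` returns `None` because of the first check alone; removing that check would make it return `Some(1)`.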



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

