zhuqi-lucas commented on code in PR #16641:
URL: https://github.com/apache/datafusion/pull/16641#discussion_r2178834573
##########
datafusion/physical-optimizer/src/enforce_sorting/sort_pushdown.rs:
##########
@@ -668,6 +668,15 @@ fn handle_hash_join(
     plan: &HashJoinExec,
     parent_required: OrderingRequirements,
 ) -> Result<Option<Vec<Option<OrderingRequirements>>>> {
+    // Anti-joins (LeftAnti or RightAnti) do not preserve meaningful input order,
+    // so sorting beforehand cannot be relied on. Bail out early for both flavors:
+    match plan.join_type() {
+        JoinType::LeftAnti | JoinType::RightAnti => {
+            return Ok(None);
+        }
+        _ => {}
+    }

Review Comment:
   Thank you @adriangb for the review.

   Good point. At first glance it looks like RightAnti would simply fall out at the !maintains_input_order()[1] check below, but we need to short-circuit before that check is reached.

   The maintains_input_order method only tells us “if you streamed all probe-side matches, you’d get them in input order,” which is true of a RightAnti join (it does process probe rows in sequence). But an anti-join also drops rows, and that drop can interleave arbitrarily with the sort key, so a pre-sort followed by an anti-join does not guarantee globally sorted output.

   In other words, maintains_input_order()[1] == true means “if you output every matching probe row, you’d preserve order.” Anti-joins filter out some rows, so we cannot rely on that flag to still produce a correctly sorted subset.

   That’s why we explicitly bail out on both LeftAnti and RightAnti before ever calling maintains_input_order: we need to prevent any sort pushdown for anti-joins, even though their raw probe phase might look ordered. And since maintains_input_order is used elsewhere for true order-preserving cases, we keep it unchanged here.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

For additional commands, e-mail: github-h...@datafusion.apache.org