adriangb commented on code in PR #16641: URL: https://github.com/apache/datafusion/pull/16641#discussion_r2180250971
########## datafusion/physical-optimizer/src/enforce_sorting/sort_pushdown.rs: ########## @@ -668,6 +668,15 @@ fn handle_hash_join( plan: &HashJoinExec, parent_required: OrderingRequirements, ) -> Result<Option<Vec<Option<OrderingRequirements>>>> { + // Anti-joins (LeftAnti or RightAnti) do not preserve meaningful input order, + // so sorting beforehand cannot be relied on. Bail out early for both flavors: + match plan.join_type() { + JoinType::LeftAnti | JoinType::RightAnti => { + return Ok(None); + } + _ => {} + } Review Comment: amazing! > you can't push down a a limit below any operator that changes the cardinality of the data one question here: can we push down operators that have a limit (`LIMIT` itself but also `TopK`) past other operators that change cardinality? e.g. `HashJoinExec`? I feel like maybe what we should do is call `ExecutionPlan::cardinality_effect` on the plan we are pushing down and the plan we are pushing through and only allow the pushdown if one of them is `CardinalityEffect::Equal` (I think that'd be correct). In the case of `HashJoinExec` + `TopK` you'd have `Unknown` and `LowerEqual` so you can't push down. `HashJoinExec` + non TopK `SortExec` would have `Unknown` and `Equal` so you can push down. This would generalize to to handle pushing through a `FilterExec`, etc. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org