zhuqi-lucas commented on code in PR #16641: URL: https://github.com/apache/datafusion/pull/16641#discussion_r2180355482
########## datafusion/physical-optimizer/src/enforce_sorting/sort_pushdown.rs: ########## @@ -668,6 +668,15 @@ fn handle_hash_join( plan: &HashJoinExec, parent_required: OrderingRequirements, ) -> Result<Option<Vec<Option<OrderingRequirements>>>> { + // Anti-joins (LeftAnti or RightAnti) do not preserve meaningful input order, + // so sorting beforehand cannot be relied on. Bail out early for both flavors: + match plan.join_type() { + JoinType::LeftAnti | JoinType::RightAnti => { + return Ok(None); + } + _ => {} + } Review Comment: > amazing! > > > you can't push down a a limit below any operator that changes the cardinality of the data > > one question here: can we push down operators that have a limit (`LIMIT` itself but also `TopK`) past other operators that change cardinality? e.g. `HashJoinExec`? I feel like maybe what we should do is call `ExecutionPlan::cardinality_effect` on the plan we are pushing down and the plan we are pushing through and only allow the pushdown if one of them is `CardinalityEffect::Equal` (I think that'd be correct). In the case of `HashJoinExec` + `TopK` you'd have `Unknown` and `LowerEqual` so you can't push down. `HashJoinExec` + non TopK `SortExec` would have `Unknown` and `Equal` so you can push down. This would generalize to to handle pushing through a `FilterExec`, etc. Great point! Try to address it in latest PR, and it seems some slt files changes, but it should be right, the unionexec also should not pass the TopK i believe. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org