adriangb commented on code in PR #16641:
URL: https://github.com/apache/datafusion/pull/16641#discussion_r2178887207


##########
datafusion/physical-optimizer/src/enforce_sorting/sort_pushdown.rs:
##########
@@ -668,6 +668,15 @@ fn handle_hash_join(
     plan: &HashJoinExec,
     parent_required: OrderingRequirements,
 ) -> Result<Option<Vec<Option<OrderingRequirements>>>> {
+    // Anti-joins (LeftAnti or RightAnti) do not preserve meaningful input order,
+    // so sorting beforehand cannot be relied on. Bail out early for both flavors:
+    match plan.join_type() {
+        JoinType::LeftAnti | JoinType::RightAnti => {
+            return Ok(None);
+        }
+        _ => {}
+    }

Review Comment:
   Okay, interesting. Thank you for the in-depth explanation.
   
   Could you show, with a simple table representing a batch, how the anti join filtering out rows causes ordering to be lost? I imagine something like this flowing into the anti join:
   
   | a | b | c |
   |---|---|---|
   | 1 | 1 | 2 |
   | 2 | 3 | 5 |
   | 2 | 4 | 1 |
   
   
   For the query `SELECT a, b, c FROM t1 WHERE c NOT IN (SELECT n FROM t2) ORDER BY t1.a, t1.b`, I expect an anti join to be created and to remove some rows. If `SELECT n FROM t2` returns just `5`, the output would be:
   
   | a | b | c |
   |---|---|---|
   | 1 | 1 | 2 |
   | 2 | 4 | 1 |
   
   Which is still ordered. Are you saying it might be
   
   | a | b | c |
   |---|---|---|
   | 2 | 4 | 1 |
   | 1 | 1 | 2 |
   
   instead?
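
The expectation in the comment above can be sketched as a naive streaming anti join. This is only an illustration under stated assumptions: `Row` and `streaming_anti_join` are hypothetical names, the data is the example table from the comment, and the code says nothing about how `HashJoinExec` actually emits rows for `LeftAnti` or `RightAnti`, which is the open question here.

```rust
use std::collections::HashSet;

/// One row of the hypothetical table `t1` with columns (a, b, c).
type Row = (i32, i32, i32);

/// Naive streaming anti join: keep each row whose `c` is absent from `excluded`.
/// Rows are visited in input order, so the survivors keep their relative order.
fn streaming_anti_join(rows: &[Row], excluded: &HashSet<i32>) -> Vec<Row> {
    rows.iter()
        .copied()
        .filter(|&(_, _, c)| !excluded.contains(&c))
        .collect()
}

fn main() {
    // Input batch, already sorted by (a, b).
    let t1 = [(1, 1, 2), (2, 3, 5), (2, 4, 1)];
    // Result of `SELECT n FROM t2` in the example.
    let t2: HashSet<i32> = vec![5].into_iter().collect();

    let out = streaming_anti_join(&t1, &t2);
    // The row with c = 5 is dropped; the remaining rows stay sorted by (a, b).
    assert_eq!(out, vec![(1, 1, 2), (2, 4, 1)]);
    println!("{:?}", out);
}
```

A filter of this shape cannot reorder rows. The reordered output in the last table would therefore have to come from how the join operator buffers and emits unmatched rows, not from the filtering itself.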


