jcsherin commented on issue #12955:
URL: https://github.com/apache/datafusion/issues/12955#issuecomment-2417351688
@vbarua Thanks. This bug report is well written.
```sql
DataFusion CLI v42.0.0
> EXPLAIN SELECT * FROM VALUES ('a'), ('b'), ('b'), ('c'), ('c'), ('c')
INTERSECT ALL
SELECT * FROM VALUES ('b'), ('b'), ('b'), ('c'), ('c');
+---------------+-----------------------------------------------------------------------------------+
| plan_type | plan
|
+---------------+-----------------------------------------------------------------------------------+
| logical_plan | LeftSemi Join: column1 = column1
|
| | Values: (Utf8("a")), (Utf8("b")), (Utf8("b")),
(Utf8("c")), (Utf8("c"))... |
| | Values: (Utf8("b")), (Utf8("b")), (Utf8("b")),
(Utf8("c")), (Utf8("c")) |
| physical_plan | CoalesceBatchesExec: target_batch_size=8192
|
| | HashJoinExec: mode=Partitioned, join_type=LeftSemi,
on=[(column1@0, column1@0)] |
| | ValuesExec
|
| | ValuesExec
|
| |
|
+---------------+-----------------------------------------------------------------------------------+
```
The query generates a left semi-join plan and therefore will return only LHS
values. If RHS happens to have the minimum number of duplicates, then this
query will always return incorrect results.
- This bug is also related: #12956
```sql
DataFusion CLI v42.0.0
> EXPLAIN SELECT * FROM VALUES ('a'), ('b'), ('b'), ('c'), ('c'), ('c')
EXCEPT ALL
SELECT * FROM VALUES ('b'), ('b'), ('b'), ('c'), ('c');
+---------------+-----------------------------------------------------------------------------------+
| plan_type | plan
|
+---------------+-----------------------------------------------------------------------------------+
| logical_plan | LeftAnti Join: column1 = column1
|
| | Values: (Utf8("a")), (Utf8("b")), (Utf8("b")),
(Utf8("c")), (Utf8("c"))... |
| | Values: (Utf8("b")), (Utf8("b")), (Utf8("b")),
(Utf8("c")), (Utf8("c")) |
| physical_plan | CoalesceBatchesExec: target_batch_size=8192
|
| | HashJoinExec: mode=Partitioned, join_type=LeftAnti,
on=[(column1@0, column1@0)] |
| | ValuesExec
|
| | ValuesExec
|
| |
|
+---------------+-----------------------------------------------------------------------------------+
```
Here the query generates a left anti-join. So it will always exclude rows
which match in RHS.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]