[
https://issues.apache.org/jira/browse/SPARK-18614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15703185#comment-15703185
]
Nattavut Sutyanyong commented on SPARK-18614:
---------------------------------------------
{{ExistenceJoin}} should be treated the same as {{LeftOuter}} and {{LeftAnti}},
not {{InnerLike}} and {{LeftSemi}}. This is not currently exposed because the
rewrite of {{\[NOT\] EXISTS OR ...}} to {{ExistenceJoin}} happens in rule
{{RewritePredicateSubquery}}, which is in a separate rule set and placed after
the rule {{PushPredicateThroughJoin}}. During the transformation in the rule
{{PushPredicateThroughJoin}}, an ExistenceJoin never exists.
The semantics of {{ExistenceJoin}} says we need to preserve all the rows from
the left table through the join operation as if it is a regular {{LeftOuter}}
join. The {{ExistenceJoin}} augments the {{LeftOuter}} operation with a new
column called {{exists}}, set to true when the join condition in the ON clause
is true and false otherwise. The filter of any rows will happen in the
{{Filter}} operation above the {{ExistenceJoin}}.
Example:
A(c1, c2): \{ (1, 1), (1, 2) \}
// B can be any value as it is irrelevant in this example
B(c1): \{ (NULL) \}
{code:SQL}
select A.*
from A
where exists (select 1 from B where A.c1 = A.c2)
or A.c2=2
{code}
In this example, the correct result is all the rows from A. If the pattern
{{ExistenceJoin}} at line 935 in {{Optimizer.scala}} added by the work in
SPARK-18597 is indeed active, the code will push down the predicate A.c1 = A.c2
to be a {{Filter}} on relation A, which will filter the row (1,2) from A.
> Incorrect predicate pushdown thru ExistenceJoin
> -----------------------------------------------
>
> Key: SPARK-18614
> URL: https://issues.apache.org/jira/browse/SPARK-18614
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.0.0
> Reporter: Nattavut Sutyanyong
> Priority: Minor
> Labels: correctness
>
> This is a follow-up work from SPARK-18597 to close a potential incorrect
> rewrite in {{PushPredicateThroughJoin}} rule of the Optimizer phase.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]