[
https://issues.apache.org/jira/browse/SPARK-17348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Herman van Hovell resolved SPARK-17348.
---------------------------------------
Resolution: Fixed
Assignee: Nattavut Sutyanyong
Fix Version/s: 2.1.0, 2.0.3
> Incorrect results from subquery transformation
> ----------------------------------------------
>
> Key: SPARK-17348
> URL: https://issues.apache.org/jira/browse/SPARK-17348
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.0.0
> Reporter: Nattavut Sutyanyong
> Assignee: Nattavut Sutyanyong
> Labels: correctness
> Fix For: 2.0.3, 2.1.0
>
>
> {noformat}
> Seq((1,1)).toDF("c1","c2").createOrReplaceTempView("t1")
> Seq((1,1),(2,0)).toDF("c1","c2").createOrReplaceTempView("t2")
> sql("select c1 from t1 where c1 in (select max(t2.c1) from t2 where t1.c2 >= t2.c2)").show
> +---+
> | c1|
> +---+
> |  1|
> +---+
> {noformat}
> The correct result of the above query is an empty set. Here is an
> explanation:
> Both rows from T2 satisfy the correlated predicate T1.C2 >= T2.C2 for the
> T1 row with C1 = 1, so both rows need to be processed in the same group of
> the aggregation in the subquery. The aggregation yields MAX(T2.C1) = 2.
> Therefore, the predicate T1.C1 (which is 1) IN MAX(T2.C1) (which is 2)
> evaluates to false, and the query should return an empty set.
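
The expected result described above can be checked without Spark by modelling the query over plain Scala collections. This is only a sketch of the intended semantics, not Spark's implementation or the fix itself; the object name `Spark17348Sketch` and the helpers `maxC1For` and `expected` are hypothetical names introduced for illustration.

```scala
// Plain-Scala model of the query semantics from the report; no Spark required.
// All names here are illustrative and not part of Spark.
object Spark17348Sketch {
  // Rows are (c1, c2) pairs, matching the t1 and t2 views in the report.
  val t1: Seq[(Int, Int)] = Seq((1, 1))
  val t2: Seq[(Int, Int)] = Seq((1, 1), (2, 0))

  // Evaluate the correlated subquery for one outer row:
  //   select max(t2.c1) from t2 where outer.c2 >= t2.c2
  def maxC1For(outer: (Int, Int)): Option[Int] = {
    val matching = t2.filter { case (_, c2) => outer._2 >= c2 }.map(_._1)
    // MAX over an empty input is NULL in SQL; model that as None.
    if (matching.isEmpty) None else Some(matching.max)
  }

  // Keep an outer row only when its c1 equals the subquery's single result.
  def expected: Seq[Int] =
    t1.filter(row => maxC1For(row).contains(row._1)).map(_._1)

  def main(args: Array[String]): Unit = {
    println(maxC1For((1, 1))) // Some(2): both t2 rows fall in one group
    println(expected)         // List(): 1 is not in {2}, so no row survives
  }
}
```

Because the aggregate is computed over all T2 rows that satisfy the correlated predicate for a given T1 row, the model returns no rows, in line with the empty set the reporter expects.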