[ 
https://issues.apache.org/jira/browse/HIVE-784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13815095#comment-13815095
 ] 

Sivaramakrishnan Narayanan commented on HIVE-784:
-------------------------------------------------

Haven't looked at the code yet - I'm commenting based on the design doc.

Comment on section 2.2.1 in the design doc.

One thing to be wary of the NOT-IN to LOJ with null check transformation is 
that it is only valid if the subquery's output is known to be non-nullable. 

Example, consider two tables:
||T1.x|
|1|
|2|
|null|

||T2.y|
|1|
|null|

Now consider this uncorrelated NOT-IN subquery:

{code}
select * from T1 where T1.x not in (select y from T2)
{code}

Should produce (if I remember my SQL semantics correctly), an empty result i.e. 
0 rows.

This is because "not in" is equivalent to <> ALL. If the inner subquery 
produces a null, then its comparison with anything always produces null. 
Therefore, <> ALL check fails for every x from T1. Therefore, the result is 
empty.

According to section 2.2.1, the transformation to LOJ with null check will look 
like this:

{code}
select * from T1 Left Outer Join (select y from T2) sq1 on (T1.x=sq1.y) where 
sq1.y is null
{code}

The LOJ will produce rows like this:
||T1.x|T2.y||
|1| 1|
|2| null|
|null| null|

Applying the null filter after the LOJ produces two rows. This is an incorrect 
result. If this case is already covered in your implementation, please ignore 
the comment.

> Support uncorrelated subqueries in the WHERE clause
> ---------------------------------------------------
>
>                 Key: HIVE-784
>                 URL: https://issues.apache.org/jira/browse/HIVE-784
>             Project: Hive
>          Issue Type: New Feature
>          Components: Query Processor
>            Reporter: Ning Zhang
>            Assignee: Harish Butani
>             Fix For: 0.13.0
>
>         Attachments: D13443.1.patch, D13443.2.patch, HIVE-784.1.patch.txt, 
> HIVE-784.2.patch, SubQuerySpec.pdf, tpchQueriesUsingSubQueryClauses.sql
>
>
> Hive currently only support views in the FROM-clause, some Facebook use cases 
> suggest that Hive should support subqueries such as those connected by 
> IN/EXISTS in the WHERE-clause. 



--
This message was sent by Atlassian JIRA
(v6.1#6144)

Reply via email to