[ https://issues.apache.org/jira/browse/HIVE-784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13815095#comment-13815095 ]
Sivaramakrishnan Narayanan commented on HIVE-784: ------------------------------------------------- Haven't looked at the code yet - I'm commenting based on the design doc. Comment on section 2.2.1 in the design doc. One thing to be wary of the NOT-IN to LOJ with null check transformation is that it is only valid if the subquery's output is known to be non-nullable. Example, consider two tables: ||T1.x| |1| |2| |null| ||T2.y| |1| |null| Now consider this uncorrelated NOT-IN subquery: {code} select * from T1 where T1.x not in (select y from T2) {code} Should produce (if I remember my SQL semantics correctly), an empty result i.e. 0 rows. This is because "not in" is equivalent to <> ALL. If the inner subquery produces a null, then its comparison with anything always produces null. Therefore, <> ALL check fails for every x from T1. Therefore, the result is empty. According to section 2.2.1, the transformation to LOJ with null check will look like this: {code} select * from T1 Left Outer Join (select y from T2) sq1 on (T1.x=sq1.y) where sq1.y is null {code} The LOJ will produce rows like this: ||T1.x|T2.y|| |1| 1| |2| null| |null| null| Applying the null filter after the LOJ produces two rows. This is an incorrect result. If this case is already covered in your implementation, please ignore the comment. > Support uncorrelated subqueries in the WHERE clause > --------------------------------------------------- > > Key: HIVE-784 > URL: https://issues.apache.org/jira/browse/HIVE-784 > Project: Hive > Issue Type: New Feature > Components: Query Processor > Reporter: Ning Zhang > Assignee: Harish Butani > Fix For: 0.13.0 > > Attachments: D13443.1.patch, D13443.2.patch, HIVE-784.1.patch.txt, > HIVE-784.2.patch, SubQuerySpec.pdf, tpchQueriesUsingSubQueryClauses.sql > > > Hive currently only support views in the FROM-clause, some Facebook use cases > suggest that Hive should support subqueries such as those connected by > IN/EXISTS in the WHERE-clause. -- This message was sent by Atlassian JIRA (v6.1#6144)