Matthew Weaver created HIVE-4697:
------------------------------------
Summary: Subqueries with IN and NOT IN
Key: HIVE-4697
URL: https://issues.apache.org/jira/browse/HIVE-4697
Project: Hive
Issue Type: New Feature
Components: Query Processor
Reporter: Matthew Weaver
Assignee: Matthew Weaver
h5. Functional Requirements
* Support {{WHERE x IN (<column subquery>);}}
** {{<column subquery>}} returns one column, any number of rows.
* Support {{WHERE x NOT IN (<column subquery>)}};
* Support same types of subqueries in {{HAVING}}.
** E.g.
{code:sql}
SELECT key FROM t1 GROUP BY key
HAVING COUNT(value) IN (SELECT p FROM t2); {code}
* Correlated subqueries not supported, for now at least
** But still need to check for correlation, and bail if it occurs.
** Correlated subquery:
*** A subquery that references a table that appears in a containing query
([MySQL|http://dev.mysql.com/doc/refman/5.7/en/correlated-subqueries.html]),
thus requiring subquery evaluation to look outside its scope.
*** The subquery depends on the outer query for its values, so the subquery
must be executed once for each row of the outer query. Also known as
_repeating Subqueries_.
h5. Tasks
* Rewrite {{IN (<column-subquery>)}} as a {{LEFT SEMI JOIN}}.
** Not ready for public consumption. In particular, no check for correlated
terms.
** With test queries.
* Add check for correlated terms, return informative error message.
* Rewrite {{WHERE NOT IN (<column-subquery>)}} as a {{LEFT OUTER JOIN}}.
** Return rows that don't match the right side
* Rewrite subqueries in {{HAVING}}, using {{LEFT SEMI JOIN}} and {{LEFT OUTER
JOIN}} as above.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira