Yin Huai created HIVE-3669:
------------------------------

             Summary: Support queries in which input tables of correlated MR jobs involves intermediate tables
                 Key: HIVE-3669
                 URL: https://issues.apache.org/jira/browse/HIVE-3669
             Project: Hive
          Issue Type: Sub-task
            Reporter: Yin Huai

The correlation optimizer implemented in HIVE-2206 does not optimize correlated MapReduce jobs that have intermediate tables as input. Here is an example originally posted in HIVE-3430:

{code:sql}
select *
from (
  select c.value, count(1) as cnt
  from (
    select b.key, b.value
    from (
      select key, length(value) from T1 where ds = '1'
    ) a
    join T2 b on b.ds = '1' and a.key = b.key
  ) c
  group by c.value
) d
join (
  select value, count(1) as cnt
  from T2 c
  where c.ds = '1'
  group by value
) e on d.value = e.value;
{code}

Because the correlated MapReduce jobs (those that use "value" as the partitioning key) involve the intermediate table "c", the implementation in HIVE-2206 does not optimize this query.
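
To make the shape of the problem concrete, here is a minimal Python sketch of the example query's job plan. It is only an illustrative model, not Hive's optimizer code: the four-job breakdown, the job names (`join_a_b`, `agg_c`, `agg_T2`, `join_d_e`), and the helper functions are all assumptions introduced for illustration. It groups jobs by partitioning key and then reports which jobs in a group read the output of a job outside that group.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    name: str           # hypothetical job name, not a Hive identifier
    inputs: tuple       # base table names or names of upstream jobs
    partition_key: str  # column the shuffle is partitioned on


# Assumed plan for the example query (one MapReduce job per join/group by).
jobs = [
    Job("join_a_b", ("T1", "T2"), "key"),    # produces intermediate table c
    Job("agg_c", ("join_a_b",), "value"),    # group by c.value, produces d
    Job("agg_T2", ("T2",), "value"),         # group by value on T2, produces e
    Job("join_d_e", ("agg_c", "agg_T2"), "value"),
]


def correlated_groups(all_jobs):
    """Group jobs by partitioning key; keep keys shared by more than one job."""
    groups = {}
    for job in all_jobs:
        groups.setdefault(job.partition_key, []).append(job)
    return {key: group for key, group in groups.items() if len(group) > 1}


def external_intermediate_inputs(group, all_jobs):
    """Map each job in the group to inputs produced by jobs outside the group."""
    produced = {job.name for job in all_jobs}
    in_group = {job.name for job in group}
    result = {}
    for job in group:
        external = [i for i in job.inputs if i in produced and i not in in_group]
        if external:
            result[job.name] = external
    return result


groups = correlated_groups(jobs)
print(external_intermediate_inputs(groups["value"], jobs))
```

In this model the three jobs partitioned on "value" form one correlated group, and `agg_c` is the only member whose input (the output of `join_a_b`, i.e. table "c") comes from a job partitioned on a different key. That is the case this issue asks the optimizer to support.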