Yin Huai created HIVE-3669:
------------------------------

             Summary: Support queries in which input tables of correlated MR jobs involves intermediate tables
                 Key: HIVE-3669
                 URL: https://issues.apache.org/jira/browse/HIVE-3669
             Project: Hive
          Issue Type: Sub-task
            Reporter: Yin Huai


The correlation optimizer implemented in HIVE-2206 does not optimize correlated
MapReduce jobs that take intermediate tables as input.

Here is an example, originally posted in HIVE-3430:
{code:sql}
select * from
(
  select c.value, count(1) as cnt from
  (
    select b.key, b.value from
    (
      select key, length(value) from T1 where ds = '1'
    ) a
    join
    T2 b on b.ds = '1' and a.key = b.key
  ) c
  group by c.value
) d
join
(
  select value, count(1) as cnt from T2 c where c.ds = '1' group by value
) e
on d.value = e.value;
{code}
Because the correlated MapReduce jobs (those that use "value" as the partitioning
key) involve an intermediate table "c", the implementation in HIVE-2206 does not
optimize this query.
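To make the correlation concrete, here is a toy Python simulation of the query above. It is not Hive code: the sample rows, the function names, and the reducer count are hypothetical, the ds = '1' filter is assumed to be already applied, and the unused length(value) column is dropped. Job 1 (the join on key) materializes the intermediate table c. After that, the GROUP BY on c.value, the GROUP BY on T2's value, and the join d.value = e.value all partition on "value", so one shuffle on "value" can feed all three. That single merged job is roughly what a correlation optimizer aims for; this issue is about the case where one of its inputs is the intermediate table c.

```python
from collections import Counter, defaultdict

# Hypothetical sample rows for partition ds='1'; each row is (key, value).
T1 = [(1, "x"), (1, "w"), (3, "z"), (4, "q")]
T2 = [(1, "a"), (2, "a"), (4, "b"), (5, "a"), (6, "c")]


def job1_join_on_key(t1, t2):
    """Job 1 (shuffle on key): a JOIN b ON a.key = b.key, producing table c."""
    t1_count = Counter(k for k, _ in t1)
    # Emit one (b.key, b.value) row per matching T1 row, as an inner join does.
    return [(k, v) for k, v in t2 for _ in range(t1_count[k])]


def separate_jobs(c, t2):
    """Jobs 2-4 run one after another, each shuffling on value again."""
    d = Counter(v for _, v in c)   # job 2: GROUP BY c.value
    e = Counter(v for _, v in t2)  # job 3: GROUP BY value over T2
    # job 4: d JOIN e ON d.value = e.value (inner join)
    return sorted((v, d[v], v, e[v]) for v in d if v in e)


def merged_job(c, t2, num_reducers=2):
    """Jobs 2-4 merged into one job with a single shuffle on value."""
    # Map phase: tag each row with its source and route it by hash(value).
    reducers = [defaultdict(lambda: {"d": 0, "e": 0}) for _ in range(num_reducers)]
    for tag, rows in (("d", c), ("e", t2)):
        for _, v in rows:
            reducers[hash(v) % num_reducers][v][tag] += 1
    # Reduce phase: all rows with the same value land in one reducer, so both
    # counts and the join on value are computed locally in that reducer.
    out = []
    for groups in reducers:
        for v, cnt in groups.items():
            if cnt["d"] and cnt["e"]:  # inner join: value present on both sides
                out.append((v, cnt["d"], v, cnt["e"]))
    return sorted(out)


c = job1_join_on_key(T1, T2)
print(separate_jobs(c, T2))
print(merged_job(c, T2))
```

Both calls return the same rows (d.value, d.cnt, e.value, e.cnt); the merged version simply replaces three shuffles on "value" with one.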

