Dustin Cote created HIVE-13019:
----------------------------------

             Summary: Optimizer COLLECT_LIST/COLLECT_SET 
                 Key: HIVE-13019
                 URL: https://issues.apache.org/jira/browse/HIVE-13019
             Project: Hive
          Issue Type: Improvement
            Reporter: Dustin Cote
            Priority: Minor


Currently, when a query uses COLLECT_SET/COLLECT_LIST over data from a 
single table, the aggregation is performed after any JOIN operation that is 
present in the query.  For example:
{code}
insert into table nested_customers_orders
select c.*, collect_list(named_struct("oid", o.oid, "order_date", o.date...))
from customers c inner join orders o on (c.cid = o.oid)
group by o.oid, o.date,...
{code}

If the optimizer could perform the COLLECT_LIST before the JOIN (where 
possible), fewer rows would pass through the join, which should give some 
performance gains for this pattern of query.
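
As a rough illustration of the plan being requested, the aggregation can be 
pushed below the join by hand with a subquery.  This is only a sketch: it 
assumes that orders carries a customer key (called o.cid here, which does not 
appear in the query above) and that customers.cid is unique.  The alias names 
o_agg and orders_list are made up for the example, and only the two struct 
fields named above are shown.
{code}
-- Sketch under assumptions: o.cid is a hypothetical customer key on orders,
-- and customers.cid is unique, so the join cannot duplicate the lists.
select c.*, o_agg.orders_list
from customers c
inner join (
  -- Collapse orders to one row per customer before joining.
  select o.cid,
         collect_list(named_struct("oid", o.oid, "order_date", o.date)) as orders_list
  from orders o
  group by o.cid
) o_agg on (c.cid = o_agg.cid)
{code}
With this shape the join sees one row per customer on the orders side rather 
than one row per order, which is the effect the optimizer change would aim to 
produce automatically.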



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
