Jesus Camacho Rodriguez created HIVE-17043:
----------------------------------------------

             Summary: Remove non unique columns from group by keys if not 
referenced later
                 Key: HIVE-17043
                 URL: https://issues.apache.org/jira/browse/HIVE-17043
             Project: Hive
          Issue Type: Sub-task
          Components: Logical Optimizer
    Affects Versions: 3.0.0
            Reporter: Ashutosh Chauhan


Group by keys may be a mix of unique (or primary) keys and regular columns. In 
such cases, the presence of the regular columns won't alter the cardinality of 
the groups. So, if the regular columns are not referenced later, they can be 
dropped from the group by keys. Depending on the operator tree, this may result 
in those columns not being read from disk at all in the best case. In the worst 
case, we still avoid shuffling and sorting the regular columns from mapper to 
reducer, which could still be substantial CPU and network savings.
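
To illustrate the idea, here is a minimal Python sketch. It is not Hive's 
optimizer code. The names prune_group_by_keys and group_count, the sample rows, 
and the column names are all hypothetical. It assumes the unique key is 
declared NOT NULL and is fully contained in the group by keys.

```python
from collections import defaultdict


def prune_group_by_keys(group_keys, unique_keys, referenced_later):
    """Drop regular columns from group_keys when a unique key is present.

    group_keys: column names in GROUP BY order
    unique_keys: list of sets, each a set of columns that is unique together
    referenced_later: columns that downstream operators still use
    (Hypothetical helper for illustration only.)
    """
    keys = set(group_keys)
    # A unique key only fixes the groups if all of its columns are grouped on.
    covering = next((uk for uk in unique_keys if uk and uk <= keys), None)
    if covering is None:
        # No unique key among the keys: every column can change the groups.
        return list(group_keys)
    # Keep the unique key columns plus anything still needed downstream.
    return [c for c in group_keys if c in covering or c in referenced_later]


def group_count(rows, keys):
    """Count rows per group, like SELECT keys, count(*) ... GROUP BY keys."""
    groups = defaultdict(int)
    for row in rows:
        groups[tuple(row[k] for k in keys)] += 1
    return dict(groups)


# Made-up sample data: "id" is unique, "name" and "city" are regular columns.
rows = [
    {"id": 1, "name": "a", "city": "x"},
    {"id": 2, "name": "a", "city": "y"},
    {"id": 3, "name": "b", "city": "x"},
]

original_keys = ["id", "name", "city"]
# Only "city" is referenced after the group by, so "name" can be dropped.
pruned_keys = prune_group_by_keys(original_keys, [{"id"}], {"city"})

before = group_count(rows, original_keys)
after = group_count(rows, pruned_keys)
```

Because "id" is unique, grouping on the pruned keys produces the same number 
of groups with the same counts as grouping on the original keys. Only the 
width of each key changes.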


