[ https://issues.apache.org/jira/browse/HIVE-9495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Anand Sridharan reassigned HIVE-9495: ------------------------------------- Assignee: Anand Sridharan > Map Side aggregation affecting map performance > ---------------------------------------------- > > Key: HIVE-9495 > URL: https://issues.apache.org/jira/browse/HIVE-9495 > Project: Hive > Issue Type: Bug > Components: Query Processor > Affects Versions: 0.14.0 > Environment: RHEL 6.4 > Hortonworks Hadoop 2.2 > Reporter: Anand Sridharan > Assignee: Anand Sridharan > Attachments: HIVE-9495.1.patch.txt, HIVE-9495.2.patch.txt, > profiler_screenshot.PNG > > > When trying to run a simple aggregation query with hive.map.aggr=true, map > tasks take a lot of time in Hive 0.14 as against with hive.map.aggr=false. > e.g. > Consider the query: > {code} > INSERT OVERWRITE TABLE lineitem_tgt_agg > select alias.a0 as a0, > alias.a2 as a1, > alias.a1 as a2, > alias.a3 as a3, > alias.a4 as a4 > from ( > select alias.a0 as a0, > SUM(alias.a1) as a1, > SUM(alias.a2) as a2, > SUM(alias.a3) as a3, > SUM(alias.a4) as a4 > from ( > select lineitem_sf500.l_orderkey as a0, > CAST(lineitem_sf500.l_quantity * lineitem_sf500.l_extendedprice * (1 - > lineitem_sf500.l_discount) * (1 + lineitem_sf500.l_tax) as double) as a1, > lineitem_sf500.l_quantity as a2, > CAST(lineitem_sf500.l_quantity * lineitem_sf500.l_extendedprice * > lineitem_sf500.l_discount as double) as a3, > CAST(lineitem_sf500.l_quantity * lineitem_sf500.l_extendedprice * > lineitem_sf500.l_tax as double) as a4 > from lineitem_sf500 > ) alias > group by alias.a0 > ) alias; > {code} > The above query was run with ~376GB of data / ~3billion records in the source. > It takes ~10 minutes with hive.map.aggr=false. > With map side aggregation set to true, the map tasks don't complete even > after an hour. -- This message was sent by Atlassian JIRA (v6.3.4#6332)