[ https://issues.apache.org/jira/browse/HIVE-9495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14334576#comment-14334576 ]
Hive QA commented on HIVE-9495:
-------------------------------

{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12700340/HIVE-9495.2.patch.txt

{color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 7566 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_multi_insert_lateral_view
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udaf_percentile_approx_23
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udtf_json_tuple
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udtf_parse_url_tuple
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_multi_insert_lateral_view
{noformat}

Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2854/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2854/console
Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-2854/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 5 tests failed
{noformat}

This message is automatically generated.
ATTACHMENT ID: 12700340 - PreCommit-HIVE-TRUNK-Build

> Map Side aggregation affecting map performance
> ----------------------------------------------
>
>                 Key: HIVE-9495
>                 URL: https://issues.apache.org/jira/browse/HIVE-9495
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Processor
>    Affects Versions: 0.14.0
>         Environment: RHEL 6.4
> Hortonworks Hadoop 2.2
>            Reporter: Anand Sridharan
>         Attachments: HIVE-9495.1.patch.txt, HIVE-9495.2.patch.txt,
> profiler_screenshot.PNG
>
>
> When trying to run a simple aggregation query with hive.map.aggr=true, map
> tasks take a lot of time in Hive 0.14 as against with hive.map.aggr=false.
> e.g.
> Consider the query:
> {code}
> INSERT OVERWRITE TABLE lineitem_tgt_agg
> select alias.a0 as a0,
>  alias.a2 as a1,
>  alias.a1 as a2,
>  alias.a3 as a3,
>  alias.a4 as a4
> from (
>  select alias.a0 as a0,
>   SUM(alias.a1) as a1,
>   SUM(alias.a2) as a2,
>   SUM(alias.a3) as a3,
>   SUM(alias.a4) as a4
>  from (
>   select lineitem_sf500.l_orderkey as a0,
>    CAST(lineitem_sf500.l_quantity * lineitem_sf500.l_extendedprice * (1 -
> lineitem_sf500.l_discount) * (1 + lineitem_sf500.l_tax) as double) as a1,
>    lineitem_sf500.l_quantity as a2,
>    CAST(lineitem_sf500.l_quantity * lineitem_sf500.l_extendedprice *
> lineitem_sf500.l_discount as double) as a3,
>    CAST(lineitem_sf500.l_quantity * lineitem_sf500.l_extendedprice *
> lineitem_sf500.l_tax as double) as a4
>   from lineitem_sf500
>  ) alias
>  group by alias.a0
> ) alias;
> {code}
> The above query was run with ~376GB of data / ~3billion records in the source.
> It takes ~10 minutes with hive.map.aggr=false.
> With map side aggregation set to true, the map tasks don't complete even
> after an hour.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)