[jira] [Commented] (HIVE-9025) join38.q (without map join) produces incorrect result when testing with multiple reducers

Hive QA (JIRA) Sun, 07 Dec 2014 03:04:34 -0800

    [ 
https://issues.apache.org/jira/browse/HIVE-9025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14237113#comment-14237113
 ]


Hive QA commented on HIVE-9025:
-------------------------------



{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12685602/HIVE-9025.patch

{color:red}ERROR:{color} -1 due to 7 failed/errored test(s), 6696 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_decimal_aggregate
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynamic_partition_pruning
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynamic_partition_pruning_2
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_join_nullsafe
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_optimize_nullscan
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_decimal_mapjoin
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vectorized_dynamic_partition_pruning
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1987/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1987/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-1987/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 7 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12685602 - PreCommit-HIVE-TRUNK-Build

> join38.q (without map join) produces incorrect result when testing with 
> multiple reducers
> -----------------------------------------------------------------------------------------
>
>                 Key: HIVE-9025
>                 URL: https://issues.apache.org/jira/browse/HIVE-9025
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Chao
>            Assignee: Ted Xu
>            Priority: Blocker
>         Attachments: HIVE-9025.patch
>
>
> I have this query from a modified version of {{join38.q}}, which does NOT use 
> map join:
> {code}
> FROM src a JOIN tmp b ON (a.key = b.col11)
> SELECT a.value, b.col5, count(1) as count
> where b.col11 = 111
> group by a.value, b.col5;
> {code}
> If I set {{mapred.reduce.tasks}} to 1, the result is correct. But if I set 
> it to a larger number (3, for instance), then the result will be 
> {noformat}
> val_111       105     1
> {noformat}
> which is wrong.
> I think the issue is that, in this case, ConstantPropagationProcFactory 
> overwrites the partition cols of the reduce sink desc with an empty list. 
> Later, in ReduceSinkOperator#computeHashCode, since partitionEval has 
> length 0, a random number is used as the hash code for each separate row. As 
> a result, rows with the same key are distributed to different reducers, 
> which leads to an incorrect result.
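
The effect described in the report can be illustrated with a small simulation. This is only a sketch in Python, not Hive code: the functions `partition` and `reduce_counts`, the row layout, and the row count of 300 are all hypothetical and chosen for illustration. It assumes that an empty partition-column list makes each row pick an independent random hash, as the reporter suggests, and it does not try to reproduce the exact output shown above.

```python
import random
from collections import Counter


def partition(rows, num_reducers, partition_cols, rng):
    """Assign each row to a reducer bucket (hypothetical model, not Hive's code)."""
    buckets = [[] for _ in range(num_reducers)]
    for row in rows:
        if partition_cols:
            # Deterministic: rows with equal partition values share a hash.
            h = hash(tuple(row[c] for c in partition_cols))
        else:
            # Assumed behaviour from the report: no partition columns,
            # so every row gets an independent random hash.
            h = rng.randrange(2 ** 31)
        buckets[h % num_reducers].append(row)
    return buckets


def reduce_counts(buckets, group_cols):
    """Each reducer counts its own groups; outputs are simply concatenated."""
    out = []
    for bucket in buckets:
        counts = Counter(tuple(row[c] for c in group_cols) for row in bucket)
        out.extend(sorted(counts.items()))
    return out


# 300 joined rows that all belong to the same (value, col5) group.
rows = [{"value": "val_111", "col5": "105"} for _ in range(300)]
group_cols = ("value", "col5")
rng = random.Random(0)

# Partitioning on the group-by columns: one reducer sees the whole group.
correct = reduce_counts(partition(rows, 3, group_cols, rng), group_cols)

# Empty partition columns: the group is split, so several reducers each
# emit a partial count for the same key.
broken = reduce_counts(partition(rows, 3, (), rng), group_cols)

print("with partition cols:   ", correct)
print("without partition cols:", broken)
```

With the group-by columns used for partitioning, a single reducer emits one row carrying the full count. With the empty list, the same key is emitted by more than one reducer, each with a partial count, which is the kind of inconsistency a single-reducer run hides.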



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
