[jira] [Commented] (HIVE-16498) [Tez] ReduceRecordProcessor has no check to see if all the operators are done or not and is reading complete data

Hive QA (JIRA) Wed, 10 May 2017 07:51:31 -0700

    [ 
https://issues.apache.org/jira/browse/HIVE-16498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16004812#comment-16004812
 ]


Hive QA commented on HIVE-16498:
--------------------------------



Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12867311/HIVE-16498.1.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 10661 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr] (batchId=144)
org.apache.hadoop.hive.cli.TestSparkNegativeCliDriver.org.apache.hadoop.hive.cli.TestSparkNegativeCliDriver (batchId=238)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5166/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5166/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5166/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12867311 - PreCommit-HIVE-Build

> [Tez] ReduceRecordProcessor has no check to see if all the operators are done 
> or not and is reading complete data
> -----------------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-16498
>                 URL: https://issues.apache.org/jira/browse/HIVE-16498
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 1.2.0, 1.3.0
>            Reporter: Adesh Kumar Rao
>             Fix For: 1.2.0, 1.3.0
>
>         Attachments: HIVE-16498.1.patch
>
>
> ReduceRecordProcessor does not check whether the reducer (Operator) is done, 
> so it keeps reading data that is no longer needed.
> This can be reproduced with a reduce-side join.
> The data for large_table is generated by the following shell script, and a 
> table can be created from the file `large.txt`:
> {code:java}
> for (( j=1 ; j <=20; j++))
> do
>   for (( i=1; i <= 1000000; i++ ))
>   do
>     echo "$i,$j" >> large.txt
>   done
> done
> {code}
> {code:java}
> create external table large_table (i int, j int)
>   row format delimited fields terminated by ','
>   location "hdfs://<some-hdfs-location>";
> -- Disable MapJoin conversion so that a reduce-side join is used.
> set hive.auto.convert.join=false;
> select * from large_table a join large_table b on a.j = b.j limit 100;
> {code}
> Because there is no such check, the join query above gets stuck reading all 
> the data from the table and does not finish in a reasonable time, unlike on 
> MR or on Tez with MapJoin enabled.
> For reference, the same query takes around 5-6 minutes on MR and 2-3 minutes 
> on Tez with MapJoin.
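
The effect of the missing check can be illustrated with a small, self-contained sketch. It is not Hive code and not the attached patch: `RowSink`, `LimitSink`, and `drive` are hypothetical names made up for this illustration, and the sketch only assumes that the downstream operator can report when it needs no more rows (as with `limit 100`).

```java
import java.util.Iterator;
import java.util.stream.IntStream;

public class EarlyExitSketch {
    /** Hypothetical stand-in for a reduce-side operator; this is not Hive's Operator class. */
    interface RowSink {
        void process(int row);
        boolean isDone();
    }

    /** A sink that is done once it has accepted `limit` rows, similar in spirit to a LIMIT. */
    static final class LimitSink implements RowSink {
        private final int limit;
        private int accepted = 0;

        LimitSink(int limit) {
            this.limit = limit;
        }

        @Override
        public void process(int row) {
            // Rows arriving after the limit is reached are simply discarded.
            if (accepted < limit) {
                accepted++;
            }
        }

        @Override
        public boolean isDone() {
            return accepted >= limit;
        }
    }

    /** Feeds rows to the sink and returns how many rows were read from the input. */
    static long drive(Iterator<Integer> input, RowSink sink, boolean checkDone) {
        long read = 0;
        while (input.hasNext()) {
            // The check under discussion: stop reading once the sink needs nothing more.
            if (checkDone && sink.isDone()) {
                break;
            }
            sink.process(input.next());
            read++;
        }
        return read;
    }

    public static void main(String[] args) {
        long withoutCheck =
            drive(IntStream.range(0, 1_000_000).iterator(), new LimitSink(100), false);
        long withCheck =
            drive(IntStream.range(0, 1_000_000).iterator(), new LimitSink(100), true);
        System.out.println("rows read without check: " + withoutCheck);
        System.out.println("rows read with check:    " + withCheck);
    }
}
```

Without the check the loop drains the whole input even though the sink stopped accepting rows long ago; with the check it stops as soon as the sink reports that it is done.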



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
