[ 
https://issues.apache.org/jira/browse/HIVE-22098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17035824#comment-17035824
 ] 

Hive QA commented on HIVE-22098:
--------------------------------



Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12977341/HIVE-22098.1.patch

{color:red}ERROR:{color} -1 due to build exiting with an error

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-Build/20584/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/20584/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-20584/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Tests exited with: NonZeroExitCodeException
Command 'bash /data/hiveptest/working/scratch/source-prep.sh' failed with exit 
status 1 and output '+ date '+%Y-%m-%d %T.%3N'
2020-02-13 00:34:22.360
+ [[ -n /usr/lib/jvm/java-8-openjdk-amd64 ]]
+ export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ export 
PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ 
PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
+ export 'MAVEN_OPTS=-Xmx1g '
+ MAVEN_OPTS='-Xmx1g '
+ cd /data/hiveptest/working/
+ tee /data/hiveptest/logs/PreCommit-HIVE-Build-20584/source-prep.txt
+ [[ false == \t\r\u\e ]]
+ mkdir -p maven ivy
+ [[ git = \s\v\n ]]
+ [[ git = \g\i\t ]]
+ [[ -z master ]]
+ [[ -d apache-github-source-source ]]
+ [[ ! -d apache-github-source-source/.git ]]
+ [[ ! -d apache-github-source-source ]]
+ date '+%Y-%m-%d %T.%3N'
2020-02-13 00:34:22.362
+ cd apache-github-source-source
+ git fetch origin
+ git reset --hard HEAD
HEAD is now at fcfc71b HIVE-10362: Support Type check/conversion in dynamic 
partition column(Karen Coppage, reviewed by Vineet Garg, Zoltan Haindrich)
+ git clean -f -d
Removing standalone-metastore/metastore-server/src/gen/
+ git checkout master
Already on 'master'
Your branch is up-to-date with 'origin/master'.
+ git reset --hard origin/master
HEAD is now at fcfc71b HIVE-10362: Support Type check/conversion in dynamic 
partition column(Karen Coppage, reviewed by Vineet Garg, Zoltan Haindrich)
+ git merge --ff-only origin/master
Already up-to-date.
+ date '+%Y-%m-%d %T.%3N'
2020-02-13 00:34:23.508
+ rm -rf ../yetus_PreCommit-HIVE-Build-20584
+ mkdir ../yetus_PreCommit-HIVE-Build-20584
+ git gc
+ cp -R . ../yetus_PreCommit-HIVE-Build-20584
+ mkdir /data/hiveptest/logs/PreCommit-HIVE-Build-20584/yetus
+ patchCommandPath=/data/hiveptest/working/scratch/smart-apply-patch.sh
+ patchFilePath=/data/hiveptest/working/scratch/build.patch
+ [[ -f /data/hiveptest/working/scratch/build.patch ]]
+ chmod +x /data/hiveptest/working/scratch/smart-apply-patch.sh
+ /data/hiveptest/working/scratch/smart-apply-patch.sh 
/data/hiveptest/working/scratch/build.patch
Trying to apply the patch with -p0
error: a/ql/src/java/org/apache/hadoop/hive/ql/exec/mr/ExecMapper.java: does 
not exist in index
Trying to apply the patch with -p1
error: patch failed: 
ql/src/java/org/apache/hadoop/hive/ql/exec/mr/ExecMapper.java:20
Falling back to three-way merge...
Applied patch to 
'ql/src/java/org/apache/hadoop/hive/ql/exec/mr/ExecMapper.java' with conflicts.
Going to apply patch with: git apply -p1
error: patch failed: 
ql/src/java/org/apache/hadoop/hive/ql/exec/mr/ExecMapper.java:20
Falling back to three-way merge...
Applied patch to 
'ql/src/java/org/apache/hadoop/hive/ql/exec/mr/ExecMapper.java' with conflicts.
U ql/src/java/org/apache/hadoop/hive/ql/exec/mr/ExecMapper.java
+ result=1
+ '[' 1 -ne 0 ']'
+ rm -rf yetus_PreCommit-HIVE-Build-20584
+ exit 1
'
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12977341 - PreCommit-HIVE-Build

> Data loss occurs when multiple tables are join with different bucket_version
> ----------------------------------------------------------------------------
>
>                 Key: HIVE-22098
>                 URL: https://issues.apache.org/jira/browse/HIVE-22098
>             Project: Hive
>          Issue Type: Bug
>          Components: Operators
>    Affects Versions: 3.1.0
>            Reporter: LuGuangMing
>            Assignee: LuGuangMing
>            Priority: Major
>         Attachments: HIVE-22098.1.patch, image-2019-08-12-18-45-15-771.png, 
> join_test.sql, table_a_data.orc, table_b_data.orc, table_c_data.orc
>
>
> When tables with different bucketing versions are joined and the number of 
> reducers is greater than 2, rows can easily be lost from the result.
> *Scenario 1*: Three tables are joined. The intermediate result of joining 
> table_a (the first table) with table_b (the second table) is recorded as 
> tmp_a_b. When tmp_a_b is joined with the third table, that table has 
> bucket_version=2 (the default for tables created since hive-3.0.0), while 
> the temporary data tmp_a_b is initialized with bucketVersion=-1, and its 
> ReduceSinkOperator then performs the join with bucketVersion=-1. In the init 
> method, the hash algorithm for the join column is selected according to 
> bucketVersion: if bucketVersion = 2 and the operation is not an ACID 
> operation, the new hash algorithm is used; otherwise, the old hash algorithm 
> is used. Because the two sides use different hash algorithms, their rows are 
> distributed to different partitions. At the reducer stage, rows with the same 
> key cannot be paired, which results in data loss.
> *Scenario 2*: Create two test tables: create table 
> table_bucketversion_1(col_1 string, col_2 string) TBLPROPERTIES 
> ('bucketing_version'='1'); create table table_bucketversion_2(col_1 string, 
> col_2 string) TBLPROPERTIES ('bucketing_version'='2');
> When table_bucketversion_1 is joined with table_bucketversion_2, part of the 
> result data is lost because the bucketVersion values differ.
>  
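
The mechanism described in the report above can be illustrated with a small, self-contained sketch. It is not Hive code: legacy_hash and new_hash are hypothetical stand-ins (a Java-style 31-based string hash and 32-bit FNV-1a) chosen only because they disagree on some keys, and simulate_join is an assumed toy model of a shuffle join in which each side is routed to a reducer by hash(key) % num_reducers and rows are paired only inside a reducer.

```python
def legacy_hash(key: str) -> int:
    """Stand-in for an 'old' hash: Java-style 31-based polynomial hash,
    kept as an unsigned 32-bit value. NOT Hive's actual implementation."""
    h = 0
    for ch in key:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    return h


def new_hash(key: str) -> int:
    """Stand-in for a 'new' hash: 32-bit FNV-1a.
    NOT Hive's actual implementation."""
    h = 0x811C9DC5
    for b in key.encode("utf-8"):
        h ^= b
        h = (h * 0x01000193) & 0xFFFFFFFF
    return h


def partition(key: str, num_reducers: int, hash_fn) -> int:
    """Pick the reducer that receives this key."""
    return hash_fn(key) % num_reducers


def simulate_join(left_keys, right_keys, num_reducers, left_hash, right_hash):
    """Route both sides to reducers, then pair keys only within a reducer.

    Returns the set of keys that were successfully joined.
    """
    reducers = [(set(), set()) for _ in range(num_reducers)]
    for k in left_keys:
        reducers[partition(k, num_reducers, left_hash)][0].add(k)
    for k in right_keys:
        reducers[partition(k, num_reducers, right_hash)][1].add(k)

    matched = set()
    for left, right in reducers:
        # A key is joined only if both sides sent it to the same reducer.
        matched |= left & right
    return matched


keys = ["a", "b", "c"]

# Both sides use the same hash: every key meets its partner.
same = simulate_join(keys, keys, 3, legacy_hash, legacy_hash)

# The sides use different hashes: keys routed to different reducers are
# silently dropped from the join result.
mixed = simulate_join(keys, keys, 3, legacy_hash, new_hash)

print(sorted(same), sorted(mixed))
```

In this toy model the join with matching hash functions returns every key, while the join with mismatched hash functions returns only the keys that happen to land on the same reducer under both functions, which mirrors the data loss the report describes for sides with different bucketVersion values.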



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
