[ https://issues.apache.org/jira/browse/HIVE-22098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17035824#comment-17035824 ]
Hive QA commented on HIVE-22098:
--------------------------------

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12977341/HIVE-22098.1.patch

{color:red}ERROR:{color} -1 due to build exiting with an error

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/20584/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/20584/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-20584/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Tests exited with: NonZeroExitCodeException
Command 'bash /data/hiveptest/working/scratch/source-prep.sh' failed with exit status 1 and output '+ date '+%Y-%m-%d %T.%3N'
2020-02-13 00:34:22.360
+ [[ -n /usr/lib/jvm/java-8-openjdk-amd64 ]]
+ export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ export PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
+ export 'MAVEN_OPTS=-Xmx1g '
+ MAVEN_OPTS='-Xmx1g '
+ cd /data/hiveptest/working/
+ tee /data/hiveptest/logs/PreCommit-HIVE-Build-20584/source-prep.txt
+ [[ false == \t\r\u\e ]]
+ mkdir -p maven ivy
+ [[ git = \s\v\n ]]
+ [[ git = \g\i\t ]]
+ [[ -z master ]]
+ [[ -d apache-github-source-source ]]
+ [[ ! -d apache-github-source-source/.git ]]
+ [[ ! -d apache-github-source-source ]]
+ date '+%Y-%m-%d %T.%3N'
2020-02-13 00:34:22.362
+ cd apache-github-source-source
+ git fetch origin
+ git reset --hard HEAD
HEAD is now at fcfc71b HIVE-10362: Support Type check/conversion in dynamic partition column(Karen Coppage, reviewed by Vineet Garg, Zoltan Haindrich)
+ git clean -f -d
Removing standalone-metastore/metastore-server/src/gen/
+ git checkout master
Already on 'master'
Your branch is up-to-date with 'origin/master'.
+ git reset --hard origin/master
HEAD is now at fcfc71b HIVE-10362: Support Type check/conversion in dynamic partition column(Karen Coppage, reviewed by Vineet Garg, Zoltan Haindrich)
+ git merge --ff-only origin/master
Already up-to-date.
+ date '+%Y-%m-%d %T.%3N'
2020-02-13 00:34:23.508
+ rm -rf ../yetus_PreCommit-HIVE-Build-20584
+ mkdir ../yetus_PreCommit-HIVE-Build-20584
+ git gc
+ cp -R . ../yetus_PreCommit-HIVE-Build-20584
+ mkdir /data/hiveptest/logs/PreCommit-HIVE-Build-20584/yetus
+ patchCommandPath=/data/hiveptest/working/scratch/smart-apply-patch.sh
+ patchFilePath=/data/hiveptest/working/scratch/build.patch
+ [[ -f /data/hiveptest/working/scratch/build.patch ]]
+ chmod +x /data/hiveptest/working/scratch/smart-apply-patch.sh
+ /data/hiveptest/working/scratch/smart-apply-patch.sh /data/hiveptest/working/scratch/build.patch
Trying to apply the patch with -p0
error: a/ql/src/java/org/apache/hadoop/hive/ql/exec/mr/ExecMapper.java: does not exist in index
Trying to apply the patch with -p1
error: patch failed: ql/src/java/org/apache/hadoop/hive/ql/exec/mr/ExecMapper.java:20
Falling back to three-way merge...
Applied patch to 'ql/src/java/org/apache/hadoop/hive/ql/exec/mr/ExecMapper.java' with conflicts.
Going to apply patch with: git apply -p1
error: patch failed: ql/src/java/org/apache/hadoop/hive/ql/exec/mr/ExecMapper.java:20
Falling back to three-way merge...
Applied patch to 'ql/src/java/org/apache/hadoop/hive/ql/exec/mr/ExecMapper.java' with conflicts.
U ql/src/java/org/apache/hadoop/hive/ql/exec/mr/ExecMapper.java
+ result=1
+ '[' 1 -ne 0 ']'
+ rm -rf yetus_PreCommit-HIVE-Build-20584
+ exit 1
'
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12977341 - PreCommit-HIVE-Build

> Data loss occurs when multiple tables are join with different bucket_version
> ----------------------------------------------------------------------------
>
> Key: HIVE-22098
> URL: https://issues.apache.org/jira/browse/HIVE-22098
> Project: Hive
> Issue Type: Bug
> Components: Operators
> Affects Versions: 3.1.0
> Reporter: LuGuangMing
> Assignee: LuGuangMing
> Priority: Major
> Attachments: HIVE-22098.1.patch, image-2019-08-12-18-45-15-771.png,
> join_test.sql, table_a_data.orc, table_b_data.orc, table_c_data.orc
>
>
> When tables with different bucketVersion values are joined and the number of
> reducers is greater than 2, data is easily lost from the result.
> *Scenario 1*: Three tables are joined. The intermediate result of joining
> table_a (the first table) with table_b (the second table) is recorded as
> tmp_a_b. When tmp_a_b is joined with the third table, the two sides use
> different bucket versions: tables created with default settings in Hive 3.0.0
> and later have bucket_version=2, whereas the temporary data tmp_a_b is
> initialized with bucketVersion=-1, so its ReduceSinkOperator takes part in the
> join with bucketVersion=-1. In its init method, ReduceSinkOperator selects the
> hash algorithm for the join columns according to bucketVersion: if
> bucketVersion = 2 and the operation is not an ACID operation, it uses the new
> hash algorithm; otherwise it uses the old one. Because the two sides hash with
> different algorithms, rows with the same key are assigned to different
> partitions. At the reducer stage, rows with the same key therefore cannot be
> paired, resulting in data loss.
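A minimal Java sketch of the mechanism described in Scenario 1. It assumes that each side of the join routes its rows to reducers with its own hash function. The functions `oldHash` and `newHash` are hypothetical stand-ins and are not Hive's actual implementations: `String.hashCode` plays the "old" algorithm, and 32-bit FNV-1a plays the "new" one, whereas Hive's real version-2 hash is Murmur3-based. The partitioning formula `(hash & Integer.MAX_VALUE) % numReducers` is also an assumed, commonly used scheme. The class name `BucketVersionMismatch` and the helper `joinedKeys` are illustrative only.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class BucketVersionMismatch {

    // Hypothetical stand-in for the "old" hash (bucketVersion != 2); not Hive's code.
    static int oldHash(String key) {
        return key.hashCode();
    }

    // Hypothetical stand-in for the "new" hash (bucketVersion == 2):
    // 32-bit FNV-1a over the UTF-8 bytes. Hive's real version-2 hash differs.
    static int newHash(String key) {
        int h = 0x811C9DC5;                  // FNV-1a 32-bit offset basis
        for (byte b : key.getBytes(StandardCharsets.UTF_8)) {
            h ^= (b & 0xFF);
            h *= 0x01000193;                 // FNV-1a 32-bit prime; int overflow wraps mod 2^32
        }
        return h;
    }

    // Assumed partitioning scheme: clear the sign bit, then take the remainder.
    static int partition(int hash, int numReducers) {
        return (hash & Integer.MAX_VALUE) % numReducers;
    }

    // Simulates one shuffle join: the left side is routed with oldHash, the right side
    // with newHash, and each reducer can only pair keys that reached it from both sides.
    // Returns how many distinct keys end up matched.
    static int joinedKeys(List<String> keys, int numReducers) {
        List<Set<String>> left = new ArrayList<>();
        List<Set<String>> right = new ArrayList<>();
        for (int r = 0; r < numReducers; r++) {
            left.add(new HashSet<>());
            right.add(new HashSet<>());
        }
        for (String k : keys) {
            left.get(partition(oldHash(k), numReducers)).add(k);
            right.get(partition(newHash(k), numReducers)).add(k);
        }
        int matched = 0;
        for (int r = 0; r < numReducers; r++) {
            for (String k : left.get(r)) {
                if (right.get(r).contains(k)) {
                    matched++;
                }
            }
        }
        return matched;
    }

    public static void main(String[] args) {
        List<String> keys = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            keys.add("key_" + i);
        }
        for (int reducers : new int[] {1, 2, 4}) {
            System.out.println(reducers + " reducer(s): "
                + joinedKeys(keys, reducers) + " of " + keys.size() + " keys joined");
        }
    }
}
```

With a single reducer every key meets its partner. With more reducers, any key whose two hashes select different reducers is silently missing from the join output. For example, under these stand-in hashes the key "a" goes to reducer 1 on the left and reducer 0 on the right when there are two reducers.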
> *Scenario 2*: Create two test tables:
> create table table_bucketversion_1(col_1 string, col_2 string) TBLPROPERTIES
> ('bucketing_version'='1');
> create table table_bucketversion_2(col_1 string, col_2 string) TBLPROPERTIES
> ('bucketing_version'='2');
> When table_bucketversion_1 is joined with table_bucketversion_2, part of the
> result data is lost because the bucket versions differ.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)