[ https://issues.apache.org/jira/browse/HIVE-22579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16988399#comment-16988399 ]
Hive QA commented on HIVE-22579:
--------------------------------

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12987446/HIVE-22579.01.branch-2.patch

{color:red}ERROR:{color} -1 due to build exiting with an error

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/19746/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/19746/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-19746/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Tests exited with: NonZeroExitCodeException
Command 'bash /data/hiveptest/working/scratch/source-prep.sh' failed with exit status 1 and output '+ date '+%Y-%m-%d %T.%3N'
2019-12-05 02:34:47.581
+ [[ -n /usr/lib/jvm/java-8-openjdk-amd64 ]]
+ export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ export PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
+ export 'MAVEN_OPTS=-Xmx1g '
+ MAVEN_OPTS='-Xmx1g '
+ cd /data/hiveptest/working/
+ tee /data/hiveptest/logs/PreCommit-HIVE-Build-19746/source-prep.txt
+ [[ false == \t\r\u\e ]]
+ mkdir -p maven ivy
+ [[ git = \s\v\n ]]
+ [[ git = \g\i\t ]]
+ [[ -z branch-2 ]]
+ [[ -d apache-github-branch-2-source ]]
+ [[ ! -d apache-github-branch-2-source/.git ]]
+ [[ ! -d apache-github-branch-2-source ]]
+ date '+%Y-%m-%d %T.%3N'
2019-12-05 02:34:47.679
+ cd apache-github-branch-2-source
+ git fetch origin
From https://github.com/apache/hive
   7534f82..de0a7ec  branch-1           -> origin/branch-1
   9bcdb54..6002c51  branch-1.2         -> origin/branch-1.2
   292a98f..0359921  branch-2.1         -> origin/branch-2.1
   b148507..67f9139  branch-2.2         -> origin/branch-2.2
   f90975a..f55ee60  branch-3           -> origin/branch-3
   a354bed..0ecbd12  branch-3.0         -> origin/branch-3.0
   909c1dc..eb4d7c3  branch-3.1         -> origin/branch-3.1
   305e710..1ef05ef  master             -> origin/master
   e59fdf9..3638231  storage-branch-2.7 -> origin/storage-branch-2.7
 * [new tag]         rel/storage-release-2.7.1 -> rel/storage-release-2.7.1
+ git reset --hard HEAD
HEAD is now at a4a6101 HIVE-22249: Support Parquet through HCatalog (Jay Green-Stevens via Peter Vary)
+ git clean -f -d
+ git checkout branch-2
Already on 'branch-2'
Your branch is up-to-date with 'origin/branch-2'.
+ git reset --hard origin/branch-2
HEAD is now at a4a6101 HIVE-22249: Support Parquet through HCatalog (Jay Green-Stevens via Peter Vary)
+ git merge --ff-only origin/branch-2
Already up-to-date.
+ date '+%Y-%m-%d %T.%3N'
2019-12-05 02:35:11.141
+ rm -rf ../yetus_PreCommit-HIVE-Build-19746
+ mkdir ../yetus_PreCommit-HIVE-Build-19746
+ git gc
+ cp -R . ../yetus_PreCommit-HIVE-Build-19746
+ mkdir /data/hiveptest/logs/PreCommit-HIVE-Build-19746/yetus
+ patchCommandPath=/data/hiveptest/working/scratch/smart-apply-patch.sh
+ patchFilePath=/data/hiveptest/working/scratch/build.patch
+ [[ -f /data/hiveptest/working/scratch/build.patch ]]
+ chmod +x /data/hiveptest/working/scratch/smart-apply-patch.sh
+ /data/hiveptest/working/scratch/smart-apply-patch.sh /data/hiveptest/working/scratch/build.patch
Going to apply patch with: git apply -p0
+ [[ maven == \m\a\v\e\n ]]
+ rm -rf /data/hiveptest/working/maven/org/apache/hive
+ mvn -B clean install -DskipTests -T 4 -q -Dmaven.repo.local=/data/hiveptest/working/maven
ANTLR Parser Generator  Version 3.5.2
Output file /data/hiveptest/working/apache-github-branch-2-source/metastore/target/generated-sources/antlr3/org/apache/hadoop/hive/metastore/parser/FilterParser.java does not exist: must build /data/hiveptest/working/apache-github-branch-2-source/metastore/src/java/org/apache/hadoop/hive/metastore/parser/Filter.g
org/apache/hadoop/hive/metastore/parser/Filter.g
DataNucleus Enhancer (version 4.1.17) for API "JDO"
DataNucleus Enhancer : Classpath
>>  /usr/share/maven/boot/plexus-classworlds-2.x.jar
ENHANCED (Persistable) : org.apache.hadoop.hive.metastore.model.MDatabase
ENHANCED (Persistable) : org.apache.hadoop.hive.metastore.model.MFieldSchema
ENHANCED (Persistable) : org.apache.hadoop.hive.metastore.model.MType
ENHANCED (Persistable) : org.apache.hadoop.hive.metastore.model.MTable
ENHANCED (Persistable) : org.apache.hadoop.hive.metastore.model.MConstraint
ENHANCED (Persistable) : org.apache.hadoop.hive.metastore.model.MSerDeInfo
ENHANCED (Persistable) : org.apache.hadoop.hive.metastore.model.MOrder
ENHANCED (Persistable) : org.apache.hadoop.hive.metastore.model.MColumnDescriptor
ENHANCED (Persistable) : org.apache.hadoop.hive.metastore.model.MStringList
ENHANCED (Persistable) : org.apache.hadoop.hive.metastore.model.MStorageDescriptor
ENHANCED (Persistable) : org.apache.hadoop.hive.metastore.model.MPartition
ENHANCED (Persistable) : org.apache.hadoop.hive.metastore.model.MIndex
ENHANCED (Persistable) : org.apache.hadoop.hive.metastore.model.MRole
ENHANCED (Persistable) : org.apache.hadoop.hive.metastore.model.MRoleMap
ENHANCED (Persistable) : org.apache.hadoop.hive.metastore.model.MGlobalPrivilege
ENHANCED (Persistable) : org.apache.hadoop.hive.metastore.model.MDBPrivilege
ENHANCED (Persistable) : org.apache.hadoop.hive.metastore.model.MTablePrivilege
ENHANCED (Persistable) : org.apache.hadoop.hive.metastore.model.MPartitionPrivilege
ENHANCED (Persistable) : org.apache.hadoop.hive.metastore.model.MTableColumnPrivilege
ENHANCED (Persistable) : org.apache.hadoop.hive.metastore.model.MPartitionColumnPrivilege
ENHANCED (Persistable) : org.apache.hadoop.hive.metastore.model.MPartitionEvent
ENHANCED (Persistable) : org.apache.hadoop.hive.metastore.model.MMasterKey
ENHANCED (Persistable) : org.apache.hadoop.hive.metastore.model.MDelegationToken
ENHANCED (Persistable) : org.apache.hadoop.hive.metastore.model.MTableColumnStatistics
ENHANCED (Persistable) : org.apache.hadoop.hive.metastore.model.MPartitionColumnStatistics
ENHANCED (Persistable) : org.apache.hadoop.hive.metastore.model.MVersionTable
ENHANCED (Persistable) : org.apache.hadoop.hive.metastore.model.MResourceUri
ENHANCED (Persistable) : org.apache.hadoop.hive.metastore.model.MFunction
ENHANCED (Persistable) : org.apache.hadoop.hive.metastore.model.MNotificationLog
ENHANCED (Persistable) : org.apache.hadoop.hive.metastore.model.MNotificationNextId
DataNucleus Enhancer completed with success for 30 classes. Timings : input=186 ms, enhance=197 ms, total=383 ms.
Consult the log for full details
ANTLR Parser Generator  Version 3.5.2
Output file /data/hiveptest/working/apache-github-branch-2-source/ql/target/generated-sources/antlr3/org/apache/hadoop/hive/ql/parse/HiveLexer.java does not exist: must build /data/hiveptest/working/apache-github-branch-2-source/ql/src/java/org/apache/hadoop/hive/ql/parse/HiveLexer.g
org/apache/hadoop/hive/ql/parse/HiveLexer.g
Output file /data/hiveptest/working/apache-github-branch-2-source/ql/target/generated-sources/antlr3/org/apache/hadoop/hive/ql/parse/HiveParser.java does not exist: must build /data/hiveptest/working/apache-github-branch-2-source/ql/src/java/org/apache/hadoop/hive/ql/parse/HiveParser.g
org/apache/hadoop/hive/ql/parse/HiveParser.g
Output file /data/hiveptest/working/apache-github-branch-2-source/ql/target/generated-sources/antlr3/org/apache/hadoop/hive/ql/parse/HintParser.java does not exist: must build /data/hiveptest/working/apache-github-branch-2-source/ql/src/java/org/apache/hadoop/hive/ql/parse/HintParser.g
org/apache/hadoop/hive/ql/parse/HintParser.g
Generating vector expression code
Generating vector expression test code
[ERROR] Failed to execute goal on project hive-hbase-handler: Could not resolve dependencies for project org.apache.hive:hive-hbase-handler:jar:2.4.0-SNAPSHOT: Could not find artifact org.apache.hbase:hbase-procedure:jar:1.1.1 -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/DependencyResolutionException
[ERROR]
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR]   mvn <goals> -rf :hive-hbase-handler
+ result=1
+ '[' 1 -ne 0 ']'
+ rm -rf yetus_PreCommit-HIVE-Build-19746
+ exit 1
'
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12987446 - PreCommit-HIVE-Build

> ACID v1: covered delta-only splits (without base) should be marked as covered (branch-2)
> ----------------------------------------------------------------------------------------
>
>                 Key: HIVE-22579
>                 URL: https://issues.apache.org/jira/browse/HIVE-22579
>             Project: Hive
>          Issue Type: Bug
>            Reporter: László Bodor
>            Assignee: László Bodor
>            Priority: Major
>         Attachments: HIVE-22579.01.branch-2.patch
>
>
> There is a scenario in which different SplitGenerator instances try to cover the same delta-only buckets (buckets having no base file) more than once, so multiple OrcSplit instances can be generated for the same delta file. This causes multiple tasks to read the same delta file, which produces duplicate records in a simple select-star query.
> File structure for a 256 bucket table:
> {code}
> drwxrwxrwx   - hive hadoop          0 2019-11-29 15:55 /apps/hive/warehouse/naresh.db/test1/base_0000013
> -rw-r--r--   3 hive hadoop        353 2019-11-29 15:55 /apps/hive/warehouse/naresh.db/test1/base_0000013/bucket_00012
> -rw-r--r--   3 hive hadoop       1642 2019-11-29 15:55 /apps/hive/warehouse/naresh.db/test1/base_0000013/bucket_00140
> drwxrwxrwx   - hive hadoop          0 2019-11-29 15:55 /apps/hive/warehouse/naresh.db/test1/delta_0000014_0000014_0000
> -rwxrwxrwx   3 hive hadoop        348 2019-11-29 15:55 /apps/hive/warehouse/naresh.db/test1/delta_0000014_0000014_0000/bucket_00012
> -rwxrwxrwx   3 hive hadoop       1635 2019-11-29 15:55 /apps/hive/warehouse/naresh.db/test1/delta_0000014_0000014_0000/bucket_00140
> drwxrwxrwx   - hive hadoop          0 2019-11-29 16:04 /apps/hive/warehouse/naresh.db/test1/delta_0000015_0000015_0000
> -rwxrwxrwx   3 hive hadoop        348 2019-11-29 16:04 /apps/hive/warehouse/naresh.db/test1/delta_0000015_0000015_0000/bucket_00012
> -rwxrwxrwx   3 hive hadoop       1808 2019-11-29 16:04 /apps/hive/warehouse/naresh.db/test1/delta_0000015_0000015_0000/bucket_00140
> drwxrwxrwx   - hive hadoop          0 2019-11-29 16:06 /apps/hive/warehouse/naresh.db/test1/delta_0000016_0000016_0000
> -rwxrwxrwx   3 hive hadoop        348 2019-11-29 16:06 /apps/hive/warehouse/naresh.db/test1/delta_0000016_0000016_0000/bucket_00043
> -rwxrwxrwx   3 hive hadoop       1633 2019-11-29 16:06 /apps/hive/warehouse/naresh.db/test1/delta_0000016_0000016_0000/bucket_00171
> {code}
> In this case, when the bucket_00171 file has a record and there is no base file for it, a select (*) with the ETL split strategy can generate 2 splits for the same delta bucket.
> The scenario of the issue:
> 1. ETLSplitStrategy contains a [covered[] array|https://github.com/apache/hive/blob/298f749ec7be04abb797fb119f3f0b923c8a1b27/ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java#L763] which is [shared between the SplitInfo instances|https://github.com/apache/hive/blob/298f749ec7be04abb797fb119f3f0b923c8a1b27/ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java#L824] to be created
> 2. A SplitInfo instance is created for [every base file (2 in this case)|https://github.com/apache/hive/blob/298f749ec7be04abb797fb119f3f0b923c8a1b27/ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java#L809]
> 3. For every SplitInfo, [a SplitGenerator is created|https://github.com/apache/hive/blob/298f749ec7be04abb797fb119f3f0b923c8a1b27/ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java#L925-L926], and in the constructor [the parent's getSplit is called|https://github.com/apache/hive/blob/298f749ec7be04abb797fb119f3f0b923c8a1b27/ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java#L1251], which tries to take care of the deltas
> I'm not sure at the moment what the intention of this is, but this way duplicate delta splits can be generated, which can cause duplicated reads later (note that both tasks read the same delta file, bucket_00171):
> {code}
> 2019-12-01T16:24:53,669 INFO [TezTR-127843_16_30_0_171_0 (1575040127843_0016_30_00_000171_0)] orc.ReaderImpl: Reading ORC rows from hdfs://c3351-node2.squadron.support.hortonworks.com:8020/apps/hive/warehouse/naresh.db/test1/delta_0000016_0000016_0000/bucket_00171 with {include: [true, true, true, true, true, true, true, true, true, true, true, true], offset: 0, length: 9223372036854775807, schema: struct<idp_warehouse_id:bigint,idp_audit_id:bigint,batch_id:decimal(9,0),source_system_cd:varchar(500),insert_time:timestamp,process_status_cd:varchar(20),business_date:date,last_update_time:timestamp,report_date:date,etl_run_time:timestamp,etl_run_nbr:bigint>}
> 2019-12-01T16:24:53,672 INFO [TezTR-127843_16_30_0_171_0 (1575040127843_0016_30_00_000171_0)] lib.MRReaderMapred: Processing split: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat:OrcSplit [hdfs://c3351-node2.squadron.support.hortonworks.com:8020/apps/hive/warehouse/naresh.db/test1, start=171, length=0, isOriginal=false, fileLength=9223372036854775807, hasFooter=false, hasBase=false, deltas=[{ minTxnId: 14 maxTxnId: 14 stmtIds: [0] }, { minTxnId: 15 maxTxnId: 15 stmtIds: [0] }, { minTxnId: 16 maxTxnId: 16 stmtIds: [0] }]]
> 2019-12-01T16:24:55,807 INFO [TezTR-127843_16_30_0_425_0 (1575040127843_0016_30_00_000425_0)] orc.ReaderImpl: Reading ORC rows from hdfs://c3351-node2.squadron.support.hortonworks.com:8020/apps/hive/warehouse/naresh.db/test1/delta_0000016_0000016_0000/bucket_00171 with {include: [true, true, true, true, true, true, true, true, true, true, true, true], offset: 0, length: 9223372036854775807, schema: struct<idp_warehouse_id:bigint,idp_audit_id:bigint,batch_id:decimal(9,0),source_system_cd:varchar(500),insert_time:timestamp,process_status_cd:varchar(20),business_date:date,last_update_time:timestamp,report_date:date,etl_run_time:timestamp,etl_run_nbr:bigint>}
> 2019-12-01T16:24:55,813 INFO [TezTR-127843_16_30_0_425_0 (1575040127843_0016_30_00_000425_0)] lib.MRReaderMapred: Processing split: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat:OrcSplit [hdfs://c3351-node2.squadron.support.hortonworks.com:8020/apps/hive/warehouse/naresh.db/test1, start=171, length=0, isOriginal=false, fileLength=9223372036854775807, hasFooter=false, hasBase=false, deltas=[{ minTxnId: 14 maxTxnId: 14 stmtIds: [0] }, { minTxnId: 15 maxTxnId: 15 stmtIds: [0] }, { minTxnId: 16 maxTxnId: 16 stmtIds: [0] }]]
> {code}
> It seems this issue doesn't affect ACID v2, as getSplits() returns an empty collection or throws an exception in case of unexpected deltas (the former applies here, since the deltas were not unexpected):
> https://github.com/apache/hive/blob/8ee3497f87f81fa84ee1023e891dc54087c2cd5e/ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java#L1178-L1197

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
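The "mark delta-only splits as covered" idea in the issue title can be sketched as follows. This is a minimal, hypothetical model, not Hive's actual API: CoveredBuckets and claim() are illustrative names standing in for the shared covered[] array in OrcInputFormat's ETLSplitStrategy. The point is that once the first generator claims a delta-only bucket, a second generator working from the same shared state skips it, so only one split is emitted for bucket_00171.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Hypothetical, simplified model of the shared covered[] bookkeeping
// described in the scenario above (names are illustrative, not Hive's API).
public class CoveredBuckets {
  private final BitSet covered; // one bit per bucket id, shared by all generators

  public CoveredBuckets(int numBuckets) {
    this.covered = new BitSet(numBuckets);
  }

  // Returns true only for the first caller that claims the bucket;
  // marking it covered prevents a second split for the same delta file.
  public synchronized boolean claim(int bucket) {
    if (covered.get(bucket)) {
      return false;
    }
    covered.set(bucket);
    return true;
  }

  public static void main(String[] args) {
    CoveredBuckets shared = new CoveredBuckets(256);
    List<Integer> splits = new ArrayList<>();
    // Two SplitGenerator-like passes both encounter delta-only bucket 171,
    // as in the logs above; only the first may emit a split for it.
    for (int generator = 0; generator < 2; generator++) {
      if (shared.claim(171)) {
        splits.add(171);
      }
    }
    System.out.println(splits.size()); // prints 1: one split, not two
  }
}
```

Without the shared claim check (the buggy behavior), both passes would add bucket 171 and two tasks would later read the same delta file, producing the duplicate rows shown in the logs.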