[ https://issues.apache.org/jira/browse/HIVE-7803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14115037#comment-14115037 ]
Hive QA commented on HIVE-7803:
-------------------------------

{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12665172/HIVE-7803.2.patch

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 6127 tests executed

*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udaf_histogram_numeric
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_opt_vectorization
{noformat}

Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/554/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/554/console
Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-554/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12665172

> Enabling Hadoop speculative execution may cause a corrupt output directory (dynamic partition)
> ----------------------------------------------------------------------------------------------
>
>                 Key: HIVE-7803
>                 URL: https://issues.apache.org/jira/browse/HIVE-7803
>             Project: Hive
>          Issue Type: Bug
>          Components: HCatalog
>    Affects Versions: 0.13.1
>         Environment: 
>            Reporter: Selina Zhang
>            Assignee: Selina Zhang
>            Priority: Critical
>         Attachments: HIVE-7803.1.patch, HIVE-7803.2.patch
>
> One of our users reports intermittent failures caused by attempt directories
> left in the input paths. We found that, with speculative execution turned on,
> two mapper attempts tried to commit the task at the same time using the same
> committed-task path, which corrupted the output directory.
>
> The original Pig script:
> {code}
> STORE AdvertiserDataParsedClean INTO '$DB_NAME.$ADVERTISER_META_TABLE_NAME'
>     USING org.apache.hcatalog.pig.HCatStorer();
> {code}
>
> The two mapper attempts:
> attempt_1405021984947_5394024_m_000523_0: KILLED
> attempt_1405021984947_5394024_m_000523_1: SUCCEEDED
>
> attempt_1405021984947_5394024_m_000523_0 was killed right after its commit.
> As a result, the committed task directory
> /projects/.../tablename/_DYN0.7192688458252056/load_time=201408050000/type=complete/_temporary/1/task_1405021984947_5394024_m_000523/
> is corrupt: it contains both
> part-m-00523 (from attempt_1405021984947_5394024_m_000523_0)
> and
> attempt_1405021984947_5394024_m_000523_1/part-m-00523.
>
> Namenode audit log
> ==========================
> 1. 2014-08-05 05:04:36,811 INFO FSNamesystem.audit: ugi=* ip=ipaddress1 cmd=create
>    src=/projects/.../tablename/_DYN0.7192688458252056/load_time=201408050000/type=complete/_temporary/1/_temporary/attempt_1405021984947_5394024_m_000523_0/part-m-00523
>    dst=null perm=user:group:rw-r-----
> 2. 2014-08-05 05:04:53,112 INFO FSNamesystem.audit: ugi=* ip=ipaddress2 cmd=create
>    src=/projects/.../tablename/_DYN0.7192688458252056/load_time=201408050000/type=complete/_temporary/1/_temporary/attempt_1405021984947_5394024_m_000523_1/part-m-00523
>    dst=null perm=user:group:rw-r-----
> 3. 2014-08-05 05:05:13,001 INFO FSNamesystem.audit: ugi=* ip=ipaddress1 cmd=rename
>    src=/projects/.../tablename/_DYN0.7192688458252056/load_time=201408050000/type=complete/_temporary/1/_temporary/attempt_1405021984947_5394024_m_000523_0
>    dst=/projects/.../tablename/_DYN0.7192688458252056/load_time=201408050000/type=complete/_temporary/1/task_1405021984947_5394024_m_000523
>    perm=user:group:rwxr-x---
> 4. 2014-08-05 05:05:13,004 INFO FSNamesystem.audit: ugi=* ip=ipaddress2 cmd=rename
>    src=/projects/.../tablename/_DYN0.7192688458252056/load_time=201408050000/type=complete/_temporary/1/_temporary/attempt_1405021984947_5394024_m_000523_1
>    dst=/projects/.../tablename/_DYN0.7192688458252056/load_time=201408050000/type=complete/_temporary/1/task_1405021984947_5394024_m_000523
>    perm=user:group:rwxr-x---
>
> After consulting our Hadoop core team, we were pointed to HCat code that does
> not participate in the two-phase commit protocol, for example in
> FileRecordWriterContainer.close() (a sketch of the protocol it bypasses
> follows the snippet):
> {code}
> for (Map.Entry<String, org.apache.hadoop.mapred.OutputCommitter> entry :
>     baseDynamicCommitters.entrySet()) {
>   org.apache.hadoop.mapred.TaskAttemptContext currContext =
>       dynamicContexts.get(entry.getKey());
>   OutputCommitter baseOutputCommitter = entry.getValue();
>   if (baseOutputCommitter.needsTaskCommit(currContext)) {
>     baseOutputCommitter.commitTask(currContext);
>   }
> }
> {code}
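The close() path above lets each attempt commit its dynamic-partition output on its own, so a killed speculative attempt can still rename its files into the shared task path, which is exactly what audit-log entries 3 and 4 show. For contrast, below is a minimal sketch of the two-phase commit that regular task output goes through, assuming Hadoop's org.apache.hadoop.mapred internals: TaskUmbilicalProtocol.canCommit is the arbitration call, the real coordination lives in Task.commit(), and the helper class here is illustrative, not actual Hadoop or HCatalog source.

{code}
import java.io.IOException;

import org.apache.hadoop.mapred.OutputCommitter;
import org.apache.hadoop.mapred.TaskAttemptContext;
import org.apache.hadoop.mapred.TaskAttemptID;
import org.apache.hadoop.mapred.TaskUmbilicalProtocol;

// Illustrative sketch of the framework-side two-phase task commit.
public class TwoPhaseCommitSketch {
  // Phase 1 (vote): needsTaskCommit reports whether this attempt produced
  // output, and canCommit asks the MR ApplicationMaster for permission; the
  // master grants it to at most one attempt of a task and kills the rest.
  // Phase 2 (commit): only the granted attempt renames its
  // _temporary/attempt_xxx directory into the shared task_xxx path.
  static void commitWithArbitration(OutputCommitter committer,
      TaskAttemptContext context, TaskUmbilicalProtocol umbilical,
      TaskAttemptID attemptId) throws IOException, InterruptedException {
    if (!committer.needsTaskCommit(context)) {
      return; // nothing to commit for this attempt
    }
    while (!umbilical.canCommit(attemptId)) {
      Thread.sleep(1000); // poll until the master decides
    }
    committer.commitTask(context); // safe: no other attempt can also commit
  }
}
{code}

Because both attempts here answered needsTaskCommit with true and never asked the master for permission, attempt _1's rename landed inside the task directory that attempt _0 had already committed, producing the mixed contents shown above.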