[ https://issues.apache.org/jira/browse/HIVE-15722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15843108#comment-15843108 ]
Hive QA commented on HIVE-15722:
--------------------------------

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12849630/HIVE-15722.04.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 11003 tests executed
*Failed tests:*
{noformat}
TestDerbyConnector - did not produce a TEST-*.xml file (likely timed out) (batchId=235)
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver[encryption_join_with_different_encryption_keys] (batchId=159)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_char_simple] (batchId=147)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3] (batchId=93)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query23] (batchId=223)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/3217/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/3217/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-3217/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 5 tests failed
{noformat}

This message is automatically generated.
ATTACHMENT ID: 12849630 - PreCommit-HIVE-Build

> LLAP: Avoid marking a query as complete if the AMReporter runs into an error
> ----------------------------------------------------------------------------
>
>                 Key: HIVE-15722
>                 URL: https://issues.apache.org/jira/browse/HIVE-15722
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Siddharth Seth
>            Assignee: Siddharth Seth
>         Attachments: HIVE-15722.01.patch, HIVE-15722.02.patch, HIVE-15722.03.patch, HIVE-15722.04.patch
>
>
> When the AMReporter runs into an error (typically intermittent), we end up killing all fragments on the daemon. This is done by marking the query as complete.
> The AM would continue to try scheduling on this node, which leads to task failures because the daemon has already marked the query as complete and rejects new fragments for it.
> Instead of clearing the structures, it's better to kill the fragments and let a queryComplete call come in from the AM.
> Later, we could make enhancements in the AM to avoid such nodes. That is not simple, though, since the communication failure means the AM never learns what happened on the daemon.
> This leads to:
> {code}
> org.apache.hadoop.ipc.RemoteException(java.lang.RuntimeException): Dag query16 already complete. Rejecting fragment [Map 7, 29, 0]
> 	at org.apache.hadoop.hive.llap.daemon.impl.QueryTracker.registerFragment(QueryTracker.java:149)
> 	at org.apache.hadoop.hive.llap.daemon.impl.ContainerRunnerImpl.submitWork(ContainerRunnerImpl.java:226)
> 	at org.apache.hadoop.hive.llap.daemon.impl.LlapDaemon.submitWork(LlapDaemon.java:487)
> 	at org.apache.hadoop.hive.llap.daemon.impl.LlapProtocolServerImpl.submitWork(LlapProtocolServerImpl.java:101)
> 	at org.apache.hadoop.hive.llap.daemon.rpc.LlapDaemonProtocolProtos$LlapDaemonProtocol$2.callBlockingMethod(LlapDaemonProtocolProtos.java:16728)
> 	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:640)
> 	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
> 	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2313)
> 	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2309)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at javax.security.auth.Subject.doAs(Subject.java:422)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
> 	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2307)
> {code}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
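The failure mode and the proposed fix in the quoted issue can be modelled with a small, self-contained Java sketch. This is not Hive's code: `QueryTrackerSketch` and its methods are hypothetical names, the fragment IDs are placeholders modelled on the stack trace, and the real `QueryTracker` keeps far more state. It only illustrates why marking a DAG complete on an AMReporter error makes later submissions fail, while killing the fragments and waiting for the AM's queryComplete call does not.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical, heavily simplified model of an LLAP daemon's query tracker.
// Class and method names are illustrative assumptions, not Hive's API.
class QueryTrackerSketch {
  // DAGs the daemon considers finished; new fragments for these are rejected.
  private final Set<String> completedDags = new HashSet<>();
  // Running fragments, keyed as "dagId/fragmentId".
  private final Set<String> runningFragments = new HashSet<>();

  // Mirrors the check behind "Dag ... already complete. Rejecting fragment ...".
  void registerFragment(String dagId, String fragmentId) {
    if (completedDags.contains(dagId)) {
      throw new RuntimeException(
          "Dag " + dagId + " already complete. Rejecting fragment [" + fragmentId + "]");
    }
    runningFragments.add(dagId + "/" + fragmentId);
  }

  // Old behaviour: an AMReporter error kills fragments by marking the DAG complete.
  void onReporterErrorMarkComplete(String dagId) {
    killFragments(dagId);
    completedDags.add(dagId);
  }

  // Proposed behaviour: kill running fragments but keep the DAG registered.
  void onReporterErrorKillOnly(String dagId) {
    killFragments(dagId);
  }

  // Only an explicit queryComplete from the AM marks the DAG as finished.
  void onQueryComplete(String dagId) {
    killFragments(dagId);
    completedDags.add(dagId);
  }

  int runningCount() {
    return runningFragments.size();
  }

  // Removes every running fragment that belongs to the given DAG.
  private void killFragments(String dagId) {
    String prefix = dagId + "/";
    runningFragments.removeIf(f -> f.startsWith(prefix));
  }
}
```

With the old behaviour, a second `registerFragment("query16", "Map 7, 29, 0")` after a reporter error throws the same message seen in the trace; with the kill-only behaviour it is accepted, and the DAG is only closed once `onQueryComplete` is called.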