[ https://issues.apache.org/jira/browse/HIVE-17837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16358309#comment-16358309 ]
Hive QA commented on HIVE-17837:
--------------------------------

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12909705/HIVE-17837.2.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 24 failed/errored test(s), 12975 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestAccumuloCliDriver.testCliDriver[accumulo_queries] (batchId=240)
org.apache.hadoop.hive.cli.TestCliDriver.org.apache.hadoop.hive.cli.TestCliDriver (batchId=50)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[mapjoin_hook] (batchId=13)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[ppd_join5] (batchId=36)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[row__id] (batchId=79)
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver[encryption_move_tbl] (batchId=175)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] (batchId=152)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[bucket_map_join_tez1] (batchId=172)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[insert_values_orig_table_use_metadata] (batchId=167)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[llap_acid] (batchId=171)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[llap_acid_fast] (batchId=162)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[resourceplan] (batchId=164)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[sysdb] (batchId=161)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_opt_shuffle_serde] (batchId=180)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[ppd_join5] (batchId=122)
org.apache.hadoop.hive.cli.TestSparkPerfCliDriver.testCliDriver[query1] (batchId=250)
org.apache.hadoop.hive.cli.control.TestDanglingQOuts.checkDanglingQOut (batchId=221)
org.apache.hadoop.hive.metastore.client.TestTablesList.testListTableNamesByFilterNullDatabase[Embedded] (batchId=206)
org.apache.hadoop.hive.ql.exec.TestOperators.testNoConditionalTaskSizeForLlap (batchId=282)
org.apache.hadoop.hive.ql.io.TestDruidRecordWriter.testWrite (batchId=256)
org.apache.hive.beeline.cli.TestHiveCli.testNoErrorDB (batchId=188)
org.apache.hive.jdbc.TestSSL.testConnectionMismatch (batchId=234)
org.apache.hive.jdbc.TestSSL.testConnectionWrongCertCN (batchId=234)
org.apache.hive.jdbc.TestSSL.testMetastoreConnectionWrongCertCN (batchId=234)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/9113/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/9113/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-9113/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 24 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12909705 - PreCommit-HIVE-Build

> Explicitly check if the HoS Remote Driver has been lost in the
> RemoteSparkJobMonitor
> -------------------------------------------------------------------------------------
>
>                 Key: HIVE-17837
>                 URL: https://issues.apache.org/jira/browse/HIVE-17837
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Hive
>            Reporter: Sahil Takiar
>            Assignee: Sahil Takiar
>            Priority: Major
>         Attachments: HIVE-17837.1.patch, HIVE-17837.2.patch
>
>
> Right now the {{RemoteSparkJobMonitor}} implicitly checks if the connection
> to the Spark remote driver is active. It does this every time it triggers an
> invocation of the {{Rpc#call}} method (so any call to {{SparkClient#run}}).
> There are scenarios where we have seen the {{RemoteSparkJobMonitor}} hang when the
> connection to the driver dies, because the implicit call fails to be invoked
> (see HIVE-15860).
> It would be ideal if we made this call explicit, so we fail as soon as we
> know that the connection to the driver has died.
> The fix has the added benefit that it allows us to fail faster in the case
> where the {{RemoteSparkJobMonitor}} is in the QUEUED / SENT state. If it's
> stuck in that state, it won't fail until it hits the monitor timeout (by
> default 1 minute), even though we already know the connection has died. The
> error message that is thrown is also a little imprecise: it says there could
> be queue contention, even though we know the real reason is that the
> connection was lost.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)