[ 
https://issues.apache.org/jira/browse/HIVE-9094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14247911#comment-14247911
 ] 

Chengxiang Li commented on HIVE-9094:
-------------------------------------

Here is the TimeoutException seen on the Spark client side:
{noformat}
2014-12-15 12:14:09,062 ERROR [main]: ql.Driver (SessionState.java:printError(838)) - FAILED: SemanticException Failed to get spark memory/core info: java.util.concurrent.TimeoutException
org.apache.hadoop.hive.ql.parse.SemanticException: Failed to get spark memory/core info: java.util.concurrent.TimeoutException
        at org.apache.hadoop.hive.ql.optimizer.spark.SetSparkReducerParallelism.process(SetSparkReducerParallelism.java:120)
        at org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
        at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:94)
        at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:78)
        at org.apache.hadoop.hive.ql.lib.ForwardWalker.walk(ForwardWalker.java:79)
        at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:109)
        at org.apache.hadoop.hive.ql.parse.spark.SparkCompiler.optimizeOperatorPlan(SparkCompiler.java:134)
        at org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(TaskCompiler.java:99)
        at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10202)
        at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:221)
        at org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:74)
        at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:221)
        at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:420)
        at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:306)
        at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1108)
        at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1170)
        at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1045)
        at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1035)
        at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:199)
        at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:151)
        at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:362)
        at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:297)
        at org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:837)
        at org.apache.hadoop.hive.cli.TestSparkCliDriver.runTest(TestSparkCliDriver.java:234)
        at org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join16(TestSparkCliDriver.java:166)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at junit.framework.TestCase.runTest(TestCase.java:176)
        at junit.framework.TestCase.runBare(TestCase.java:141)
        at junit.framework.TestResult$1.protect(TestResult.java:122)
        at junit.framework.TestResult.runProtected(TestResult.java:142)
        at junit.framework.TestResult.run(TestResult.java:125)
        at junit.framework.TestCase.run(TestCase.java:129)
        at junit.framework.TestSuite.runTest(TestSuite.java:255)
        at junit.framework.TestSuite.run(TestSuite.java:250)
        at org.junit.internal.runners.JUnit38ClassRunner.run(JUnit38ClassRunner.java:84)
        at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:264)
        at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:153)
        at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:124)
        at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:200)
        at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:153)
        at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:103)
Caused by: java.util.concurrent.TimeoutException
        at io.netty.util.concurrent.AbstractFuture.get(AbstractFuture.java:49)
        at org.apache.hive.spark.client.JobHandleImpl.get(JobHandleImpl.java:74)
        at org.apache.hive.spark.client.JobHandleImpl.get(JobHandleImpl.java:35)
        at org.apache.hadoop.hive.ql.exec.spark.RemoteHiveSparkClient.getExecutorCount(RemoteHiveSparkClient.java:92)
        at org.apache.hadoop.hive.ql.exec.spark.session.SparkSessionImpl.getMemoryAndCores(SparkSessionImpl.java:77)
        at org.apache.hadoop.hive.ql.optimizer.spark.SetSparkReducerParallelism.process(SetSparkReducerParallelism.java:118)
        ... 43 more
{noformat}
I also found the following entries in spark.log, logged 5 seconds before the
timeout on the client side:
{noformat}
2014-12-15 12:14:04,159 INFO  [Driver-RPC-Handler-0]: client.RemoteDriver (RemoteDriver.java:handle(255)) - Received job request 13ca3a2b-0719-42fb-8201-0a645cea992f
2014-12-15 12:14:04,166 INFO  [Driver-RPC-Handler-0]: client.RemoteDriver (RemoteDriver.java:submit(179)) - SparkContext not yet up, queueing job request.
{noformat}
I set the timeout to 5s in RemoteHiveSparkClient::getExecutorCount, so the
root cause here should be that RemoteHiveSparkClient::getExecutorCount timed
out after 5s because the Spark cluster had not launched yet. I carelessly
overlooked the Spark cluster launch time previously.
I think we should make the timeout value configurable and set the default to
a more reasonable value that covers the Spark cluster launch time. If
launching the Spark cluster takes longer than that, it makes sense to fail
the current query, and the user should check why it happened. 60s should be
enough as a default value. [~vanzin], what do you think?
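
To make the proposal concrete, here is a minimal sketch of a configurable
timeout with a 60s default. It is not the actual Hive code: the class name,
the property key "sketch.spark.client.future.timeout.seconds", and the use of
java.util.Properties in place of HiveConf are all assumptions made for
illustration. Only the 60s default and the idea of bounding the wait in
getExecutorCount come from the comment above.

```java
import java.util.Properties;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper, not part of Hive.
class ExecutorCountTimeoutSketch {
    // Hypothetical property key; a real patch would define its own name in HiveConf.
    static final String TIMEOUT_KEY = "sketch.spark.client.future.timeout.seconds";
    // Default proposed in the comment: long enough to cover Spark cluster launch.
    static final long DEFAULT_TIMEOUT_SECONDS = 60L;

    // Reads the timeout from the configuration, falling back to the default
    // when the key is absent or blank.
    static long resolveTimeoutSeconds(Properties conf) {
        String raw = conf.getProperty(TIMEOUT_KEY);
        if (raw == null || raw.trim().isEmpty()) {
            return DEFAULT_TIMEOUT_SECONDS;
        }
        long value = Long.parseLong(raw.trim());
        if (value <= 0) {
            throw new IllegalArgumentException(TIMEOUT_KEY + " must be positive: " + value);
        }
        return value;
    }

    // Waits for the pending executor-count job, failing the caller with a
    // descriptive error if the result does not arrive within the timeout.
    static int getExecutorCount(Future<Integer> pending, Properties conf)
            throws InterruptedException, ExecutionException {
        long timeout = resolveTimeoutSeconds(conf);
        try {
            return pending.get(timeout, TimeUnit.SECONDS);
        } catch (TimeoutException e) {
            pending.cancel(true);
            throw new IllegalStateException(
                "Timed out after " + timeout + "s waiting for executor count; "
                    + "the Spark cluster may not have launched yet", e);
        }
    }
}
```

With an empty configuration the wait is bounded by the 60s default; a user
whose cluster needs longer to launch would raise the value through the
(hypothetical) property instead of changing code.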

> TimeoutException when trying get executor count from RSC [Spark Branch]
> -----------------------------------------------------------------------
>
>                 Key: HIVE-9094
>                 URL: https://issues.apache.org/jira/browse/HIVE-9094
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Spark
>            Reporter: Xuefu Zhang
>
> In 
> http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/532/testReport,
>  join25.q failed because:
> {code}
> 2014-12-12 19:14:50,084 ERROR [main]: ql.Driver (SessionState.java:printError(838)) - FAILED: SemanticException Failed to get spark memory/core info: java.util.concurrent.TimeoutException
> org.apache.hadoop.hive.ql.parse.SemanticException: Failed to get spark memory/core info: java.util.concurrent.TimeoutException
>         at org.apache.hadoop.hive.ql.optimizer.spark.SetSparkReducerParallelism.process(SetSparkReducerParallelism.java:120)
>         at org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
>         at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:94)
>         at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:78)
>         at org.apache.hadoop.hive.ql.lib.ForwardWalker.walk(ForwardWalker.java:79)
>         at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:109)
>         at org.apache.hadoop.hive.ql.parse.spark.SparkCompiler.optimizeOperatorPlan(SparkCompiler.java:134)
>         at org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(TaskCompiler.java:99)
>         at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10202)
>         at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:221)
>         at org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:74)
>         at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:221)
>         at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:420)
>         at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:306)
>         at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1108)
>         at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1170)
>         at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1045)
>         at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1035)
>         at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:199)
>         at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:151)
>         at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:362)
>         at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:297)
>         at org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:837)
>         at org.apache.hadoop.hive.cli.TestSparkCliDriver.runTest(TestSparkCliDriver.java:234)
>         at org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_join25(TestSparkCliDriver.java:162)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:606)
>         at junit.framework.TestCase.runTest(TestCase.java:176)
>         at junit.framework.TestCase.runBare(TestCase.java:141)
>         at junit.framework.TestResult$1.protect(TestResult.java:122)
>         at junit.framework.TestResult.runProtected(TestResult.java:142)
>         at junit.framework.TestResult.run(TestResult.java:125)
>         at junit.framework.TestCase.run(TestCase.java:129)
>         at junit.framework.TestSuite.runTest(TestSuite.java:255)
>         at junit.framework.TestSuite.run(TestSuite.java:250)
>         at org.junit.internal.runners.JUnit38ClassRunner.run(JUnit38ClassRunner.java:84)
>         at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:264)
>         at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:153)
>         at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:124)
>         at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:200)
>         at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:153)
>         at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:103)
> Caused by: java.util.concurrent.TimeoutException
>         at io.netty.util.concurrent.AbstractFuture.get(AbstractFuture.java:49)
>         at org.apache.hive.spark.client.JobHandleImpl.get(JobHandleImpl.java:74)
>         at org.apache.hive.spark.client.JobHandleImpl.get(JobHandleImpl.java:35)
>         at org.apache.hadoop.hive.ql.exec.spark.RemoteHiveSparkClient.getExecutorCount(RemoteHiveSparkClient.java:92)
>         at org.apache.hadoop.hive.ql.exec.spark.session.SparkSessionImpl.getMemoryAndCores(SparkSessionImpl.java:77)
>         at org.apache.hadoop.hive.ql.optimizer.spark.SetSparkReducerParallelism.process(SetSparkReducerParallelism.java:118)
>         ... 43 more
> {code}
> The timeout was introduced in HIVE-9079. Previously the driver could hang.
> This seems to be a robustness issue in RSC. Hanging isn't good, but a
> timeout isn't good either unless there is some network issue, which doesn't
> seem to be the case here. [~chengxiang li]/[~vanzin], could we get to the
> bottom of this? Increase the timeout value if necessary.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
