[ https://issues.apache.org/jira/browse/HIVE-12650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15148177#comment-15148177 ]
JoneZhang commented on HIVE-12650:
----------------------------------

Hi all, I'm sorry for replying so late.

Yes, hive.spark.client.server.connect.timeout and spark.yarn.am.waitTime are not related. hive.spark.client.server.connect.timeout is the timeout on the handshake between the RPC server and client; when no container is available, the Hive client will exit after hive.spark.client.server.connect.timeout elapses. spark.yarn.am.waitTime is the time the Spark AM waits for the SparkContext to be created after the AM has been launched.

There are two types of error log:
1. "Client closed before SASL negotiation finished" happens on resubmission. See https://issues.apache.org/jira/browse/HIVE-12649.
2. "Connection refused: /hiveclientip:port" happens when the AM tries to connect back to Hive.

Container: container_1448873753366_113453_01_000001 on 10.247.169.134_8041
============================================================================
LogType: stderr
LogLength: 3302
Log Contents:
Please use CMSClassUnloadingEnabled in place of CMSPermGenSweepingEnabled in the future
Please use CMSClassUnloadingEnabled in place of CMSPermGenSweepingEnabled in the future
15/12/09 02:11:48 INFO yarn.ApplicationMaster: Registered signal handlers for [TERM, HUP, INT]
15/12/09 02:11:48 INFO yarn.ApplicationMaster: ApplicationAttemptId: appattempt_1448873753366_113453_000001
15/12/09 02:11:49 INFO spark.SecurityManager: Changing view acls to: mqq
15/12/09 02:11:49 INFO spark.SecurityManager: Changing modify acls to: mqq
15/12/09 02:11:49 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(mqq); users with modify permissions: Set(mqq)
15/12/09 02:11:49 INFO yarn.ApplicationMaster: Starting the user application in a separate Thread
15/12/09 02:11:49 INFO yarn.ApplicationMaster: Waiting for spark context initialization
15/12/09 02:11:49 INFO yarn.ApplicationMaster: Waiting for spark context initialization ...
15/12/09 02:11:49 INFO client.RemoteDriver: Connecting to: 10.179.12.140:58013
15/12/09 02:11:49 ERROR yarn.ApplicationMaster: User class threw exception: java.util.concurrent.ExecutionException: java.net.ConnectException: Connection refused: /10.179.12.140:58013
java.util.concurrent.ExecutionException: java.net.ConnectException: Connection refused: /10.179.12.140:58013
	at io.netty.util.concurrent.AbstractFuture.get(AbstractFuture.java:37)
	at org.apache.hive.spark.client.RemoteDriver.<init>(RemoteDriver.java:156)
	at org.apache.hive.spark.client.RemoteDriver.main(RemoteDriver.java:556)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:483)
Caused by: java.net.ConnectException: Connection refused: /10.179.12.140:58013
	at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
	at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
	at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:208)
	at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:287)
	at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:528)
	at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
	at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
	at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:116)
	at java.lang.Thread.run(Thread.java:745)
15/12/09 02:11:49 INFO yarn.ApplicationMaster: Final app status: FAILED, exitCode: 15, (reason: User class threw exception: java.util.concurrent.ExecutionException: java.net.ConnectException: Connection refused: /10.179.12.140:58013)
15/12/09 02:11:59 ERROR yarn.ApplicationMaster: SparkContext did not initialize after waiting for 150000 ms. Please check earlier log output for errors. Failing the application.
15/12/09 02:11:59 INFO util.Utils: Shutdown hook called

> Increase default value of hive.spark.client.server.connect.timeout to exceed spark.yarn.am.waitTime
> ----------------------------------------------------------------------------------------------------
>
> Key: HIVE-12650
> URL: https://issues.apache.org/jira/browse/HIVE-12650
> Project: Hive
> Issue Type: Bug
> Affects Versions: 1.1.1, 1.2.1
> Reporter: JoneZhang
> Assignee: Xuefu Zhang
>
> I think hive.spark.client.server.connect.timeout should be set greater than
> spark.yarn.am.waitTime. The default value for spark.yarn.am.waitTime is 100s,
> and the default value for hive.spark.client.server.connect.timeout is 90s,
> which is not good. We can increase it to a larger value such as 120s.
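
To make the ordering concrete, here is a minimal sketch of aligning the two timeouts from a Hive on Spark session. The property names and defaults (90s for hive.spark.client.server.connect.timeout, 100s for spark.yarn.am.waitTime) come from this issue, and 120s is the value proposed above; that spark.* properties set in the Hive session are forwarded to the Spark application is an assumption here, not something this issue confirms.

-- Sketch only: keep the Hive client's handshake timeout above the AM's
-- SparkContext wait, so the RemoteDriver has time to connect back before
-- the Hive client gives up.
set hive.spark.client.server.connect.timeout=120000ms;  -- proposed 120s (default 90s)
set spark.yarn.am.waitTime=100s;                        -- Spark default per this issue, shown for comparison

With these values the client-side timeout (120s) exceeds the AM's wait (100s), which is the relationship the issue description asks the defaults to satisfy.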