[ https://issues.apache.org/jira/browse/HIVE-16071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15898962#comment-15898962 ]
Rui Li commented on HIVE-16071:
-------------------------------

Hi [~xuefuz] and [~ctang.ma], I'd prefer removing the cancelTask. The main reason is that we already have a [timeout task|https://github.com/apache/hive/blob/master/spark-client/src/main/java/org/apache/hive/spark/client/rpc/RpcServer.java#L170] in {{RpcServer::registerClient}}, which uses {{hive.spark.client.server.connect.timeout}}. This timeout task covers both connection establishment and the SASL handshake, and it's canceled [when the SaslServerHandler completes|https://github.com/apache/hive/blob/master/spark-client/src/main/java/org/apache/hive/spark/client/rpc/RpcServer.java#L313]. Therefore I don't think the cancelTask is needed. Moreover, if we remove it, we'll have exactly one timeout task on the server side and one on the client side. Each task covers the timeout for both connection establishment and the SASL handshake, with a different default value on each side because the server needs to wait longer for the client to start. That's more consistent and cleaner. Thoughts?

> Spark remote driver misuses the timeout in RPC handshake
> --------------------------------------------------------
>
>                 Key: HIVE-16071
>                 URL: https://issues.apache.org/jira/browse/HIVE-16071
>             Project: Hive
>          Issue Type: Bug
>          Components: Spark
>            Reporter: Chaoyu Tang
>            Assignee: Chaoyu Tang
>         Attachments: HIVE-16071.patch
>
> Based on its property description in HiveConf and the comments in HIVE-12650 (https://issues.apache.org/jira/browse/HIVE-12650?focusedCommentId=15128979&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15128979), hive.spark.client.connect.timeout is the timeout for the Spark remote driver to make a socket connection (channel) to the RPC server. But currently it is also used by the remote driver for the RPC client/server handshake, which is not right. Instead, hive.spark.client.server.connect.timeout should be used, as it is already used by the RpcServer for the handshake.
> An error like the following is usually caused by this issue, since the default hive.spark.client.connect.timeout value (1000ms) used by the remote driver for the handshake is too short.
> {code}
> 17/02/20 08:46:08 ERROR yarn.ApplicationMaster: User class threw exception: java.util.concurrent.ExecutionException: javax.security.sasl.SaslException: Client closed before SASL negotiation finished.
> java.util.concurrent.ExecutionException: javax.security.sasl.SaslException: Client closed before SASL negotiation finished.
> 	at io.netty.util.concurrent.AbstractFuture.get(AbstractFuture.java:37)
> 	at org.apache.hive.spark.client.RemoteDriver.<init>(RemoteDriver.java:156)
> 	at org.apache.hive.spark.client.RemoteDriver.main(RemoteDriver.java:556)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 	at java.lang.reflect.Method.invoke(Method.java:606)
> 	at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:542)
> Caused by: javax.security.sasl.SaslException: Client closed before SASL negotiation finished.
> 	at org.apache.hive.spark.client.rpc.Rpc$SaslClientHandler.dispose(Rpc.java:453)
> 	at org.apache.hive.spark.client.rpc.SaslHandler.channelInactive(SaslHandler.java:90)
> {code}
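For illustration, here is a minimal, self-contained sketch of the one-timeout-task pattern described above: a single scheduled task guards both connection establishment and the SASL handshake, and is canceled once the handshake completes. The class and method names ({{HandshakeTimeoutSketch}}, {{simulateSaslHandshake}}) and the timeout value are made up for this example; this is not Hive's actual code, which lives in {{RpcServer::registerClient}} and {{SaslServerHandler}} and runs on Netty, while this sketch uses only {{java.util.concurrent}}.

{code:java}
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class HandshakeTimeoutSketch {
  private static final ScheduledExecutorService SCHEDULER =
      Executors.newSingleThreadScheduledExecutor();

  public static void main(String[] args) throws Exception {
    // Illustrative value only; in Hive this would come from
    // hive.spark.client.server.connect.timeout.
    long connectAndHandshakeTimeoutMs = 90_000;

    // Completed when the SASL handshake finishes.
    CompletableFuture<Void> handshakeDone = new CompletableFuture<>();

    // A single timeout task covering both connection establishment and the
    // SASL handshake.
    ScheduledFuture<?> timeoutTask = SCHEDULER.schedule(
        () -> handshakeDone.completeExceptionally(
            new TimeoutException("Timed out waiting for connection/handshake")),
        connectAndHandshakeTimeoutMs, TimeUnit.MILLISECONDS);

    // Cancel the timeout once the handshake completes, mirroring the
    // cancellation done when the SaslServerHandler completes.
    handshakeDone.whenComplete((v, err) -> timeoutTask.cancel(false));

    // Stand-in for the real connection + SASL negotiation.
    simulateSaslHandshake(handshakeDone);

    handshakeDone.get(); // throws ExecutionException if the timeout fired first
    System.out.println("Handshake finished before the timeout task fired");
    SCHEDULER.shutdown();
  }

  // Hypothetical helper: completes the future after a short delay to mimic a
  // successful handshake.
  private static void simulateSaslHandshake(CompletableFuture<Void> done) {
    SCHEDULER.schedule(() -> done.complete(null), 100, TimeUnit.MILLISECONDS);
  }
}
{code}

Per the issue description, the point of the fix is which configuration feeds this timeout on each side: the remote driver's handshake should be governed by {{hive.spark.client.server.connect.timeout}}, not by the 1000ms default of {{hive.spark.client.connect.timeout}}.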