Github user harishreedharan commented on the pull request:

    https://github.com/apache/spark/pull/7489#issuecomment-122580420
  
    I have not had issues in cluster mode either. I just ran it a couple of
    days ago (see my PR for keytab login). That said, I ran it using a keytab,
    not after a kinit, so I can't be 100% sure this is not an issue when using
    kinit. This might be a configuration issue somewhere, too. The RM not
    running might be because your cluster is oversubscribed.
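    For reference, the two login paths I'm distinguishing look roughly like
    this. This is a sketch only: the principal, realm, and keytab path below
    are hypothetical placeholders, not from a real cluster.

    ```shell
    # Keytab-based login (the path I tested): spark-submit ships the keytab
    # and can re-login from it, so delegation tokens can be renewed for
    # long-running jobs.
    spark-submit --master yarn-cluster \
      --principal sparkjob@EXAMPLE.COM \
      --keytab /etc/security/keytabs/sparkjob.keytab \
      app.py

    # kinit-based login (the path I have not verified): the job relies on the
    # submitting user's Kerberos ticket cache obtained via kinit.
    kinit sparkjob@EXAMPLE.COM
    spark-submit --master yarn-cluster app.py
    ```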
    
    On Saturday, July 18, 2015, bolkedebruin <[email protected]> wrote:
    
    > I just checked: with a correct configuration and without the patch no job
    > will run. They will enter "ACCEPTED" state but never "RUNNING", and on one
    > of the clusters it fails eventually on the other cluster it stays
    > (indefinitely?) in "ACCEPTED"
    >
    > '''
    > 15/07/18 20:17:50 INFO ApplicationMaster: Waiting for spark context
    > initialization ...
    > 15/07/18 20:17:51 DEBUG
    > AkkaRpcEnv$$anonfun$actorRef$lzycompute$1$1$$anon$1: [actor] received
    > message AkkaMessage(ReviveOffers,false) from
    > Actor[akka://sparkDriver/deadLetters]
    > 15/07/18 20:17:51 DEBUG
    > AkkaRpcEnv$$anonfun$actorRef$lzycompute$1$1$$anon$1: Received RPC message:
    > AkkaMessage(ReviveOffers,false)
    > 15/07/18 20:17:51 DEBUG
    > AkkaRpcEnv$$anonfun$actorRef$lzycompute$1$1$$anon$1: [actor] handled
    > message (0.522463 ms) AkkaMessage(ReviveOffers,false) from
    > Actor[akka://sparkDriver/deadLetters]
    > 15/07/18 20:17:51 INFO ConfiguredRMFailoverProxyProvider: Failing over to
    > rm1
    > 15/07/18 20:17:51 DEBUG Client: The ping interval is 60000 ms.
    > 15/07/18 20:17:51 DEBUG Client: Connecting to
    > lxhnl002.ad.ing.net/10.111.114.16:8032
    > 15/07/18 20:17:51 DEBUG UserGroupInformation: PrivilegedAction as:sparkjob
    > (auth:SIMPLE)
    > 
from:org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:717)
    > 15/07/18 20:17:51 DEBUG SaslRpcClient: Sending sasl message state:
    > NEGOTIATE
    >
    > 15/07/18 20:17:51 DEBUG SaslRpcClient: Received SASL message state:
    > NEGOTIATE
    > auths {
    > method: "TOKEN"
    > mechanism: "DIGEST-MD5"
    > protocol: ""
    > serverId: "default"
    > challenge:
    > 
"realm=\"default\",nonce=\"ye6LDFg3AwqSofprdI177N2MyYCQaQ5k8Fyn7ep3\",qop=\"auth\",charset=utf-8,algorithm=md5-sess"
    > }
    > auths {
    > method: "KERBEROS"
    > mechanism: "GSSAPI"
    > protocol: "rm"
    > serverId: "lxhnl002.ad.ing.net"
    > }
    >
    > 15/07/18 20:17:51 DEBUG SaslRpcClient: Get token info proto:interface
    > org.apache.hadoop.yarn.api.ApplicationClientProtocolPB
    > 
info:org.apache.hadoop.yarn.security.client.ClientRMSecurityInfo$2@35745453
    > 15/07/18 20:17:51 DEBUG RMDelegationTokenSelector: Looking for a token
    > with service 10.111.114.16:8032
    > 15/07/18 20:17:51 DEBUG RMDelegationTokenSelector: Token kind is
    > YARN_AM_RM_TOKEN and the token's service name is
    > 15/07/18 20:17:51 DEBUG RMDelegationTokenSelector: Token kind is
    > HIVE_DELEGATION_TOKEN and the token's service name is
    > 15/07/18 20:17:51 DEBUG RMDelegationTokenSelector: Token kind is
    > TIMELINE_DELEGATION_TOKEN and the token's service name is
    > 10.111.114.16:8188
    > 15/07/18 20:17:51 DEBUG RMDelegationTokenSelector: Token kind is
    > HDFS_DELEGATION_TOKEN and the token's service name is 10.111.114.16:8020
    > 15/07/18 20:17:51 DEBUG RMDelegationTokenSelector: Token kind is
    > HDFS_DELEGATION_TOKEN and the token's service name is 10.111.114.17:8020
    > 15/07/18 20:17:51 DEBUG RMDelegationTokenSelector: Token kind is
    > HDFS_DELEGATION_TOKEN and the token's service name is ha-hdfs:hdpnlcb
    > 15/07/18 20:17:51 DEBUG UserGroupInformation: PrivilegedActionException
    > as:sparkjob (auth:SIMPLE)
    > cause:org.apache.hadoop.security.AccessControlException: Client cannot
    > authenticate via:[TOKEN, KERBEROS]
    > 15/07/18 20:17:51 DEBUG UserGroupInformation: PrivilegedAction as:sparkjob
    > (auth:SIMPLE)
    > 
from:org.apache.hadoop.ipc.Client$Connection.handleSaslConnectionFailure(Client.java:643)
    > 15/07/18 20:17:51 WARN Client: Exception encountered while connecting to
    > the server : org.apache.hadoop.security.AccessControlException: Client
    > cannot authenticate via:[TOKEN, KERBEROS]
    > 15/07/18 20:17:51 DEBUG UserGroupInformation: PrivilegedActionException
    > as:sparkjob (auth:SIMPLE) cause:java.io.IOException:
    > org.apache.hadoop.security.AccessControlException: Client cannot
    > authenticate via:[TOKEN, KERBEROS]
    > 15/07/18 20:17:51 DEBUG Client: closing ipc connection to
    > lxhnl002.ad.ing.net/10.111.114.16:8032:
    > org.apache.hadoop.security.AccessControlException: Client cannot
    > authenticate via:[TOKEN, KERBEROS]
    > java.io.IOException: org.apache.hadoop.security.AccessControlException:
    > Client cannot authenticate via:[TOKEN, KERBEROS]
    > at org.apache.hadoop.ipc.Client$Connection$1.run(Client.java:680)
    > at java.security.AccessController.doPrivileged(Native Method)
    > at javax.security.auth.Subject.doAs(Subject.java:415)
    > at
    > 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
    > at
    > 
org.apache.hadoop.ipc.Client$Connection.handleSaslConnectionFailure(Client.java:643)
    > at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:730)
    > at org.apache.hadoop.ipc.Client$Connection.access$2800(Client.java:368)
    > at org.apache.hadoop.ipc.Client.getConnection(Client.java:1521)
    > at org.apache.hadoop.ipc.Client.call(Client.java:1438)
    > at org.apache.hadoop.ipc.Client.call(Client.java:1399)
    > at
    > 
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
    > at com.sun.proxy.$Proxy24.getClusterNodes(Unknown Source)
    > at
    > 
org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getClusterNodes(ApplicationClientProtocolPBClientImpl.java:262)
    > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    > at
    > 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    > at
    > 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    > at java.lang.reflect.Method.invoke(Method.java:606)
    > at
    > 
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
    > at
    > 
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
    > at com.sun.proxy.$Proxy25.getClusterNodes(Unknown Source)
    > at
    > 
org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getNodeReports(YarnClientImpl.java:475)
    > at
    > 
org.apache.spark.scheduler.cluster.YarnClusterSchedulerBackend$$anonfun$getDriverLogUrls$1.apply(YarnClusterSchedulerBackend.scala:92)
    > at
    > 
org.apache.spark.scheduler.cluster.YarnClusterSchedulerBackend$$anonfun$getDriverLogUrls$1.apply(YarnClusterSchedulerBackend.scala:73)
    > at scala.Option.foreach(Option.scala:236)
    > at
    > 
org.apache.spark.scheduler.cluster.YarnClusterSchedulerBackend.getDriverLogUrls(YarnClusterSchedulerBackend.scala:73)
    > at
    > 
org.apache.spark.SparkContext.postApplicationStart(SparkContext.scala:2001)
    > at org.apache.spark.SparkContext.<init>(SparkContext.scala:552)
    > at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:61)
    > at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    > at
    > 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
    > at
    > 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    > at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
    > at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:234)
    > at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379)
    > at py4j.Gateway.invoke(Gateway.java:214)
    > at
    > 
py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:79)
    > at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:68)
    > at py4j.GatewayConnection.run(GatewayConnection.java:207)
    > at java.lang.Thread.run(Thread.java:745)
    > Caused by: org.apache.hadoop.security.AccessControlException: Client
    > cannot authenticate via:[TOKEN, KERBEROS]
    > at
    > 
org.apache.hadoop.security.SaslRpcClient.selectSaslClient(SaslRpcClient.java:172)
    > at
    > 
org.apache.hadoop.security.SaslRpcClient.saslConnect(SaslRpcClient.java:396)
    > at
    > 
org.apache.hadoop.ipc.Client$Connection.setupSaslConnection(Client.java:553)
    > at org.apache.hadoop.ipc.Client$Connection.access$1800(Client.java:368)
    > at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:722)
    > at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:718)
    > at java.security.AccessController.doPrivileged(Native Method)
    > at javax.security.auth.Subject.doAs(Subject.java:415)
    > at
    > 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
    > at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:717)
    > ... 33 more
    > 15/07/18 20:17:51 DEBUG Client: IPC Client (1486116955) connection to
    > lxhnl002.ad.ing.net/10.111.114.16:8032 from sparkjob: closed
    > 15/07/18 20:17:51 INFO RetryInvocationHandler: Exception while invoking
    > getClusterNodes of class ApplicationClientProtocolPBClientImpl over rm1
    > after 2 fail over attempts. Trying to fail over immediately.
    > java.io.IOException: Failed on local exception: java.io.IOException:
    > org.apache.hadoop.security.AccessControlException: Client cannot
    > authenticate via:[TOKEN, KERBEROS]; Host Details : local host is: "
    > lxhnl011.ad.ing.net/10.111.114.14"; destination host is: "
    > lxhnl002.ad.ing.net":8032;
    > at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:772)
    > at org.apache.hadoop.ipc.Client.call(Client.java:1472)
    > at org.apache.hadoop.ipc.Client.call(Client.java:1399)
    > at
    > 
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
    > at com.sun.proxy.$Proxy24.getClusterNodes(Unknown Source)
    > at
    > 
org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getClusterNodes(ApplicationClientProtocolPBClientImpl.java:262)
    > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    > at
    > 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    > at
    > 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    > at java.lang.reflect.Method.invoke(Method.java:606)
    > at
    > 
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
    > at
    > 
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
    > at com.sun.proxy.$Proxy25.getClusterNodes(Unknown Source)
    > at
    > 
org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getNodeReports(YarnClientImpl.java:475)
    > at
    > 
org.apache.spark.scheduler.cluster.YarnClusterSchedulerBackend$$anonfun$getDriverLogUrls$1.apply(YarnClusterSchedulerBackend.scala:92)
    > at
    > 
org.apache.spark.scheduler.cluster.YarnClusterSchedulerBackend$$anonfun$getDriverLogUrls$1.apply(YarnClusterSchedulerBackend.scala:73)
    > at scala.Option.foreach(Option.scala:236)
    > at
    > 
org.apache.spark.scheduler.cluster.YarnClusterSchedulerBackend.getDriverLogUrls(YarnClusterSchedulerBackend.scala:73)
    > at
    > 
org.apache.spark.SparkContext.postApplicationStart(SparkContext.scala:2001)
    > at org.apache.spark.SparkContext.<init>(SparkContext.scala:552)
    > at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:61)
    > at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    > at
    > 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
    > at
    > 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    > at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
    > at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:234)
    > at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379)
    > at py4j.Gateway.invoke(Gateway.java:214)
    > at
    > 
py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:79)
    > at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:68)
    > at py4j.GatewayConnection.run(GatewayConnection.java:207)
    > at java.lang.Thread.run(Thread.java:745)
    > Caused by: java.io.IOException:
    > org.apache.hadoop.security.AccessControlException: Client cannot
    > authenticate via:[TOKEN, KERBEROS]
    > at org.apache.hadoop.ipc.Client$Connection$1.run(Client.java:680)
    > at java.security.AccessController.doPrivileged(Native Method)
    > at javax.security.auth.Subject.doAs(Subject.java:415)
    > at
    > 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
    > at
    > 
org.apache.hadoop.ipc.Client$Connection.handleSaslConnectionFailure(Client.java:643)
    > at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:730)
    > at org.apache.hadoop.ipc.Client$Connection.access$2800(Client.java:368)
    > at org.apache.hadoop.ipc.Client.getConnection(Client.java:1521)
    > at org.apache.hadoop.ipc.Client.call(Client.java:1438)
    > ... 30 more
    > Caused by: org.apache.hadoop.security.AccessControlException: Client
    > cannot authenticate via:[TOKEN, KERBEROS]
    > at
    > 
org.apache.hadoop.security.SaslRpcClient.selectSaslClient(SaslRpcClient.java:172)
    > at
    > 
org.apache.hadoop.security.SaslRpcClient.saslConnect(SaslRpcClient.java:396)
    > at
    > 
org.apache.hadoop.ipc.Client$Connection.setupSaslConnection(Client.java:553)
    > at org.apache.hadoop.ipc.Client$Connection.access$1800(Client.java:368)
    > at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:722)
    > at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:718)
    > at java.security.AccessController.doPrivileged(Native Method)
    > at javax.security.auth.Subject.doAs(Subject.java:415)
    > at
    > 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
    > at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:717)
    > ... 33 more
    > 15/07/18 20:17:51 INFO ConfiguredRMFailoverProxyProvider: Failing over to
    > rm2
    > 15/07/18 20:17:51 DEBUG Client: The ping interval is 60000 ms.
    > 15/07/18 20:17:51 DEBUG Client: Connecting to
    > lxhnl013.ad.ing.net/10.111.114.17:8032
    > 15/07/18 20:17:51 DEBUG Client: closing ipc connection to
    > lxhnl013.ad.ing.net/10.111.114.17:8032: Connection refused
    > java.net.ConnectException: Connection refused
    > at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    > at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
    > at
    > 
org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
    > at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:530)
    > at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:494)
    > at 
org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:607)
    > at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:705)
    > at org.apache.hadoop.ipc.Client$Connection.access$2800(Client.java:368)
    > at org.apache.hadoop.ipc.Client.getConnection(Client.java:1521)
    > at org.apache.hadoop.ipc.Client.call(Client.java:1438)
    > at org.apache.hadoop.ipc.Client.call(Client.java:1399)
    > at
    > 
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
    > at com.sun.proxy.$Proxy24.getClusterNodes(Unknown Source)
    > at
    > 
org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getClusterNodes(ApplicationClientProtocolPBClientImpl.java:262)
    > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    > at
    > 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    > at
    > 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    > at java.lang.reflect.Method.invoke(Method.java:606)
    > at
    > 
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
    > at
    > 
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
    > at com.sun.proxy.$Proxy25.getClusterNodes(Unknown Source)
    > at
    > 
org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getNodeReports(YarnClientImpl.java:475)
    > at
    > 
org.apache.spark.scheduler.cluster.YarnClusterSchedulerBackend$$anonfun$getDriverLogUrls$1.apply(YarnClusterSchedulerBackend.scala:92)
    > at
    > 
org.apache.spark.scheduler.cluster.YarnClusterSchedulerBackend$$anonfun$getDriverLogUrls$1.apply(YarnClusterSchedulerBackend.scala:73)
    > at scala.Option.foreach(Option.scala:236)
    > at
    > 
org.apache.spark.scheduler.cluster.YarnClusterSchedulerBackend.getDriverLogUrls(YarnClusterSchedulerBackend.scala:73)
    > at
    > 
org.apache.spark.SparkContext.postApplicationStart(SparkContext.scala:2001)
    > at org.apache.spark.SparkContext.<init>(SparkContext.scala:552)
    > at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:61)
    > at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    > at
    > 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
    > at
    > 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    > at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
    > at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:234)
    > at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379)
    > at py4j.Gateway.invoke(Gateway.java:214)
    > at
    > 
py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:79)
    > at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:68)
    > at py4j.GatewayConnection.run(GatewayConnection.java:207)
    > at java.lang.Thread.run(Thread.java:745)
    > 15/07/18 20:17:51 DEBUG Client: IPC Client (1486116955) connection to
    > lxhnl013.ad.ing.net/10.111.114.17:8032 from sparkjob: closed
    > 15/07/18 20:17:51 INFO RetryInvocationHandler: Exception while invoking
    > getClusterNodes of class ApplicationClientProtocolPBClientImpl over rm2
    > after 3 fail over attempts. Trying to fail over after sleeping for 
18437ms.
    > java.net.ConnectException: Call From lxhnl011.ad.ing.net/10.111.114.14 to
    > lxhnl013.ad.ing.net:8032 failed on connection exception:
    > java.net.ConnectException: Connection refused; For more details see:
    > http://wiki.apache.org/hadoop/ConnectionRefused
    > at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    > at
    > 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
    > at
    > 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    > at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
    > at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:791)
    > at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:731)
    > at org.apache.hadoop.ipc.Client.call(Client.java:1472)
    > at org.apache.hadoop.ipc.Client.call(Client.java:1399)
    > at
    > 
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
    > at com.sun.proxy.$Proxy24.getClusterNodes(Unknown Source)
    > at
    > 
org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getClusterNodes(ApplicationClientProtocolPBClientImpl.java:262)
    > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    > at
    > 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    > at
    > 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    > at java.lang.reflect.Method.invoke(Method.java:606)
    > at
    > 
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
    > at
    > 
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
    > at com.sun.proxy.$Proxy25.getClusterNodes(Unknown Source)
    > at
    > 
org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getNodeReports(YarnClientImpl.java:475)
    > at
    > 
org.apache.spark.scheduler.cluster.YarnClusterSchedulerBackend$$anonfun$getDriverLogUrls$1.apply(YarnClusterSchedulerBackend.scala:92)
    > at
    > 
org.apache.spark.scheduler.cluster.YarnClusterSchedulerBackend$$anonfun$getDriverLogUrls$1.apply(YarnClusterSchedulerBackend.scala:73)
    > at scala.Option.foreach(Option.scala:236)
    > at
    > 
org.apache.spark.scheduler.cluster.YarnClusterSchedulerBackend.getDriverLogUrls(YarnClusterSchedulerBackend.scala:73)
    > at
    > 
org.apache.spark.SparkContext.postApplicationStart(SparkContext.scala:2001)
    > at org.apache.spark.SparkContext.<init>(SparkContext.scala:552)
    > at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:61)
    > at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    > at
    > 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
    > at
    > 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    > at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
    > at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:234)
    > at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379)
    > at py4j.Gateway.invoke(Gateway.java:214)
    > at
    > 
py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:79)
    > at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:68)
    > at py4j.GatewayConnection.run(GatewayConnection.java:207)
    > at java.lang.Thread.run(Thread.java:745)
    > Caused by: java.net.ConnectException: Connection refused
    > at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    > at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
    > at
    > 
org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
    > at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:530)
    > at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:494)
    > at 
org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:607)
    > at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:705)
    > at org.apache.hadoop.ipc.Client$Connection.access$2800(Client.java:368)
    > at org.apache.hadoop.ipc.Client.getConnection(Client.java:1521)
    > at org.apache.hadoop.ipc.Client.call(Client.java:1438)
    > ... 30 more
    > 15/07/18 20:17:52 DEBUG
    > AkkaRpcEnv$$anonfun$actorRef$lzycompute$1$1$$anon$1: [actor] received
    > message AkkaMessage(ReviveOffers,false) from
    > Actor[akka://sparkDriver/deadLetters]
    > 15/07/18 20:17:52 DEBUG
    > AkkaRpcEnv$$anonfun$actorRef$lzycompute$1$1$$anon$1: Received RPC message:
    > AkkaMessage(ReviveOffers,false)
    > 15/07/18 20:17:52 DEBUG
    > AkkaRpcEnv$$anonfun$actorRef$lzycompute$1$1$$anon$1: [actor] handled
    > message (0.579815 ms) AkkaMessage(ReviveOffers,false) from
    > Actor[akka://sparkDriver/deadLetters]
    > 15/07/18 20:17:53 DEBUG
    > AkkaRpcEnv$$anonfun$actorRef$lzycompute$1$1$$anon$1: [actor] received
    > message AkkaMessage(ReviveOffers,false) from
    > Actor[akka://sparkDriver/deadLetters]
    > 15/07/18 20:17:53 DEBUG
    > AkkaRpcEnv$$anonfun$actorRef$lzycompute$1$1$$anon$1: Received RPC message:
    > AkkaMessage(ReviveOffers,false)
    > 15/07/18 20:17:53 DEBUG
    > AkkaRpcEnv$$anonfun$actorRef$lzycompute$1$1$$anon$1: [actor] handled
    > message (19.24127 ms) AkkaMessage(ReviveOffers,false) from
    > Actor[akka://sparkDriver/deadLetters]
    > 15/07/18 20:17:54 DEBUG
    > AkkaRpcEnv$$anonfun$actorRef$lzycompute$1$1$$anon$1: [actor] received
    > message AkkaMessage(ReviveOffers,false) from
    > Actor[akka://sparkDriver/deadLetters]
    > 15/07/18 20:17:54 DEBUG
    > AkkaRpcEnv$$anonfun$actorRef$lzycompute$1$1$$anon$1: Received RPC message:
    > AkkaMessage(ReviveOffers,false)
    > 15/07/18 20:17:54 DEBUG
    > AkkaRpcEnv$$anonfun$actorRef$lzycompute$1$1$$anon$1: [actor] handled
    > message (0.511616 ms) AkkaMessage(ReviveOffers,false) from
    > Actor[akka://sparkDriver/deadLetters]
    > 15/07/18 20:17:55 DEBUG
    > AkkaRpcEnv$$anonfun$actorRef$lzycompute$1$1$$anon$1: [actor] received
    > message AkkaMessage(ReviveOffers,false) from
    > Actor[akka://sparkDriver/deadLetters]
    > 15/07/18 20:17:55 DEBUG
    > AkkaRpcEnv$$anonfun$actorRef$lzycompute$1$1$$anon$1: Received RPC message:
    > AkkaMessage(ReviveOffers,false)
    > 15/07/18 20:17:55 DEBUG
    > AkkaRpcEnv$$anonfun$actorRef$lzycompute$1$1$$anon$1: [actor] handled
    > message (0.573317 ms) AkkaMessage(ReviveOffers,false) from
    > Actor[akka://sparkDriver/deadLetters]
    > 15/07/18 20:17:56 DEBUG
    > AkkaRpcEnv$$anonfun$actorRef$lzycompute$1$1$$anon$1: [actor] received
    > message AkkaMessage(ReviveOffers,false) from
    > Actor[akka://sparkDriver/deadLetters]
    > 15/07/18 20:17:56 DEBUG
    > AkkaRpcEnv$$anonfun$actorRef$lzycompute$1$1$$anon$1: Received RPC message:
    > AkkaMessage(ReviveOffers,false)
    > 15/07/18 20:17:56 DEBUG
    > AkkaRpcEnv$$anonfun$actorRef$lzycompute$1$1$$anon$1: [actor] handled
    > message (0.489633 ms) AkkaMessage(ReviveOffers,false) from
    > Actor[akka://sparkDriver/deadLetters]
    > 15/07/18 20:17:57 DEBUG
    > AkkaRpcEnv$$anonfun$actorRef$lzycompute$1$1$$anon$1: [actor] received
    > message AkkaMessage(ReviveOffers,false) from
    > Actor[akka://sparkDriver/deadLetters]
    > 15/07/18 20:17:57 DEBUG
    > AkkaRpcEnv$$anonfun$actorRef$lzycompute$1$1$$anon$1: Received RPC message:
    > AkkaMessage(ReviveOffers,false)
    > 15/07/18 20:17:57 DEBUG
    > AkkaRpcEnv$$anonfun$actorRef$lzycompute$1$1$$anon$1: [actor] handled
    > message (0.492177 ms) AkkaMessage(ReviveOffers,false) from
    > Actor[akka://sparkDriver/deadLetters]
    > 15/07/18 20:17:58 DEBUG
    > AkkaRpcEnv$$anonfun$actorRef$lzycompute$1$1$$anon$1: [actor] received
    > message AkkaMessage(ReviveOffers,false) from
    > Actor[akka://sparkDriver/deadLetters]
    > 15/07/18 20:17:58 DEBUG
    > AkkaRpcEnv$$anonfun$actorRef$lzycompute$1$1$$anon$1: Received RPC message:
    > AkkaMessage(ReviveOffers,false)
    > 15/07/18 20:17:58 DEBUG
    > AkkaRpcEnv$$anonfun$actorRef$lzycompute$1$1$$anon$1: [actor] handled
    > message (19.204572 ms) AkkaMessage(ReviveOffers,false) from
    > Actor[akka://sparkDriver/deadLetters]
    > 15/07/18 20:17:59 DEBUG
    > AkkaRpcEnv$$anonfun$actorRef$lzycompute$1$1$$anon$1: [actor] received
    > message AkkaMessage(ReviveOffers,false) from
    > Actor[akka://sparkDriver/deadLetters]
    > 15/07/18 20:17:59 DEBUG
    > AkkaRpcEnv$$anonfun$actorRef$lzycompute$1$1$$anon$1: Received RPC message:
    > AkkaMessage(ReviveOffers,false)
    > 15/07/18 20:17:59 DEBUG
    > AkkaRpcEnv$$anonfun$actorRef$lzycompute$1$1$$anon$1: [actor] handled
    > message (0.544485 ms) AkkaMessage(ReviveOffers,false) from
    > Actor[akka://sparkDriver/deadLetters]
    > 15/07/18 20:18:00 DEBUG
    > AkkaRpcEnv$$anonfun$actorRef$lzycompute$1$1$$anon$1: [actor] received
    > message AkkaMessage(ReviveOffers,false) from
    > Actor[akka://sparkDriver/deadLetters]
    > 15/07/18 20:18:00 DEBUG
    > AkkaRpcEnv$$anonfun$actorRef$lzycompute$1$1$$anon$1: Received RPC message:
    > AkkaMessage(ReviveOffers,false)
    > 15/07/18 20:18:00 DEBUG
    > AkkaRpcEnv$$anonfun$actorRef$lzycompute$1$1$$anon$1: [actor] handled
    > message (0.541704 ms) AkkaMessage(ReviveOffers,false) from
    > Actor[akka://sparkDriver/deadLetters]
    > 15/07/18 20:18:00 INFO ApplicationMaster: Waiting for spark context
    > initialization ...
    > 15/07/18 20:18:01 DEBUG
    > AkkaRpcEnv$$anonfun$actorRef$lzycompute$1$1$$anon$1: [actor] received
    > message AkkaMessage(ReviveOffers,false) from
    > Actor[akka://sparkDriver/deadLetters]
    > 15/07/18 20:18:01 DEBUG
    > AkkaRpcEnv$$anonfun$actorRef$lzycompute$1$1$$anon$1: Received RPC message:
    > AkkaMessage(ReviveOffers,false)
    > 15/07/18 20:18:01 DEBUG
    > AkkaRpcEnv$$anonfun$actorRef$lzycompute$1$1$$anon$1: [actor] handled
    > message (0.57021 ms) AkkaMessage(ReviveOffers,false) from
    > Actor[akka://sparkDriver/deadLetters]
    > 15/07/18 20:18:02 DEBUG
    > AkkaRpcEnv$$anonfun$actorRef$lzycompute$1$1$$anon$1: [actor] received
    > message AkkaMessage(ReviveOffers,false) from
    > Actor[akka://sparkDriver/deadLetters]
    > 15/07/18 20:18:02 DEBUG
    > AkkaRpcEnv$$anonfun$actorRef$lzycompute$1$1$$anon$1: Received RPC message:
    > AkkaMessage(ReviveOffers,false)
    > 15/07/18 20:18:02 DEBUG
    > AkkaRpcEnv$$anonfun$actorRef$lzycompute$1$1$$anon$1: [actor] handled
    > message (0.49115 ms) AkkaMessage(ReviveOffers,false) from
    > Actor[akka://sparkDriver/deadLetters]
    > 15/07/18 20:18:03 DEBUG
    > AkkaRpcEnv$$anonfun$actorRef$lzycompute$1$1$$anon$1: [actor] received
    > message AkkaMessage(ReviveOffers,false) from
    > Actor[akka://sparkDriver/deadLetters]
    > 15/07/18 20:18:03 DEBUG
    > AkkaRpcEnv$$anonfun$actorRef$lzycompute$1$1$$anon$1: Received RPC message:
    > AkkaMessage(ReviveOffers,false)
    > 15/07/18 20:18:03 DEBUG
    > AkkaRpcEnv$$anonfun$actorRef$lzycompute$1$1$$anon$1: [actor] handled
    > message (37.549735 ms) AkkaMessage(ReviveOffers,false) from
    > Actor[akka://sparkDriver/deadLetters]
    > 15/07/18 20:18:04 DEBUG
    > AkkaRpcEnv$$anonfun$actorRef$lzycompute$1$1$$anon$1: [actor] received
    > message AkkaMessage(ReviveOffers,false) from
    > Actor[akka://sparkDriver/deadLetters]
    > 15/07/18 20:18:04 DEBUG
    > AkkaRpcEnv$$anonfun$actorRef$lzycompute$1$1$$anon$1: Received RPC message:
    > AkkaMessage(ReviveOffers,false)
    > 15/07/18 20:18:04 DEBUG
    > AkkaRpcEnv$$anonfun$actorRef$lzycompute$1$1$$anon$1: [actor] handled
    > message (0.57814 ms) AkkaMessage(ReviveOffers,false) from
    > Actor[akka://sparkDriver/deadLetters]
    > 15/07/18 20:18:05 DEBUG
    > AkkaRpcEnv$$anonfun$actorRef$lzycompute$1$1$$anon$1: [actor] received
    > message AkkaMessage(ReviveOffers,false) from
    > Actor[akka://sparkDriver/deadLetters]
    > 15/07/18 20:18:05 DEBUG
    > AkkaRpcEnv$$anonfun$actorRef$lzycompute$1$1$$anon$1: Received RPC message:
    > AkkaMessage(ReviveOffers,false)
    >
    > '''
    >
    > —
    > Reply to this email directly or view it on GitHub
    > <https://github.com/apache/spark/pull/7489#issuecomment-122576810>.
    >
    
    
    -- 
    
    Thanks,
    Hari
