Hi Akhil, the namenode is definitely configured correctly; otherwise the job would not start at all. It registers with YARN and starts up, but once the nodes try to communicate with each other it fails. Note that a Hadoop MR job using the identical configuration executes without any problems. The driver also connects just fine. Here is the log:
15/07/24 08:10:58 INFO yarn.ApplicationMaster: Registered signal handlers for [TERM, HUP, INT]
15/07/24 08:10:59 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/07/24 08:10:59 INFO yarn.ApplicationMaster: ApplicationAttemptId: appattempt_1437724871597_0001_000001
15/07/24 08:11:00 INFO spark.SecurityManager: Changing view acls to: root,rok
15/07/24 08:11:00 INFO spark.SecurityManager: Changing modify acls to: root,rok
15/07/24 08:11:00 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root, rok); users with modify permissions: Set(root, rok)
15/07/24 08:11:00 INFO slf4j.Slf4jLogger: Slf4jLogger started
15/07/24 08:11:01 INFO Remoting: Starting remoting
15/07/24 08:11:01 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://[email protected]:51896]
15/07/24 08:11:01 INFO util.Utils: Successfully started service 'sparkYarnAM' on port 51896.
15/07/24 08:11:01 INFO yarn.ApplicationMaster: Waiting for Spark driver to be reachable.
15/07/24 08:11:01 INFO yarn.ApplicationMaster: Driver now available: <driver IP>:58734
15/07/24 08:11:01 INFO yarn.ApplicationMaster$AMEndpoint: Add WebUI Filter. AddWebUIFilter(org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter,Map(PROXY_HOSTS -> node1, PROXY_URI_BASES -> http://node1:8089/proxy/application_1437724871597_0001),/proxy/application_1437724871597_0001)
15/07/24 08:11:01 INFO client.RMProxy: Connecting to ResourceManager at node1/10.211.55.101:8030
15/07/24 08:11:01 INFO yarn.YarnRMClient: Registering the ApplicationMaster
15/07/24 08:11:02 INFO ipc.Client: Retrying connect to server: node1/10.211.55.101:8030. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
15/07/24 08:11:03 INFO ipc.Client: Retrying connect to server: node1/10.211.55.101:8030. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
15/07/24 08:11:04 INFO ipc.Client: Retrying connect to server: node1/10.211.55.101:8030. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
15/07/24 08:11:05 INFO ipc.Client: Retrying connect to server: node1/10.211.55.101:8030. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
15/07/24 08:11:06 INFO ipc.Client: Retrying connect to server: node1/10.211.55.101:8030. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
15/07/24 08:11:07 INFO ipc.Client: Retrying connect to server: node1/10.211.55.101:8030. Already tried 5 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
15/07/24 08:11:08 INFO ipc.Client: Retrying connect to server: node1/10.211.55.101:8030. Already tried 6 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
15/07/24 08:11:09 INFO ipc.Client: Retrying connect to server: node1/10.211.55.101:8030. Already tried 7 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
15/07/24 08:11:10 INFO ipc.Client: Retrying connect to server: node1/10.211.55.101:8030. Already tried 8 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
15/07/24 08:11:11 INFO ipc.Client: Retrying connect to server: node1/10.211.55.101:8030. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
15/07/24 08:11:11 ERROR yarn.ApplicationMaster: Uncaught exception: java.io.IOException: Failed on local exception: java.net.SocketException: Connection refused; Host Details : local host is: "node4/10.211.55.104"; destination host is: "node1":8030;

On Thu, Jul 23, 2015 at 7:00 PM, Akhil Das <[email protected]> wrote:
> It looks like it's picking up the wrong namenode URI from the
> HADOOP_CONF_DIR; make sure it is correct. Also, for submitting a Spark job to
> a remote cluster, you might want to look at spark.driver.host and
> spark.driver.port.
>
> Thanks
> Best Regards
>
> On Wed, Jul 22, 2015 at 8:56 PM, rok <[email protected]> wrote:
>
>> I am trying to run Spark applications with the driver running locally and
>> interacting with a firewalled remote cluster via a SOCKS proxy.
>>
>> I have to modify the Hadoop configuration on the *local machine* to try to
>> make this work, adding
>>
>> <property>
>>   <name>hadoop.rpc.socket.factory.class.default</name>
>>   <value>org.apache.hadoop.net.SocksSocketFactory</value>
>> </property>
>> <property>
>>   <name>hadoop.socks.server</name>
>>   <value>localhost:9998</value>
>> </property>
>>
>> and on the *remote cluster* side
>>
>> <property>
>>   <name>hadoop.rpc.socket.factory.class.default</name>
>>   <value>org.apache.hadoop.net.StandardSocketFactory</value>
>>   <final>true</final>
>> </property>
>>
>> With this setup, and running "ssh -D 9998 gateway.host" to start the proxy
>> connection, MapReduce jobs started on the local machine execute fine on the
>> remote cluster.
>> However, trying to launch a Spark job fails with the nodes
>> of the cluster apparently unable to communicate with one another:
>>
>> java.io.IOException: Failed on local exception: java.net.SocketException:
>> Connection refused; Host Details : local host is: "node3/10.211.55.103";
>> destination host is: "node1":8030;
>>
>> Looking at the packets being sent to node1 from node3, it's clear that no
>> requests are made on port 8030, hinting that the connection is somehow
>> being proxied.
>>
>> Is it possible that the Spark job is not honoring the socket.factory
>> settings on the *cluster* side for some reason?
>>
>> Note that Spark JIRA 5004
>> <https://issues.apache.org/jira/browse/SPARK-5004> addresses a similar
>> problem, though it looks like they are actually not the same (since in
>> that case it sounds like a standalone cluster is being used).
>>
>> --
>> View this message in context:
>> http://apache-spark-user-list.1001560.n3.nabble.com/problems-running-Spark-on-a-firewalled-remote-YARN-cluster-via-SOCKS-proxy-tp23955.html
>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
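One way to separate a network problem from a socket-factory problem is to test, from a cluster node such as node4, whether the scheduler port is reachable over a plain, non-proxied socket. Below is a minimal Python sketch of such a probe; the function name `can_connect` is hypothetical, and the hosts and ports are only the ones that appear in this thread. If `node1:8030` turns out to be reachable directly from node4 while the ApplicationMaster still reports "Connection refused", that would be consistent with (though not proof of) the AM using the client-side SocksSocketFactory and dialing the proxy at localhost:9998, where nothing listens on a cluster node.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a direct TCP connection to host:port succeeds.

    socket.create_connection opens a plain socket with no proxy involved,
    roughly what Hadoop's StandardSocketFactory is expected to give you.
    (can_connect is a hypothetical helper, not part of Hadoop or Spark.)
    """
    try:
        # The connection is closed again immediately; we only care whether
        # the TCP handshake completes within the timeout.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timeouts and name-resolution failures
        # are all raised as subclasses of OSError.
        return False
```

Run on node4, `can_connect("node1", 8030)` and `can_connect("localhost", 9998)` give the two data points of interest: whether the ResourceManager scheduler port is directly reachable, and whether anything is listening where the client-side `hadoop.socks.server` setting points.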
