Re: problems running Spark on a firewalled remote YARN cluster via SOCKS proxy

Rok Roskar Fri, 24 Jul 2015 02:23:13 -0700

Hi Akhil,

The namenode is definitely configured correctly; otherwise the job would
not start at all. The application registers with YARN and starts up, but
fails once the nodes try to communicate with each other. Note that a Hadoop
MR job using the identical configuration executes without any problems. The
driver also connects just fine. Here is the log:


15/07/24 08:10:58 INFO yarn.ApplicationMaster: Registered signal
handlers for [TERM, HUP, INT]
15/07/24 08:10:59 WARN util.NativeCodeLoader: Unable to load
native-hadoop library for your platform... using builtin-java classes
where applicable
15/07/24 08:10:59 INFO yarn.ApplicationMaster: ApplicationAttemptId:
appattempt_1437724871597_0001_000001
15/07/24 08:11:00 INFO spark.SecurityManager: Changing view acls to: root,rok
15/07/24 08:11:00 INFO spark.SecurityManager: Changing modify acls to: root,rok
15/07/24 08:11:00 INFO spark.SecurityManager: SecurityManager:
authentication disabled; ui acls disabled; users with view
permissions: Set(root, rok); users with modify permissions: Set(root,
rok)
15/07/24 08:11:00 INFO slf4j.Slf4jLogger: Slf4jLogger started
15/07/24 08:11:01 INFO Remoting: Starting remoting
15/07/24 08:11:01 INFO Remoting: Remoting started; listening on
addresses :[akka.tcp://[email protected]:51896]
15/07/24 08:11:01 INFO util.Utils: Successfully started service
'sparkYarnAM' on port 51896.
15/07/24 08:11:01 INFO yarn.ApplicationMaster: Waiting for Spark
driver to be reachable.
15/07/24 08:11:01 INFO yarn.ApplicationMaster: Driver now available:
<driver IP>:58734
15/07/24 08:11:01 INFO yarn.ApplicationMaster$AMEndpoint: Add WebUI
Filter. 
AddWebUIFilter(org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter,Map(PROXY_HOSTS
-> node1, PROXY_URI_BASES ->
http://node1:8089/proxy/application_1437724871597_0001),/proxy/application_1437724871597_0001)
15/07/24 08:11:01 INFO client.RMProxy: Connecting to ResourceManager
at node1/10.211.55.101:8030
15/07/24 08:11:01 INFO yarn.YarnRMClient: Registering the ApplicationMaster
15/07/24 08:11:02 INFO ipc.Client: Retrying connect to server:
node1/10.211.55.101:8030. Already tried 0 time(s); retry policy is
RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000
MILLISECONDS)
15/07/24 08:11:03 INFO ipc.Client: Retrying connect to server:
node1/10.211.55.101:8030. Already tried 1 time(s); retry policy is
RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000
MILLISECONDS)
15/07/24 08:11:04 INFO ipc.Client: Retrying connect to server:
node1/10.211.55.101:8030. Already tried 2 time(s); retry policy is
RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000
MILLISECONDS)
15/07/24 08:11:05 INFO ipc.Client: Retrying connect to server:
node1/10.211.55.101:8030. Already tried 3 time(s); retry policy is
RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000
MILLISECONDS)
15/07/24 08:11:06 INFO ipc.Client: Retrying connect to server:
node1/10.211.55.101:8030. Already tried 4 time(s); retry policy is
RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000
MILLISECONDS)
15/07/24 08:11:07 INFO ipc.Client: Retrying connect to server:
node1/10.211.55.101:8030. Already tried 5 time(s); retry policy is
RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000
MILLISECONDS)
15/07/24 08:11:08 INFO ipc.Client: Retrying connect to server:
node1/10.211.55.101:8030. Already tried 6 time(s); retry policy is
RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000
MILLISECONDS)
15/07/24 08:11:09 INFO ipc.Client: Retrying connect to server:
node1/10.211.55.101:8030. Already tried 7 time(s); retry policy is
RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000
MILLISECONDS)
15/07/24 08:11:10 INFO ipc.Client: Retrying connect to server:
node1/10.211.55.101:8030. Already tried 8 time(s); retry policy is
RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000
MILLISECONDS)
15/07/24 08:11:11 INFO ipc.Client: Retrying connect to server:
node1/10.211.55.101:8030. Already tried 9 time(s); retry policy is
RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000
MILLISECONDS)
15/07/24 08:11:11 ERROR yarn.ApplicationMaster: Uncaught exception:
java.io.IOException: Failed on local exception:
java.net.SocketException: Connection refused; Host Details : local
host is: "node4/10.211.55.104"; destination host is: "node1":8030;
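
To check from a cluster node whether a plain, non-proxied TCP connection to
the ResourceManager address in the log (node1:8030) works at all, something
like the following could be run on node4. This is only a minimal sketch: the
helper name can_connect and the 3-second timeout are illustrative assumptions,
not part of the setup described in this thread.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a direct TCP connection to host:port succeeds.

    This uses a plain socket, so no SOCKS proxy or Hadoop socket
    factory is involved.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name resolution errors.
        return False


if __name__ == "__main__":
    # node1:8030 is the address the ApplicationMaster retries in the log.
    print(can_connect("node1", 8030))
```

If this reports True while the ApplicationMaster still gets "Connection
refused", the failure is more likely down to how the Spark process creates its
sockets than to the network between the nodes.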




On Thu, Jul 23, 2015 at 7:00 PM, Akhil Das <[email protected]>
wrote:

> It looks like it's picking up the wrong namenode URI from the
> HADOOP_CONF_DIR; make sure it is correct. Also, for submitting a Spark job
> to a remote cluster, you might want to look at spark.driver.host and
> spark.driver.port
>
> Thanks
> Best Regards
>
> On Wed, Jul 22, 2015 at 8:56 PM, rok <[email protected]> wrote:
>
>> I am trying to run Spark applications with the driver running locally and
>> interacting with a firewalled remote cluster via a SOCKS proxy.
>>
>> I have to modify the hadoop configuration on the *local machine* to try to
>> make this work, adding
>>
>> <property>
>>    <name>hadoop.rpc.socket.factory.class.default</name>
>>    <value>org.apache.hadoop.net.SocksSocketFactory</value>
>> </property>
>> <property>
>>    <name>hadoop.socks.server</name>
>>    <value>localhost:9998</value>
>> </property>
>>
>> and on the *remote cluster* side
>>
>> <property>
>>     <name>hadoop.rpc.socket.factory.class.default</name>
>>     <value>org.apache.hadoop.net.StandardSocketFactory</value>
>>     <final>true</final>
>> </property>
>>
>> With this setup, and running "ssh -D 9998 gateway.host" to start the proxy
>> connection, MapReduce jobs started on the local machine execute fine on
>> the remote cluster. However, trying to launch a Spark job fails with the
>> nodes of the cluster apparently unable to communicate with one another:
>>
>> java.io.IOException: Failed on local exception: java.net.SocketException:
>> Connection refused; Host Details : local host is: "node3/10.211.55.103";
>> destination host is: "node1":8030;
>>
>> Looking at the packets being sent to node1 from node3, it's clear that no
>> requests are made on port 8030, hinting that the connection is somehow
>> being proxied.
>>
>> Is it possible that the Spark job is not honoring the socket.factory
>> settings on the *cluster* side for some reason?
>>
>> Note that Spark JIRA 5004
>> <https://issues.apache.org/jira/browse/SPARK-5004> addresses a similar
>> problem, though it looks like they are actually not the same (since in
>> that case it sounds like a standalone cluster is being used).
>>
>
