I am trying to run Spark applications with the driver running locally and
interacting with a firewalled remote cluster via a SOCKS proxy. 

To try to make this work, I have modified the Hadoop configuration on the
*local machine*, adding

<property>
   <name>hadoop.rpc.socket.factory.class.default</name>
   <value>org.apache.hadoop.net.SocksSocketFactory</value>
</property>
<property>
   <name>hadoop.socks.server</name>
   <value>localhost:9998</value>
</property>

and on the *remote cluster* side

<property>
    <name>hadoop.rpc.socket.factory.class.default</name>
    <value>org.apache.hadoop.net.StandardSocketFactory</value>
    <final>true</final>
</property>
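
As an aside, the local-side settings can also be handed to the driver
without touching core-site.xml, by passing them through spark-submit.
This is just a sketch (the spark.hadoop.* prefix copies properties into
the Hadoop Configuration that Spark builds, and the jar/arguments are
placeholders); the values simply mirror the XML above:

spark-submit --master yarn-client \
  --conf spark.hadoop.hadoop.rpc.socket.factory.class.default=org.apache.hadoop.net.SocksSocketFactory \
  --conf spark.hadoop.hadoop.socks.server=localhost:9998 \
  <app jar> <args>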

With this setup, and running "ssh -D 9998 gateway.host" to start the proxy
connection, MapReduce jobs started on the local machine execute fine on the
remote cluster. However, trying to launch a Spark job fails, with the nodes
of the cluster apparently unable to communicate with one another:

java.io.IOException: Failed on local exception: java.net.SocketException:
Connection refused; Host Details : local host is: "node3/10.211.55.103";
destination host is: "node1":8030;
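
If I am reading the YARN defaults right, port 8030 is
yarn.resourcemanager.scheduler.address, so this looks like the Spark
ApplicationMaster on node3 failing to reach the ResourceManager on node1.
For reference, the property (shown with its default port, and assuming
node1 is indeed the RM host) is:

<property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>node1:8030</value>
</property>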

Capturing the packets sent from node3 to node1 shows that no requests are
ever made on port 8030, hinting that the connection is somehow being routed
through the SOCKS proxy instead of being made directly.
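
(Nothing sophisticated is needed to see this; something along the lines of
the following on node3 will do, with "-i any" standing in for whatever
interface the node actually uses:)

tcpdump -nn -i any host node1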

Is it possible that the Spark job is not honoring the socket.factory
settings on the *cluster* side for some reason? 
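
One sanity check that comes to mind (assuming the Hadoop client tools are
on the PATH of the cluster nodes) is to ask node3 directly what it resolves
for the property:

hdfs getconf -confKey hadoop.rpc.socket.factory.class.default

If that prints org.apache.hadoop.net.StandardSocketFactory, then the
cluster-side core-site.xml is being honored by the node itself, and the
SOCKS factory would presumably be sneaking in through the configuration
that Spark ships along with the job.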

Note that SPARK-5004
(https://issues.apache.org/jira/browse/SPARK-5004) addresses a similar
problem, though it looks like the two are not actually the same, since in
that case it sounds like a standalone cluster is being used.
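
If more detail would help, the full logs of the failing ApplicationMaster
container can be pulled with the following (the application id being
whatever YARN assigned to the attempt), and I am happy to post them:

yarn logs -applicationId <appId>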


