Hello friends:

I use the Cloudera CDH5 version of Spark (the v1.0.0 Spark RPMs), but the
following is also true when using the Apache Spark distribution built against
a local Hadoop/YARN installation.

The problem:

If the directory /etc/hadoop/conf/ exists, and the pertinent *.xml files within
it configure HDFS to use a host, say 'namenode', as the HDFS NameNode, then no
matter how I invoke pyspark *locally* on the command line, it always tries to
connect to 'namenode', which I don't always want because I don't always have
HDFS running.
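For reference, the setting that drives this is the default-filesystem entry in
/etc/hadoop/conf/core-site.xml (fs.defaultFS, or fs.default.name on older
Hadoop versions). On my machine it points at the 'namenode' host; the port
shown below is just the usual default, so take this snippet as illustrative
rather than exact:

user$ grep -A 1 -E 'fs\.defaultFS|fs\.default\.name' /etc/hadoop/conf/core-site.xml
    <name>fs.defaultFS</name>
    <value>hdfs://namenode:8020</value>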

In other words, the following always raises an exception when it cannot
connect to HDFS:

user$ export MASTER=local[NN]; pyspark --master local[NN]

The only workaround I've found is the following, which is not good at all:

user$ (cd /etc/hadoop; sudo mv conf _conf); export MASTER=local[NN];
pyspark --master local[NN]

Without temporarily moving the Hadoop/YARN configuration directory, how do I
dynamically instruct pyspark on the CLI not to use HDFS? (i.e. without
hard-coded settings anywhere, such as in /etc/spark/spark-env.sh)
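
To make the question concrete: what I'm hoping for is something that works
entirely on the command line at invocation time, along the lines of the sketch
below, i.e. pointing pyspark at an empty Hadoop configuration directory for
just that one run. This particular invocation is untested, and I don't know
whether /etc/spark/spark-env.sh would override the environment variable anyway:

user$ HADOOP_CONF_DIR=$(mktemp -d) pyspark --master local[NN]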

Thank you in advance!
didata staff


