Hello friends: I use the Cloudera/CDH5 version of Spark (v1.0.0 Spark RPMs), but the following is also true when using the Apache Spark distribution built against a local Hadoop/YARN installation.
The problem: If the directory */etc/hadoop/conf/* exists, and the pertinent '*.xml' files within it configure HDFS to use a host, say *namenode*, as the HDFS NameNode, then no matter how I invoke pyspark *locally* on the command line, it always tries to connect to *namenode*. I don't always want that, because I don't always have HDFS running. In other words, the following always hits an exception when it cannot connect to HDFS:

   user$ export MASTER=local[NN]; pyspark --master local[NN]

The only workaround I've found is the following, which is not good at all:

   user$ (cd /etc/hadoop; sudo mv conf _conf); export MASTER=local[NN]; pyspark --master local[NN]

Without temporarily moving the Hadoop/YARN configuration directory, how do I dynamically instruct pyspark on the CLI not to use HDFS (i.e. without hard-coding anything, such as in */etc/spark/spark-env.sh*)?

Thank you in advance!
didata staff
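
P.S. To illustrate what I mean by "dynamically", here are two untested sketches of the kind of thing I am after. I have not verified either one; the */tmp/empty-hadoop-conf* path is just an arbitrary empty directory, and CDH5's *spark-env.sh* may re-export HADOOP_CONF_DIR and defeat the first idea entirely:

   user$ # Sketch 1 (untested): point HADOOP_CONF_DIR at an empty directory for this
   user$ # one invocation, so pyspark never sees /etc/hadoop/conf and its core-site.xml.
   user$ mkdir -p /tmp/empty-hadoop-conf
   user$ HADOOP_CONF_DIR=/tmp/empty-hadoop-conf MASTER=local[NN] pyspark --master local[NN]

   >>> # Sketch 2 (untested): from inside the pyspark shell, point the default
   >>> # filesystem at the local filesystem (sc._jsc is pyspark's handle on the
   >>> # underlying JavaSparkContext).
   >>> sc._jsc.hadoopConfiguration().set("fs.defaultFS", "file:///")
   >>> # Fully-qualified file:// URIs should bypass the HDFS default either way
   >>> # (the path below is just a placeholder):
   >>> sc.textFile("file:///path/to/some/local/data").count()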