Generally the YARN cluster handles propagating and setting HADOOP_CONF_DIR for any containers it launches, so it really just needs to be set on the client node submitting the applications.
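For example, a quick (untested) sanity check on the client node, to see whether the config the client will pick up actually points at your cluster. The ConfCheck name and the file list here are just illustrative:

    import java.io.File
    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.Path

    // Hypothetical helper: load the files from HADOOP_CONF_DIR the way a
    // client-side process would, then print the resource manager address.
    object ConfCheck {
      def main(args: Array[String]): Unit = {
        val confDir = sys.env.getOrElse("HADOOP_CONF_DIR",
          sys.error("HADOOP_CONF_DIR is not set on this client node"))
        val conf = new Configuration()
        Seq("core-site.xml", "yarn-site.xml").foreach { name =>
          val file = new File(confDir, name)
          if (file.exists()) conf.addResource(new Path(file.getAbsolutePath))
        }
        println(conf.get("yarn.resourcemanager.address"))
      }
    }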
I haven't specifically tried doing what you describe, but as you say, Spark doesn't really expose the configuration object being used. It does have an interface to pass one in: Client(clientArgs: ClientArguments, hadoopConf: Configuration, spConf: SparkConf). But I don't know whether that has been tested to make sure the configuration propagates everywhere. There are also places where it calls SparkHadoopUtil.get.newConfiguration(), so I'm not sure those would handle it properly. You can always file a JIRA to add support for it and see what people think. (A rough sketch of that constructor usage appears at the end of this thread.)

Tom

On Thursday, April 3, 2014 8:46 AM, Ron Gonzalez <zlgonza...@yahoo.com> wrote:

Right, thanks, that worked. My goal is to programmatically submit things to the YARN cluster. The underlying framework we have is a set of property files that specify different machines for dev, QE, and prod. While it's definitely possible to deploy different files to the client's etc/hadoop directory, I was just curious whether the only way is to set things up as environment variables, or whether there is a way to programmatically override particular configurations. I looked at the Client.scala code, and it seems to create a new Configuration object that isn't accessible from the outside, so most likely the answer is no, which is a reasonable answer. I just have to figure out a different deployment model for the different stages of the lifecycle.

Thanks,
Ron

On Thursday, April 3, 2014 6:29 AM, Tom Graves <tgraves...@yahoo.com> wrote:

You should just be making sure your HADOOP_CONF_DIR environment variable is correct, rather than setting yarn.resourcemanager.address in SparkConf. For YARN/Hadoop you need to point it at the configuration files for your cluster; generally that setting goes into yarn-site.xml. If just setting it doesn't work, make sure $HADOOP_CONF_DIR is getting put into your classpath. I would also make sure HADOOP_PREFIX is being set.

Tom

On Wednesday, April 2, 2014 10:10 PM, Ron Gonzalez <zlgonza...@yahoo.com> wrote:

Hi,
  I have a small program, but I cannot seem to make it connect to the right properties of the cluster. I have SPARK_YARN_APP_JAR, SPARK_JAR, and SPARK_HOME set properly. If I run the Scala file below, I am seeing that it never uses the yarn.resourcemanager.address property that I set on the SparkConf instance. Any advice?

Thanks,
Ron

    import org.apache.spark.SparkContext
    import org.apache.spark.SparkContext._
    import org.apache.spark.SparkConf

    object SimpleApp {
      def main(args: Array[String]) {
        val logFile = "/home/rgonzalez/app/spark-0.9.0-incubating-bin-hadoop2/README.md"
        val conf = new SparkConf()
        conf.set("yarn.resourcemanager.address", "localhost:8050")
        val sc = new SparkContext("yarn-client", "Simple App", conf)
        val logData = sc.textFile(logFile, 2).cache()
        val numAs = logData.filter(line => line.contains("a")).count()
        val numBs = logData.filter(line => line.contains("b")).count()
        println("Lines with a: %s, Lines with b: %s".format(numAs, numBs))
      }
    }
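For reference, the programmatic route via the Client constructor quoted at the top of the thread would look roughly like the sketch below. This is untested, per the caveat earlier in the thread: the ClientArguments flags and the run() call are assumed from the Spark 0.9 YARN client, and the jar path and class name are placeholders.

    import org.apache.hadoop.conf.Configuration
    import org.apache.spark.SparkConf
    import org.apache.spark.deploy.yarn.{Client, ClientArguments}

    object ProgrammaticSubmit {
      def main(args: Array[String]): Unit = {
        // Build a Hadoop Configuration for the target environment
        // (dev/qe/prod) instead of relying on the client's etc/hadoop.
        val hadoopConf = new Configuration()
        hadoopConf.set("yarn.resourcemanager.address", "localhost:8050")

        val sparkConf = new SparkConf().setAppName("Simple App")

        // Same flags you would pass on the command line; values here
        // are placeholders.
        val clientArgs = new ClientArguments(Array(
          "--jar", "/path/to/app.jar",
          "--class", "SimpleApp"))

        // The constructor quoted above; whether the passed-in conf
        // propagates everywhere (e.g. past
        // SparkHadoopUtil.get.newConfiguration()) is untested, as noted.
        new Client(clientArgs, hadoopConf, sparkConf).run()
      }
    }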