Generally the YARN cluster handles propagating and setting HADOOP_CONF_DIR for 
any containers it launches, so it really only needs to be set on the client 
node submitting the applications.

I haven't specifically tried what you describe, but as you say, Spark doesn't 
really expose the configuration object it uses. It does have an interface that 
accepts one: Client(clientArgs: ClientArguments, hadoopConf: Configuration, 
spConf: SparkConf). But I don't know whether that has been tested to make sure 
the configuration propagates everywhere. There are also places that call 
SparkHadoopUtil.get.newConfiguration(), so I'm not sure those would handle it 
properly.
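A rough, untested sketch of what calling that constructor might look like against the 0.9-era API (the ClientArguments constructor and the run() call are assumptions here; check them against your Spark version):

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.spark.SparkConf
import org.apache.spark.deploy.yarn.{Client, ClientArguments}

object ProgrammaticSubmit {
  def main(args: Array[String]): Unit = {
    // Build a Hadoop Configuration explicitly instead of relying on
    // whatever Client.scala constructs internally.
    val hadoopConf = new Configuration()
    hadoopConf.set("yarn.resourcemanager.address", "localhost:8050")

    val sparkConf = new SparkConf()

    // Pass the Configuration through the three-argument constructor
    // mentioned above; whether it propagates everywhere is untested.
    new Client(new ClientArguments(args), hadoopConf, sparkConf).run()
  }
}
```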

You can always file a JIRA to add support for it and see what people think.

Tom
On Thursday, April 3, 2014 8:46 AM, Ron Gonzalez <zlgonza...@yahoo.com> wrote:
 
Right thanks, that worked.
My goal is to programmatically submit things to the YARN cluster. The 
underlying framework we have is a set of property files that specify different 
machines for dev, QE, and prod. While it's definitely possible to deploy 
different things in the client's etc/hadoop directory, I was curious whether 
the only way is to set things up as environment variables, or whether there is 
a way to programmatically override particular configurations.
I looked at the Client.scala code, and it seems to create a new Configuration 
object that isn't accessible from the outside, so most likely the answer is 
no, which is a reasonable answer. I'll just have to figure out a different 
deployment model for the different stages of the lifecycle.
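One way to keep the per-stage property files while still resolving the right cluster at submit time is to map each stage to its own Hadoop configuration directory. A minimal sketch (the stage names and paths are illustrative assumptions, not anything from our setup):

```scala
// Pick a Hadoop configuration directory per deployment stage.
// The directories below are hypothetical; substitute the real ones
// from the per-environment property files.
object StageConfig {
  def confDir(stage: String): String = stage match {
    case "dev"  => "/etc/hadoop-dev/conf"
    case "qe"   => "/etc/hadoop-qe/conf"
    case "prod" => "/etc/hadoop/conf"
    case other  => throw new IllegalArgumentException(s"unknown stage: $other")
  }
}
```

The launcher script would then export HADOOP_CONF_DIR from confDir(stage) before invoking spark-submit, so the client picks up the right yarn-site.xml without any code changes.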

Thanks,
Ron
On Thursday, April 3, 2014 6:29 AM, Tom Graves <tgraves...@yahoo.com> wrote:
 
You should just make sure your HADOOP_CONF_DIR environment variable is correct, 
and not set yarn.resourcemanager.address in SparkConf. For YARN/Hadoop you 
need to point it at the configuration files for your cluster. Generally that 
setting goes in yarn-site.xml. If setting it alone doesn't work, make sure 
$HADOOP_CONF_DIR is getting put on your classpath. I would also make sure 
HADOOP_PREFIX is set.
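A quick sanity check for those two variables can be done from Scala before submitting (the variable list is just the two named above):

```scala
// Report which of the environment variables needed for YARN submission
// are missing from a given environment map.
object EnvCheck {
  val required = Seq("HADOOP_CONF_DIR", "HADOOP_PREFIX")

  def missing(env: Map[String, String]): Seq[String] =
    required.filterNot(env.contains)

  def main(args: Array[String]): Unit =
    missing(sys.env).foreach(name => println(s"WARNING: $name is not set"))
}
```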

Tom
On Wednesday, April 2, 2014 10:10 PM, Ron Gonzalez <zlgonza...@yahoo.com> wrote:
 
Hi,
  I have a small program, but I cannot seem to make it connect to the right 
properties of the cluster.
  I have SPARK_YARN_APP_JAR, SPARK_JAR, and SPARK_HOME set properly.
  If I run this Scala file, I see that it never uses the 
yarn.resourcemanager.address property that I set on the SparkConf instance.
  Any advice?

Thanks,
Ron

import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf

object SimpleApp {
  def main(args: Array[String]) {
    val logFile =
      "/home/rgonzalez/app/spark-0.9.0-incubating-bin-hadoop2/README.md"
    // Try to point the app at the ResourceManager via SparkConf.
    val conf = new SparkConf()
    conf.set("yarn.resourcemanager.address", "localhost:8050")
    val sc = new SparkContext("yarn-client", "Simple App", conf)
    val logData = sc.textFile(logFile, 2).cache()
    val numAs = logData.filter(line => line.contains("a")).count()
    val numBs = logData.filter(line => line.contains("b")).count()

    println("Lines with a: %s, Lines with b: %s".format(numAs, numBs))
  }
}
