According to the code, SPARK_YARN_APP_JAR is retrieved from the driver process's own system variables, and the key-value pairs you pass to the JavaSparkContext constructor are kept isolated from those variables. So you may want to try setting it through System.setProperty() before creating the context.
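For example, something along these lines in your second LoadTest (just a rough, untested sketch -- whether it actually takes effect depends on how your Spark version looks the value up, so please check it against the 0.9.1 source rather than taking it as a confirmed fix):

    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;

    public class LoadTest {
        public static void main(String[] args) {
            // guess: expose the jar via a JVM system property set in the driver
            // itself, before the context is created, instead of (or in addition
            // to) the env map passed to JavaSparkContext
            System.setProperty("SPARK_YARN_APP_JAR", "file:/opt/mytest/loadtest.jar");
            System.setProperty("spark.executor.memory", "7g");

            JavaSparkContext sc = new JavaSparkContext("yarn-client", "LoadTest",
                    System.getenv("SPARK_HOME"),
                    JavaSparkContext.jarOfClass(LoadTest.class));

            JavaRDD<String> data1 = sc.textFile("file:/opt/mytest/123.data", 2);
            System.out.println("1============" + data1.count());
        }
    }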
thanks

On Wed, Apr 23, 2014 at 6:05 PM, 肥肥 <19934...@qq.com> wrote:

> I have a small program, which I can launch successfully by yarn client
> with yarn-standalone mode.
>
> the command looks like this:
> (javac -classpath .:jars/spark-assembly-0.9.1-hadoop2.2.0.jar LoadTest.java)
> (jar cvf loadtest.jar LoadTest.class)
> SPARK_JAR=assembly/target/scala-2.10/spark-assembly-0.9.1-hadoop2.2.0.jar
> ./bin/spark-class org.apache.spark.deploy.yarn.Client --jar
> /opt/mytest/loadtest.jar --class LoadTest --args yarn-standalone
> --num-workers 2 --master-memory 2g --worker-memory 2g --worker-cores 1
>
> the program LoadTest.java:
>
> public class LoadTest {
>     static final String USER = "root";
>
>     public static void main(String[] args) {
>         System.setProperty("user.name", USER);
>         System.setProperty("HADOOP_USER_NAME", USER);
>         System.setProperty("spark.executor.memory", "7g");
>
>         JavaSparkContext sc = new JavaSparkContext(args[0], "LoadTest",
>                 System.getenv("SPARK_HOME"),
>                 JavaSparkContext.jarOfClass(LoadTest.class));
>         String file = "file:/opt/mytest/123.data";
>         JavaRDD<String> data1 = sc.textFile(file, 2);
>         long c1 = data1.count();
>         System.out.println("1============" + c1);
>     }
> }
>
> BUT due to my other program's need, I must run it with the "java" command.
> So I added an "environment" parameter to JavaSparkContext(). Followed is
> the ERROR I get:
>
> Exception in thread "main" org.apache.spark.SparkException: env
> SPARK_YARN_APP_JAR is not set
>         at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:49)
>         at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:125)
>         at org.apache.spark.SparkContext.<init>(SparkContext.scala:200)
>         at org.apache.spark.SparkContext.<init>(SparkContext.scala:100)
>         at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:93)
>         at LoadTest.main(LoadTest.java:37)
>
> the program LoadTest.java:
>
> public class LoadTest {
>     static final String USER = "root";
>
>     public static void main(String[] args) {
>         System.setProperty("user.name", USER);
>         System.setProperty("HADOOP_USER_NAME", USER);
>         System.setProperty("spark.executor.memory", "7g");
>
>         Map<String, String> env = new HashMap<String, String>();
>         env.put("SPARK_YARN_APP_JAR", "file:/opt/mytest/loadtest.jar");
>         env.put("SPARK_WORKER_INSTANCES", "2");
>         env.put("SPARK_WORKER_CORES", "1");
>         env.put("SPARK_WORKER_MEMORY", "2G");
>         env.put("SPARK_MASTER_MEMORY", "2G");
>         env.put("SPARK_YARN_APP_NAME", "LoadTest");
>         env.put("SPARK_YARN_DIST_ARCHIVES",
>                 "file:/opt/test/spark-0.9.1-bin-hadoop1/assembly/target/scala-2.10/spark-assembly-0.9.1-hadoop2.2.0.jar");
>
>         JavaSparkContext sc = new JavaSparkContext("yarn-client", "LoadTest",
>                 System.getenv("SPARK_HOME"),
>                 JavaSparkContext.jarOfClass(LoadTest.class), env);
>         String file = "file:/opt/mytest/123.dna";
>         JavaRDD<String> data1 = sc.textFile(file, 2); //.cache();
>
>         long c1 = data1.count();
>         System.out.println("1============" + c1);
>     }
> }
>
> the command:
> javac -classpath .:jars/spark-assembly-0.9.1-hadoop2.2.0.jar LoadTest.java
> jar cvf loadtest.jar LoadTest.class
> nohup java -classpath .:jars/spark-assembly-0.9.1-hadoop2.2.0.jar LoadTest >> loadTest.log 2>&1 &
>
> What did I miss?? Or did I do it the wrong way??