I followed this thread
http://apache-spark-user-list.1001560.n3.nabble.com/YARN-issues-with-resourcemanager-scheduler-address-td5201.html#a5258
to set SPARK_YARN_USER_ENV to point at HADOOP_CONF_DIR:

export SPARK_YARN_USER_ENV="CLASSPATH=$HADOOP_CONF_DIR"

and used the following command to share the conf directory with all machines:

export SPARK_YARN_DIST_FILES=$(ls $HADOOP_CONF_DIR* | sed 's#^#file://#g' | tr '\n' ',')

I then started spark-shell with:

./spark-shell --master yarn-client --executor-memory 32g

This time I didn't get the repeated "14/08/15 15:44:51 INFO cluster.YarnClientSchedulerBackend: Application report from ASM:" messages, but I got a new exception instead (the java.net.URISyntaxException below). Any idea why this is happening? Also, although I see the REPL prompt, sc is not available in the REPL.

14/08/16 02:27:52 INFO yarn.Client: Uploading file:/usr/lib/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/lib/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar to hdfs://n001-10ge1:8020/user/ssimanta/.sparkStaging/application_1408130563059_0011/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar

java.lang.IllegalArgumentException: java.net.URISyntaxException: Expected scheme-specific part at index 5: conf:
    at org.apache.hadoop.fs.Path.initialize(Path.java:206)
    at org.apache.hadoop.fs.Path.<init>(Path.java:172)
    at org.apache.hadoop.fs.Path.<init>(Path.java:94)
    at org.apache.spark.deploy.yarn.ClientBase$class.org$apache$spark$deploy$yarn$ClientBase$$copyRemoteFile(ClientBase.scala:161)
    at org.apache.spark.deploy.yarn.ClientBase$$anonfun$prepareLocalResources$4$$anonfun$apply$2.apply(ClientBase.scala:238)
    at org.apache.spark.deploy.yarn.ClientBase$$anonfun$prepareLocalResources$4$$anonfun$apply$2.apply(ClientBase.scala:233)
    at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
    at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
    at org.apache.spark.deploy.yarn.ClientBase$$anonfun$prepareLocalResources$4.apply(ClientBase.scala:233)
    at org.apache.spark.deploy.yarn.ClientBase$$anonfun$prepareLocalResources$4.apply(ClientBase.scala:231)
    at scala.collection.immutable.List.foreach(List.scala:318)
    at org.apache.spark.deploy.yarn.ClientBase$class.prepareLocalResources(ClientBase.scala:231)
    at org.apache.spark.deploy.yarn.Client.prepareLocalResources(Client.scala:39)
    at org.apache.spark.deploy.yarn.Client.runApp(Client.scala:74)
    at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:81)
    at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:136)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:318)
    at org.apache.spark.repl.SparkILoop.createSparkContext(SparkILoop.scala:957)
    at $iwC$$iwC.<init>(<console>:8)
    at $iwC.<init>(<console>:14)
    at <init>(<console>:16)
    at .<init>(<console>:20)
    at .<clinit>(<console>)
    at .<init>(<console>:7)
    at .<clinit>(<console>)
    at $print(<console>)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:788)
    at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1056)
    at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:614)
    at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:645)
    at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:609)
    at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:796)
    at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:841)
    at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:753)
    at org.apache.spark.repl.SparkILoopInit$$anonfun$initializeSpark$1.apply(SparkILoopInit.scala:121)
    at org.apache.spark.repl.SparkILoopInit$$anonfun$initializeSpark$1.apply(SparkILoopInit.scala:120)
    at org.apache.spark.repl.SparkIMain.beQuietDuring(SparkIMain.scala:263)
    at org.apache.spark.repl.SparkILoopInit$class.initializeSpark(SparkILoopInit.scala:120)
    at org.apache.spark.repl.SparkILoop.initializeSpark(SparkILoop.scala:56)
    at org.apache.spark.repl.SparkILoop$$anonfun$process$1$$anonfun$apply$mcZ$sp$5.apply$mcV$sp(SparkILoop.scala:913)
    at org.apache.spark.repl.SparkILoopInit$class.runThunks(SparkILoopInit.scala:142)
    at org.apache.spark.repl.SparkILoop.runThunks(SparkILoop.scala:56)
    at org.apache.spark.repl.SparkILoopInit$class.postInitialization(SparkILoopInit.scala:104)
    at org.apache.spark.repl.SparkILoop.postInitialization(SparkILoop.scala:56)
    at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply$mcZ$sp(SparkILoop.scala:930)
    at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply(SparkILoop.scala:884)
    at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply(SparkILoop.scala:884)
    at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
    at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:884)
    at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:982)
    at org.apache.spark.repl.Main$.main(Main.scala:31)
    at org.apache.spark.repl.Main.main(Main.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:303)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:55)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.net.URISyntaxException: Expected scheme-specific part at index 5: conf:
    at java.net.URI$Parser.fail(URI.java:2829)
    at java.net.URI$Parser.failExpecting(URI.java:2835)
    at java.net.URI$Parser.parse(URI.java:3038)
    at java.net.URI.<init>(URI.java:753)
    at org.apache.hadoop.fs.Path.initialize(Path.java:203)
    ... 62 more

Spark context available as sc.
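The malformed "conf:" entry points at the ls-based SPARK_YARN_DIST_FILES command above as the likely culprit (an assumption, not something confirmed later in the thread): when the $HADOOP_CONF_DIR* glob expands to one or more directories, ls can emit "dirname:" header lines and prints bare file names rather than absolute paths, and those strings become URIs that Hadoop's Path parser rejects. A minimal sketch that builds the list from absolute file paths instead, assuming HADOOP_CONF_DIR points at an existing directory:

# Hypothetical alternative: enumerate absolute paths with find so that no
# "dirname:" headers or bare file names leak into the comma-separated list.
export SPARK_YARN_DIST_FILES=$(find "$HADOOP_CONF_DIR" -maxdepth 1 -type f | sed 's#^#file://#' | tr '\n' ',')

Like the original command, this leaves a trailing comma; appending | sed 's/,$//' strips it if that turns out to matter.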
On Fri, Aug 15, 2014 at 3:49 PM, Soumya Simanta <soumya.sima...@gmail.com> wrote:

> After changing the allocation I'm getting the following in my logs. No idea what this means.
>
> 14/08/15 15:44:33 INFO cluster.YarnClientSchedulerBackend: Application report from ASM:
>      appMasterRpcPort: -1
>      appStartTime: 1408131861372
>      yarnAppState: ACCEPTED
>
> ... (the same report, always with yarnAppState: ACCEPTED, repeats once per second until 14/08/15 15:44:51) ...
>
> 14/08/15 15:44:51 INFO cluster.YarnClientSchedulerBackend: Application report from ASM:
>      appMasterRpcPort: -1
>      appStartTime: 1408131861372
>      yarnAppState: ACCEPTED
>
> On Fri, Aug 15, 2014 at 2:47 PM, Sandy Ryza <sandy.r...@cloudera.com> wrote:
>
>> We generally recommend setting yarn.scheduler.maximum-allocation-mb to the maximum node capacity.
>>
>> -Sandy
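For concreteness, a minimal yarn-site.xml sketch of that recommendation, assuming nodes that dedicate 48 GB to YARN (the figures are placeholders for this particular cluster, not values Sandy stated):

<!-- Hypothetical settings for a node that gives 48 GB (49152 MB) to YARN containers. -->
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>49152</value>  <!-- total memory each NodeManager offers to containers -->
</property>
<property>
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <value>49152</value>  <!-- largest single container the scheduler will grant -->
</property>

The ResourceManager (and the NodeManagers, for the resource.memory-mb change) generally need a restart before the new limits take effect.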
>>
>> On Fri, Aug 15, 2014 at 11:41 AM, Soumya Simanta <soumya.sima...@gmail.com> wrote:
>>
>>> I just checked the YARN config and it looks like I need to change this value. Should it be raised to 48G (the max memory allocated to YARN) per node?
>>>
>>> <property>
>>>   <name>yarn.scheduler.maximum-allocation-mb</name>
>>>   <value>6144</value>
>>>   <source>java.io.BufferedInputStream@2e7e1ee</source>
>>> </property>
>>>
>>> On Fri, Aug 15, 2014 at 2:37 PM, Soumya Simanta <soumya.sima...@gmail.com> wrote:
>>>
>>>> Andrew,
>>>>
>>>> Thanks for your response.
>>>>
>>>> When I try the following:
>>>>
>>>> ./spark-shell --executor-memory 46g --master yarn
>>>>
>>>> I get this error:
>>>>
>>>> Exception in thread "main" java.lang.Exception: When running with master 'yarn' either HADOOP_CONF_DIR or YARN_CONF_DIR must be set in the environment.
>>>>     at org.apache.spark.deploy.SparkSubmitArguments.checkRequiredArguments(SparkSubmitArguments.scala:166)
>>>>     at org.apache.spark.deploy.SparkSubmitArguments.<init>(SparkSubmitArguments.scala:61)
>>>>     at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:50)
>>>>     at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
>>>>
>>>> After this I set the following env variable:
>>>>
>>>> export YARN_CONF_DIR=/usr/lib/hadoop-yarn/etc/hadoop/
>>>>
>>>> The program launches but then halts with the following error:
>>>>
>>>> 14/08/15 14:33:22 ERROR yarn.Client: Required executor memory (47104 MB), is above the max threshold (6144 MB) of this cluster.
>>>>
>>>> I guess this is some YARN setting that is not set correctly.
>>>>
>>>> Thanks
>>>> -Soumya
>>>>
>>>> On Fri, Aug 15, 2014 at 2:19 PM, Andrew Or <and...@databricks.com> wrote:
>>>>
>>>>> Hi Soumya,
>>>>>
>>>>> The driver's console output prints how much memory is actually granted to each executor, so from there you can verify what the executors are really getting. You should use the '--executor-memory' argument to spark-shell. For instance, assuming each node has 48G of memory:
>>>>>
>>>>> bin/spark-shell --executor-memory 46g --master yarn
>>>>>
>>>>> We leave a small cushion for the OS so we don't take up all of the system's memory. This option also applies to the standalone mode you've been using, but if you have been using the ec2 scripts, we set "spark.executor.memory" in conf/spark-defaults.conf for you automatically so you don't have to specify it each time on the command line. Of course, you can also do the same in YARN.
>>>>>
>>>>> -Andrew
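A short sketch of the spark-defaults.conf route Andrew mentions, reusing the same assumed 48G-per-node figure as his example:

# conf/spark-defaults.conf -- hypothetical value mirroring the 46g example above
spark.executor.memory   46g

With that in place, bin/spark-shell --master yarn should pick up the executor size without the --executor-memory flag.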
>>>>>
>>>>> 2014-08-15 10:45 GMT-07:00 Soumya Simanta <soumya.sima...@gmail.com>:
>>>>>
>>>>>> I've been using the standalone cluster all this time and it worked fine.
>>>>>> Recently I'm using another Spark cluster that is based on YARN, and I have no experience with YARN.
>>>>>>
>>>>>> The YARN cluster has 10 nodes and a total memory of 480G.
>>>>>>
>>>>>> I'm having trouble starting the spark-shell with enough memory. I'm doing a very simple operation - reading a 100GB file from HDFS and running a count on it. This fails with out-of-memory errors on the executors.
>>>>>>
>>>>>> Can someone point me to the command line parameters that I should use for spark-shell so that it works?
>>>>>>
>>>>>> Thanks
>>>>>> -Soumya
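For the original question, a hedged starting point rather than a definitive recipe: with 10 nodes and roughly 48G usable per node, spreading the work over one executor per node is usually friendlier to YARN than a few very large executors. The exact figures below are assumptions about this cluster, and they only fit once yarn.scheduler.maximum-allocation-mb has been raised as discussed earlier in the thread:

# Hypothetical invocation for a 10-node YARN cluster with ~48G per node;
# ~40g per executor leaves headroom for the ApplicationMaster and the OS.
./bin/spark-shell --master yarn-client \
  --num-executors 10 \
  --executor-memory 40g \
  --executor-cores 4

A plain sc.textFile("hdfs://...").count() streams through the partitions rather than materializing the whole 100GB file in memory, so sizes in this range should be more than enough for the count itself.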