okay. since for us the main purpose is to retrieve (small) data i guess i will stick to yarn client mode. thx
On Thu, Jun 19, 2014 at 3:19 PM, DB Tsai wrote:
> Currently, we save the result in HDFS and read it back in our
> application. Since Client.run is a blocking call, it's easy to do it
> this way.
>
> We are now working on using akka to bring the result back to the app
> without going through HDFS, and we can also use akka to track the log,
> stack trace, etc.
>
> Sincerely,
>
> DB Tsai
> -------------------------------------------------------
> My Blog: https://www.dbtsai.com
> LinkedIn: https://www.linkedin.com/in/dbtsai
>
>
> On Thu, Jun 19, 2014 at 12:08 PM, Koert Kuipers wrote:
> > db tsai,
> > if in yarn-cluster mode the driver runs inside yarn, how can you do a
> > rdd.collect and bring the results back to your application?
> >
> >
> > On Thu, Jun 19, 2014 at 2:33 PM, DB Tsai wrote:
> >>
> >> We are submitting the spark job in our tomcat application using
> >> yarn-cluster mode with great success. As Kevin said, yarn-client mode
> >> runs the driver in your local JVM, and it will have really bad network
> >> overhead when one does a reduce action, which pulls all the results
> >> from the executors to your local JVM. Also, since you can only have
> >> one spark context object in one JVM, it will be tricky to run multiple
> >> spark jobs concurrently with yarn-client mode.
> >>
> >> This is how we submit a spark job with yarn-cluster mode. Please use
> >> the current master code; otherwise, after the job is finished, spark
> >> will kill the JVM and exit your app.
> >>
> >> We set up the configuration of spark in a scala map.
> >>
> >> import org.apache.spark.SparkConf
> >> import org.apache.spark.deploy.yarn.{Client, ClientArguments}
> >>
> >> // Translate the settings map into YARN client command-line arguments.
> >> def getArgsFromConf(conf: Map[String, String]): Array[String] = {
> >>   Array[String](
> >>     "--jar", conf.get("app.jar").getOrElse(""),
> >>     "--addJars", conf.get("spark.addJars").getOrElse(""),
> >>     "--class", conf.get("spark.mainClass").getOrElse(""),
> >>     "--num-executors", conf.get("spark.numWorkers").getOrElse("1"),
> >>     "--driver-memory", conf.get("spark.masterMemory").getOrElse("1g"),
> >>     "--executor-memory", conf.get("spark.workerMemory").getOrElse("1g"),
> >>     "--executor-cores", conf.get("spark.workerCores").getOrElse("1"))
> >> }
> >>
> >> // conf is the settings map; hadoopConfig is a Hadoop Configuration
> >> // created elsewhere in the application.
> >> System.setProperty("SPARK_YARN_MODE", "true")
> >> val sparkConf = new SparkConf
> >> val args = getArgsFromConf(conf)
> >> // run blocks until the YARN application has finished.
> >> new Client(new ClientArguments(args, sparkConf), hadoopConfig,
> >>   sparkConf).run
> >>
> >> Sincerely,
> >>
> >> DB Tsai
> >> -------------------------------------------------------
> >> My Blog: https://www.dbtsai.com
> >> LinkedIn: https://www.linkedin.com/in/dbtsai
> >>
> >>
> >> On Thu, Jun 19, 2014 at 11:22 AM, Kevin Markey wrote:
> >> > Yarn client is much like Spark client mode, except that the
> >> > executors are running in Yarn containers managed by the Yarn
> >> > resource manager on the cluster instead of as Spark workers managed
> >> > by the Spark master. The driver executes as a local client in your
> >> > local JVM. It communicates with the workers on the cluster.
> >> > Transformations are scheduled on the cluster by the driver's logic.
> >> > Actions involve communication between the local driver and the
> >> > remote cluster executors. So, there is some additional network
> >> > overhead, especially if the driver is not co-located on the cluster.
> >> > In yarn-cluster mode, in contrast, the driver is executed as a
> >> > thread in a Yarn application master on the cluster.
> >> >
> >> > In either case, the assembly JAR must be available to the
> >> > application on the cluster. Best to copy it to HDFS and specify its
> >> > location by exporting its location as SPARK_JAR.
> >> >
> >> > Kevin Markey
> >> >
> >> >
> >> > On 06/19/2014 11:22 AM, Koert Kuipers wrote:
> >> >
> >> > i am trying to understand how yarn-client mode works. i am not using
> >> > spark-submit, but instead launching a spark job from within my own
> >> > application.
> >> >
> >> > i can see my application contacting yarn successfully, but then in
> >> > yarn i get an immediate error:
> >> >
> >> > Application application_1403117970283_0014 failed 2 times due to AM
> >> > Container for appattempt_1403117970283_0014_000002 exited with
> >> > exitCode: -1000 due to: File
> >> > file:/home/koert/test-assembly-0.1-SNAPSHOT.jar does not exist
> >> > .Failing this attempt.. Failing the application.
> >> >
> >> > why is yarn trying to fetch my jar, and why as a local file? i would
> >> > expect the jar to be sent to yarn over the wire upon job submission?
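
For anyone who wants to check what the argument-building helper quoted in this thread produces before wiring it into a real submission, here is a minimal standalone sketch. It needs no Spark on the classpath. The object name SubmitArgsDemo and all map values (jar path, class name, executor count) are hypothetical placeholders, not taken from anyone's actual setup; only the flag names and defaults come from the thread.

```scala
// Standalone sketch: no Spark dependency. Flag names and defaults follow the
// helper quoted in the thread; every map value below is a made-up placeholder.
object SubmitArgsDemo {
  // Turn a settings map into the flat flag/value array passed to the client.
  def getArgsFromConf(conf: Map[String, String]): Array[String] =
    Array(
      "--jar", conf.getOrElse("app.jar", ""),
      "--addJars", conf.getOrElse("spark.addJars", ""),
      "--class", conf.getOrElse("spark.mainClass", ""),
      "--num-executors", conf.getOrElse("spark.numWorkers", "1"),
      "--driver-memory", conf.getOrElse("spark.masterMemory", "1g"),
      "--executor-memory", conf.getOrElse("spark.workerMemory", "1g"),
      "--executor-cores", conf.getOrElse("spark.workerCores", "1"))

  def main(argv: Array[String]): Unit = {
    val conf = Map(
      "app.jar" -> "hdfs:///apps/example/app.jar", // hypothetical location
      "spark.mainClass" -> "com.example.Main",     // hypothetical class
      "spark.numWorkers" -> "4")
    // Print one flag/value pair per line; unset keys fall back to defaults.
    getArgsFromConf(conf).grouped(2).foreach(pair => println(pair.mkString(" ")))
  }
}
```

Keys missing from the map fall back to the defaults shown, so a sparse map still yields a complete argument list. Note that a missing app.jar or spark.mainClass becomes an empty string rather than an error, so it may be worth validating those two before submitting.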
