Hi,

Can you explain a little more about what's going on? Which mode submits a job to the YARN cluster that creates an application master and spawns containers for the local jobs? I tried yarn-client, submitting to our YARN cluster, and it seems to work that way. Shouldn't Client.scala be running within the AppMaster instance in this run mode? How exactly does yarn-standalone work?

Thanks,
Ron

> On Apr 3, 2014, at 11:19 AM, Kevin Markey <kevin.mar...@oracle.com> wrote:
>
> We are now testing precisely what you ask about in our environment. But
> Sandy's questions are relevant. The bigger issue is not Spark vs. Yarn but
> "client" vs. "standalone" and where the client is located on the network
> relative to the cluster.
>
> The "client" options that locate the client/master remote from the cluster,
> while useful for interactive queries, suffer from considerable network
> traffic overhead as the master schedules and transfers data with the worker
> nodes on the cluster. The "standalone" options locate the master/client on
> the cluster. In yarn-standalone, the master is a thread contained by the
> Yarn Resource Manager. Lots less traffic, as the master is co-located with
> the worker nodes on the cluster and its scheduling/data communication has
> less latency.
>
> In my comparisons between yarn-client and yarn-standalone (so as not to
> conflate yarn vs Spark), yarn-client computation time is at least double
> yarn-standalone! At least for a job with lots of stages and lots of
> client/worker communication, although rather few "collect" actions, so it's
> mainly scheduling that's relevant here.
>
> I'll be posting more information as I have it available.
>
> Kevin
>
>> On 03/03/2014 03:48 PM, Sandy Ryza wrote:
>> Are you running in yarn-standalone mode or yarn-client mode? Also, what
>> YARN scheduler and what NodeManager heartbeat?
>>
>> On Sun, Mar 2, 2014 at 9:41 PM, polkosity <polkos...@gmail.com> wrote:
>>> Thanks for the advice Mayur.
>>>
>>> I thought I'd report back on the performance difference... Spark standalone
>>> mode has executors processing at capacity in under a second :)
>>>
>>> --
>>> View this message in context:
>>> http://apache-spark-user-list.1001560.n3.nabble.com/Job-initialization-performance-of-Spark-standalone-mode-vs-YARN-tp2016p2243.html
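
For what it's worth, the scheduling-latency argument in Kevin's message can be put into a rough back-of-envelope model. This is only a sketch, not a measurement: the function `job_seconds`, the cost model (one driver round trip per wave of tasks), and every number below are hypothetical assumptions chosen purely for illustration. None of them come from Spark, YARN, or Kevin's tests.

```python
# Back-of-envelope model, NOT a benchmark. All names and numbers here are
# illustrative assumptions; none come from Spark, YARN, or the thread above.

def job_seconds(stages, tasks_per_stage, executor_slots,
                task_seconds, driver_rtt_seconds):
    """Estimate wall-clock time for a job under a simplified cost model.

    Tasks in a stage run in waves of `executor_slots`. Each wave is assumed
    to cost one task's compute time plus one round trip to the driver for
    scheduling. Data transfer and startup costs are ignored.
    """
    waves_per_stage = -(-tasks_per_stage // executor_slots)  # ceiling division
    return stages * waves_per_stage * (task_seconds + driver_rtt_seconds)


# Hypothetical job with many short stages, so scheduling cost is significant.
common = dict(stages=200, tasks_per_stage=64, executor_slots=16,
              task_seconds=0.05)

# Assumed round-trip times: driver outside the cluster (as in yarn-client)
# versus driver co-located with the workers (as in yarn-standalone).
remote_driver = job_seconds(driver_rtt_seconds=0.060, **common)
colocated_driver = job_seconds(driver_rtt_seconds=0.002, **common)

print(f"remote driver:     {remote_driver:.1f} s")
print(f"co-located driver: {colocated_driver:.1f} s")
print(f"ratio:             {remote_driver / colocated_driver:.2f}x")
```

With these made-up inputs the remote driver comes out a bit over twice as slow, which is the same shape as the effect Kevin describes. The point is only that when tasks are short and stages are many, the driver round trip can be as large as the work itself. With long-running tasks the same model would show almost no difference between the two modes.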