Re: Job initialization performance of Spark standalone mode vs YARN

Ron Gonzalez Fri, 04 Apr 2014 08:08:57 -0700

Hi,
  Can you explain a little more about what's going on? Which mode submits a job to the
YARN cluster, creating an application master and spawning containers for the
local jobs? I tried yarn-client, submitted to our YARN cluster, and it seems
to work that way. Shouldn't Client.scala be running within the AppMaster
instance in this run mode?
  How exactly does yarn-standalone work?


Thanks,
Ron


> On Apr 3, 2014, at 11:19 AM, Kevin Markey <kevin.mar...@oracle.com> wrote:
> 
> We are now testing precisely what you ask about in our environment, but
> Sandy's questions are relevant. The bigger issue is not Spark standalone vs.
> YARN but "client" vs. "standalone" mode, and where the client is located on
> the network relative to the cluster.
> 
> The "client" options, which place the client/master remote from the cluster,
> are useful for interactive queries but suffer considerable network-traffic
> overhead as the master schedules tasks on, and exchanges data with, the
> worker nodes. The "standalone" options place the master/client on the
> cluster. In yarn-standalone, the master (driver) runs as a thread inside the
> YARN ApplicationMaster, which YARN launches in a container on the cluster.
> That means much less traffic, since the master is co-located with the worker
> nodes and its scheduling/data communication has lower latency.
> 
> In my comparisons between yarn-client and yarn-standalone (so as not to
> conflate YARN with Spark standalone), yarn-client computation time is at least
> double that of yarn-standalone! That holds at least for a job with many stages
> and heavy client/worker communication but rather few "collect" actions, so
> scheduling is the main factor here.
> 
> I'll be posting more information as I have it available.
> 
> Kevin
> 
> 
>> On 03/03/2014 03:48 PM, Sandy Ryza wrote:
>> Are you running in yarn-standalone mode or yarn-client mode?  Also, what 
>> YARN scheduler and what NodeManager heartbeat?  
>> 
>> 
>> On Sun, Mar 2, 2014 at 9:41 PM, polkosity <polkos...@gmail.com> wrote:
>>> Thanks for the advice, Mayur.
>>> 
>>> I thought I'd report back on the performance difference...  Spark standalone
>>> mode has executors processing at capacity in under a second :)
>
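
Kevin's explanation of why yarn-client is slower for a many-stage job can be illustrated with a back-of-the-envelope model. This is a hypothetical sketch, not Spark code. The function name `scheduling_overhead_s`, the fixed number of round trips per stage, and the latency figures are all assumptions. They are chosen only to show how per-message network latency multiplies across stages when the driver sits far from the executors.

```python
"""Back-of-the-envelope model of driver-side scheduling overhead.

Hypothetical illustration only (not Spark code): assumes each stage costs a
fixed number of sequential driver<->executor round trips, and ignores
computation, bulk data transfer, and any overlap between messages.
"""


def scheduling_overhead_s(num_stages: int, round_trips_per_stage: int, rtt_ms: float) -> float:
    """Return the estimated seconds spent waiting on network round trips."""
    if num_stages < 0 or round_trips_per_stage < 0 or rtt_ms < 0:
        raise ValueError("inputs must be non-negative")
    # Overhead grows linearly with stages, messages per stage, and latency.
    total_ms = num_stages * round_trips_per_stage * rtt_ms
    return total_ms / 1000.0


if __name__ == "__main__":
    stages = 200          # assumed: a job with many stages
    trips_per_stage = 10  # assumed: launch/status messages per stage
    # Assumed latencies: driver on the cluster LAN vs. a remote client machine.
    colocated = scheduling_overhead_s(stages, trips_per_stage, rtt_ms=0.5)
    remote = scheduling_overhead_s(stages, trips_per_stage, rtt_ms=20.0)
    print(f"co-located driver: {colocated:.1f} s")
    print(f"remote driver:     {remote:.1f} s")
```

With these made-up numbers, the remote driver waits 40 s on round trips versus 1 s for the co-located one. The point is the linear form: overhead scales with the number of stages times the round-trip latency. This is consistent with Kevin's observation that a job with lots of stages suffers most in yarn-client mode. Real Spark scheduling overlaps many of these messages, so actual figures will differ.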
