Those settings seem reasonable to me. Are you observing performance that's worse than you would expect?
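As a quick sanity check on the numbers (rough arithmetic, assuming YARN sizes
the container as executor memory plus overhead):

    47924 MB (spark.executor.memory)
  +  5324 MB (spark.yarn.executor.memoryOverhead)
  = 53248 MB = exactly 52 GiB per container

which matches the 52GB figure in your table and leaves roughly 9 GiB of the
61 GiB on each node for the OS, the NodeManager, and other daemons.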
-Sandy

On Mon, Sep 7, 2015 at 11:22 AM, Alexander Pivovarov <apivova...@gmail.com> wrote:

> Hi Sandy
>
> Thank you for your reply.
> Currently we use r3.2xlarge boxes (vCPU: 8, Mem: 61 GiB)
> with the EMR setting for Spark "maximizeResourceAllocation": "true"
>
> It is automatically converted to the Spark settings
>
>   spark.executor.memory 47924M
>   spark.yarn.executor.memoryOverhead 5324
>
> We also set spark.default.parallelism = slave_count * 16
>
> Does this look good to you? (we run a single heavy job on the cluster)
>
> Alex
>
> On Mon, Sep 7, 2015 at 11:03 AM, Sandy Ryza <sandy.r...@cloudera.com> wrote:
>
>> Hi Alex,
>>
>> If they're both configured correctly, there's no reason that Spark
>> Standalone should provide a performance or memory improvement over
>> Spark on YARN.
>>
>> -Sandy
>>
>> On Fri, Sep 4, 2015 at 1:24 PM, Alexander Pivovarov <apivova...@gmail.com> wrote:
>>
>>> Hi Everyone
>>>
>>> We are trying the latest AWS emr-4.0.0 with Spark, and my question is
>>> about YARN vs Standalone mode.
>>> Our use case is:
>>> - start a 100-150 node cluster every week
>>> - run one heavy Spark job (5-6 hours)
>>> - save the data to S3
>>> - stop the cluster
>>>
>>> Officially, AWS emr-4.0.0 comes with Spark on YARN.
>>> It's probably possible to hack EMR with a bootstrap script that stops
>>> YARN and starts a master and slaves on each machine (to run Spark in
>>> standalone mode).
>>>
>>> My questions are:
>>> - Does Spark standalone provide a significant performance / memory
>>>   improvement in comparison to YARN mode?
>>> - Is it worth hacking the official EMR Spark on YARN to switch Spark
>>>   to standalone mode?
>>>
>>> I already created a comparison table and want you to check whether my
>>> understanding is correct.
>>>
>>> Let's say an r3.2xlarge machine has 52GB of RAM available for Spark
>>> executor JVMs.
>>>
>>> Standalone-to-YARN comparison:
>>>
>>>                                                      STDLN  | YARN
>>> can the executor allocate up to 52GB of RAM?          yes  | yes
>>> will the executor become unresponsive because of
>>>   GC after using all 52GB of RAM?                     yes  | yes
>>> additional JVMs on a slave besides the executor     worker | node manager
>>> are those additional JVMs lightweight?                yes  | yes
>>>
>>> Thank you
>>>
>>> Alex
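One more thought on the spark.default.parallelism = slave_count * 16 setting:
rather than hardcoding the slave count at submit time, it can be derived from
the running cluster. A minimal sketch in Scala (assuming Spark 1.x with one
executor per slave, which is what maximizeResourceAllocation gives you; the
app name and S3 path are placeholders):

    import org.apache.spark.{SparkConf, SparkContext}

    val sc = new SparkContext(new SparkConf().setAppName("weekly-heavy-job"))

    // getExecutorMemoryStatus lists the driver too, so subtract 1.
    // Executors register asynchronously, so in practice you may want to
    // wait until the expected number of slaves has shown up.
    val slaveCount = sc.getExecutorMemoryStatus.size - 1

    // 16 tasks per slave = 2 tasks per vCPU on an 8-core r3.2xlarge
    val parallelism = slaveCount * 16

    // pass it explicitly, e.g. as the minimum partition count for the input
    val input = sc.textFile("s3://your-bucket/input", parallelism)

(spark.default.parallelism itself is read when the SparkContext is created, so
passing an explicit partition count like this is the simplest way to apply a
value derived at runtime.)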