I repartitioned the input RDD from 4,800 to 24,000 partitions.
After that, the stage (24,000 tasks) finished in 22 min on 100 boxes.
Shuffle read/write: 905 GB / 710 GB
Task metrics (Dur / GC / Read / Write):
Min: 7s / 1s / 38MB / 30MB
Med: 22s / 9s / 38MB / 30MB
Max: 1.8min / 1.6min / 38MB / 30MB
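(For illustration, a minimal sketch in Scala of the repartitioning described above; the input path and variable names are assumptions, not taken from the actual job.)

    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf().setAppName("repartition-sketch")
    val sc = new SparkContext(conf)

    // Hypothetical input; the real job reads its own data source.
    val input = sc.textFile("hdfs:///path/to/input")

    // Raise the partition count from 4,800 to 24,000 so each task
    // shuffles a smaller slice (roughly 38 MB read / 30 MB write above).
    val repartitioned = input.repartition(24000)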
On Mon, Sep 21, 2015 at 5:55 PM,
The warning you're seeing in Spark is not an issue. The scratch space lives
inside the heap, so it will never by itself cause YARN to kill the container.
The issue is that Spark is using some off-heap space on top of that.
You'll need to bump the spark.yarn.executor.memoryOverhead property to give
that off-heap space more room.
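(A minimal sketch, assuming the setting is applied before the SparkContext is created; the value 6144 MB is only an example, not a recommendation. The same property can also be passed as --conf spark.yarn.executor.memoryOverhead=6144 on spark-submit or put in spark-defaults.conf.)

    import org.apache.spark.{SparkConf, SparkContext}

    // spark.yarn.executor.memoryOverhead is in megabytes; it is the extra
    // off-heap allowance YARN grants on top of the executor heap.
    val conf = new SparkConf()
      .setAppName("overhead-sketch")
      .set("spark.yarn.executor.memoryOverhead", "6144")  // example value
    val sc = new SparkContext(conf)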
I think you need to increase the executor memory, either through the command-line
argument "--executor-memory" or the configuration "spark.executor.memory".
Also increase yarn.scheduler.maximum-allocation-mb on the YARN side if necessary.
Thanks
Saisai
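(To show how these settings interact, a sketch with made-up numbers: YARN sizes each executor container as the executor memory plus the memory overhead, and refuses any container request larger than yarn.scheduler.maximum-allocation-mb.)

    // Hypothetical values, all in MB.
    val executorMemoryMb    = 20480   // --executor-memory 20g
    val overheadMb          = 2048    // spark.yarn.executor.memoryOverhead
    val containerRequestMb  = executorMemoryMb + overheadMb   // 22528 MB asked of YARN
    val yarnMaxAllocationMb = 24576   // yarn.scheduler.maximum-allocation-mb
    require(containerRequestMb <= yarnMaxAllocationMb,
      "YARN rejects containers larger than yarn.scheduler.maximum-allocation-mb")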
On Mon, Sep 21, 2015 at 5:13 PM, Alexander Pivovarov
wrote:
I noticed that some executors have issues with scratch space.
I see the following in the YARN app container stderr around the time when YARN
killed the executor because it used too much memory.
-- App container stderr --
15/09/21 21:43:22 WARN storage.MemoryStore: Not enough space to cache
rdd_6_346 in
YARN will never kill processes for being unresponsive.
It may kill processes for occupying more memory than it allows. To get
around this, you can either bump spark.yarn.executor.memoryOverhead or turn
off the memory checks entirely with yarn.nodemanager.pmem-check-enabled.
-Sandy
On Tue, Sep 8
The problem we have now is data skew (2,360 tasks done in 5 min, 3
tasks in 40 min, and 1 task in 2 hours).
Some people on the team worry that the executor running the longest
task could be killed by YARN (because the executor might be unresponsive due
to GC, or it might occupy more memory than YARN allows).
Those settings seem reasonable to me.
Are you observing performance that's worse than you would expect?
-Sandy
On Mon, Sep 7, 2015 at 11:22 AM, Alexander Pivovarov
wrote:
Hi Sandy
Thank you for your reply
Currently we use r3.2xlarge boxes (vCPU: 8, Mem: 61 GiB)
with the EMR setting for Spark "maximizeResourceAllocation": "true"
It is automatically converted to the Spark settings
spark.executor.memory 47924M
spark.yarn.executor.memoryOverhead 5324
we also set s
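(A quick sanity check on those numbers, just arithmetic over the values quoted above; the 61 GiB figure is the r3.2xlarge spec, everything else is illustration.)

    // All values in MB.
    val executorMemoryMb = 47924                          // spark.executor.memory
    val overheadMb       = 5324                           // spark.yarn.executor.memoryOverhead
    val containerMb      = executorMemoryMb + overheadMb  // 53248 MB = 52 GiB per executor container
    val boxMemoryMb      = 61 * 1024                      // r3.2xlarge memory, 62464 MB
    val headroomMb       = boxMemoryMb - containerMb      // 9216 MB (~9 GiB) left for the OS and daemons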
Hi Alex,
If they're both configured correctly, there's no reason that Spark
Standalone should provide a performance or memory improvement over Spark on
YARN.
-Sandy
On Fri, Sep 4, 2015 at 1:24 PM, Alexander Pivovarov
wrote:
Hi Everyone
We are trying the latest AWS emr-4.0.0 with Spark, and my question is about
YARN vs Standalone mode.
Our use case is:
- start a 100-150 node cluster every week,
- run one heavy Spark job (5-6 hours),
- save data to S3,
- stop the cluster.
Officially, AWS emr-4.0.0 comes with Spark on YARN.
It's pro