I repartitioned the input RDD from 4,800 to 24,000 partitions.
After that, the stage (24,000 tasks) finished in 22 min on 100 boxes.
Shuffle read/write: 905 GB / 710 GB
Task metrics (Dur / GC / Read / Write):
Min: 7s / 1s / 38MB / 30MB
Med: 22s / 9s / 38MB / 30MB
Max: 1.8min / 1.6min / 38MB / 30MB
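(For illustration, a minimal sketch in Scala of the repartitioning described above; the input path and variable names are assumptions, not taken from the actual job.)

    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf().setAppName("repartition-sketch")
    val sc = new SparkContext(conf)

    // Hypothetical input; the real job reads its own data source.
    val input = sc.textFile("hdfs:///path/to/input")

    // Raise the partition count from 4,800 to 24,000 so each task
    // shuffles a smaller slice (roughly 38 MB read / 30 MB write above).
    val repartitioned = input.repartition(24000)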
On Mon, Sep 21, 2015 at 5:55 PM,
The warning you're seeing in Spark is not an issue. The scratch space lives
inside the heap, so it will never by itself cause YARN to kill the container.
The issue is that Spark is using some off-heap space on top of that.
You'll need to bump the spark.yarn.executor.memoryOverhead property to give
that off-heap space more room.
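(A minimal sketch, assuming the setting is applied before the SparkContext is created; the value 6144 MB is only an example, not a recommendation. The same property can also be passed as --conf spark.yarn.executor.memoryOverhead=6144 on spark-submit or put in spark-defaults.conf.)

    import org.apache.spark.{SparkConf, SparkContext}

    // spark.yarn.executor.memoryOverhead is in megabytes; it is the extra
    // off-heap allowance YARN grants on top of the executor heap.
    val conf = new SparkConf()
      .setAppName("overhead-sketch")
      .set("spark.yarn.executor.memoryOverhead", "6144")  // example value
    val sc = new SparkContext(conf)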
I think you need to increase the executor memory, either through the command-line
argument "--executor-memory" or the configuration "spark.executor.memory".
Also increase yarn.scheduler.maximum-allocation-mb on the YARN side if necessary.
Thanks
Saisai
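(To show how these settings interact, a sketch with made-up numbers: YARN sizes each executor container as the executor memory plus the memory overhead, and refuses any container request larger than yarn.scheduler.maximum-allocation-mb.)

    // Hypothetical values, all in MB.
    val executorMemoryMb    = 20480   // --executor-memory 20g
    val overheadMb          = 2048    // spark.yarn.executor.memoryOverhead
    val containerRequestMb  = executorMemoryMb + overheadMb   // 22528 MB asked of YARN
    val yarnMaxAllocationMb = 24576   // yarn.scheduler.maximum-allocation-mb
    require(containerRequestMb <= yarnMaxAllocationMb,
      "YARN rejects containers larger than yarn.scheduler.maximum-allocation-mb")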
On Mon, Sep 21, 2015 at 5:13 PM, Alexander Pivovarov
wrote:
I noticed that some executors have issues with scratch space.
I see the following in the YARN app container stderr around the time when YARN
killed the executor because it used too much memory.
-- App container stderr --
15/09/21 21:43:22 WARN storage.MemoryStore: Not enough space to cache
rdd_6_346 in
YARN will never kill processes for being unresponsive.
It may kill processes for occupying more memory than it allows. To get
around this, you can either bump spark.yarn.executor.memoryOverhead or turn
off the memory checks entirely with yarn.nodemanager.pmem-check-enabled.
-Sandy
On Tue, Sep 8
The problem we have now is data skew (2,360 tasks done in 5 min, 3
tasks in 40 min, and 1 task in 2 hours).
Some people on the team worry that the executor running the longest
task could be killed by YARN (because the executor might be unresponsive due
to GC, or it might occupy more memory than YARN allows).
Those settings seem reasonable to me.
Are you observing performance that's worse than you would expect?
-Sandy
On Mon, Sep 7, 2015 at 11:22 AM, Alexander Pivovarov
wrote:
Hi Sandy
Thank you for your reply
Currently we use r3.2xlarge boxes (vCPU: 8, Mem: 61 GiB)
with the EMR setting for Spark "maximizeResourceAllocation": "true"
It is automatically converted to the Spark settings
spark.executor.memory 47924M
spark.yarn.executor.memoryOverhead 5324
we also set s
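(A quick sanity check on those numbers, just arithmetic over the values quoted above; the 61 GiB figure is the r3.2xlarge spec, everything else is illustration.)

    // All values in MB.
    val executorMemoryMb = 47924                          // spark.executor.memory
    val overheadMb       = 5324                           // spark.yarn.executor.memoryOverhead
    val containerMb      = executorMemoryMb + overheadMb  // 53248 MB = 52 GiB per executor container
    val boxMemoryMb      = 61 * 1024                      // r3.2xlarge memory, 62464 MB
    val headroomMb       = boxMemoryMb - containerMb      // 9216 MB (~9 GiB) left for the OS and daemons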
Hi Alex,
If they're both configured correctly, there's no reason that Spark
Standalone should provide a performance or memory improvement over Spark on
YARN.
-Sandy
On Fri, Sep 4, 2015 at 1:24 PM, Alexander Pivovarov
wrote:
Hi Everyone
We are trying the latest AWS emr-4.0.0 with Spark, and my question is about
YARN vs Standalone mode.
Our use case is:
- start a 100-150 node cluster every week,
- run one heavy Spark job (5-6 hours),
- save data to S3,
- stop the cluster.
Officially, AWS emr-4.0.0 comes with Spark on YARN.
It's pro