Even though it does not sound intuitive, reduceByKey expects all of the values
for a particular key within a partition to be loaded into memory. So once you
increase the number of partitions, you should be able to run the jobs.
Ok, so that worked flawlessly after I upped the number of partitions to 400
from 40.
Thanks!
On Fri, May 13, 2016 at 7:28 PM, Sung Hwan Chung wrote:
I'll try that; as of now I have a small number of partitions, on the order
of 20~40.
It would be great if there were some documentation on the memory requirement
w.r.t. the number of keys and the number of partitions per executor (i.e.,
Spark's internal memory requirement outside of user space).
Have you taken a look at SPARK-11293?
Consider using repartition to increase the number of partitions.
FYI
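For reference, a minimal sketch of the repartition route (assuming pairs is an
RDD of key/value tuples as in the earlier sketch; the target of 400 partitions
is only an example value):

  // repartition() triggers a full shuffle, but each subsequent task then
  // works on a smaller slice of the data.
  val repartitioned = pairs.repartition(400)
  val reduced = repartitioned.reduceByKey(_ + _)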
On Fri, May 13, 2016 at 12:14 PM, Sung Hwan Chung wrote:
> Hello,
>
> I'm using Spark version 1.6.0 and have trouble with memory when trying to
> do reduceByKey on a dataset with as many
It would be the "40%", although it's probably better to think of it as
shuffle vs. data cache, with the remainder going to tasks. As the comments for
the shuffle memory fraction setting (spark.shuffle.memoryFraction) clarify, it
takes memory at the expense of the storage/data cache fraction, which is the
fraction available for caching: 60% * 90% of the total by default.
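For what it's worth, these are the Spark 1.x ("legacy") memory knobs; a minimal
sketch of setting them via SparkConf, where the specific values are illustrative
rather than recommendations:

  import org.apache.spark.{SparkConf, SparkContext}

  val conf = new SparkConf()
    .setAppName("memory-fractions")
    // Fraction of the heap reserved for cached/persisted RDD blocks.
    .set("spark.storage.memoryFraction", "0.6")
    // Fraction of the heap reserved for shuffle aggregation buffers;
    // raising this leaves less room for the storage fraction above.
    .set("spark.shuffle.memoryFraction", "0.2")
  val sc = new SparkContext(conf)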
On Fri, Apr 17, 2015 at 11:30 AM, podioss wrote:
> Hi,
> I am a bit confused with the executor-memory option. I am running
> applications with the Standalone cluster manager, with 8 workers with 4GB memory
> and 2 cores
Thanks for the clarifications. I misunderstood what the number on UI meant.
On Mon, Dec 15, 2014 at 7:00 PM, Sean Owen wrote:
I believe this corresponds to the 0.6 of the whole heap that is
allocated for caching partitions. See spark.storage.memoryFraction on
http://spark.apache.org/docs/latest/configuration.html
0.6 of 4GB is about 2.3GB.
The note there is important: you probably don't want this to exceed the
size of the JVM's old generation.
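As a back-of-the-envelope check (a sketch only; the 4GB heap mirrors the workers
described earlier in this thread, and the 0.9 safety factor is the 90% mentioned
above):

  // What the 0.6 storage fraction corresponds to on a 4GB heap.
  val heapGb          = 4.0   // example executor heap
  val storageFraction = 0.6   // spark.storage.memoryFraction default
  val safetyFraction  = 0.9   // the 90% safety factor mentioned above
  println(f"reserved for cache: ${heapGb * storageFraction}%.2f GB")                           // 2.40
  println(f"usable after safety factor: ${heapGb * storageFraction * safetyFraction}%.2f GB")  // 2.16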
Hi Pala,
Spark executors only reserve spark.storage.memoryFraction (default 0.6) of
their spark.executor.memory for caching RDDs. The Spark UI displays this
fraction.
spark.executor.memory controls the executor heap size.
spark.yarn.executor.memoryOverhead controls the extra that's tacked on top of
the heap for the YARN container request.
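For reference, a minimal sketch of how these knobs are typically set together on
YARN (the sizes are placeholder examples, not recommendations):

  import org.apache.spark.SparkConf

  val conf = new SparkConf()
    .setAppName("yarn-memory-settings")
    .set("spark.executor.memory", "4g")                // executor JVM heap
    .set("spark.yarn.executor.memoryOverhead", "512")  // extra MB added to the YARN container request
    .set("spark.storage.memoryFraction", "0.6")        // share of the heap reserved for RDD caching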
Date: Tuesday, August 19, 2014 at 9:23 AM
To: Capital One <benjamin.la...@capitalone.com>
Cc: user@spark.apache.org
Subject: Re: Executor Memory, Task hangs
Given a fixed amount of memory allocated to your workers, more memory per
executor means fewer executors can execute in parallel. This means it takes
longer to finish all of the tasks. Set it high enough, and your executors can
find no worker with enough memory, and so they are all stuck waiting for
resources.
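A quick sketch of the arithmetic behind that point (the worker and executor
sizes below are example values, not taken from this thread):

  // With fixed-size workers, each worker can host at most
  // floor(workerMemory / executorMemory) executors of a given size.
  val workerMemoryGb   = 8
  val executorMemoryGb = 3
  val executorsPerWorker = workerMemoryGb / executorMemoryGb   // 2
  println(s"Each worker can place $executorsPerWorker executor(s) of this size")

Push executorMemoryGb past workerMemoryGb and the quotient drops to 0: no worker
can place the executor, which is the "stuck waiting" situation described above.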
Looks like only 1 worker is doing the job. Can you repartition the RDD? Also,
what is the number of cores that you allocated? Things like this you can
easily identify by looking at the worker's web UI (default: worker:8081).
Thanks
Best Regards
On Tue, Aug 19, 2014 at 6:35 PM, Laird, Benjamin <benjamin.