We are in the process of upgrading to Spark 1.6 from 1.4, and had a hard
time getting some of our more memory/join-intensive jobs to work (RDD
caching + a lot of shuffling). Most of the time they were getting killed
by YARN.
Increasing the overhead was of course an option, but the increase to make
> that solved some problems
Is there any problem that was not solved by the tweak?
Thanks
On Thu, Mar 3, 2016 at 4:11 PM, Eugen Cepoi wrote:
> You can limit the amount of memory Spark will use for shuffle even in 1.6.
> You can do that by tweaking spark.memory.fraction and
> spark.memory.storageFraction.
You can limit the amount of memory Spark will use for shuffle even in 1.6.
You can do that by tweaking spark.memory.fraction and
spark.memory.storageFraction. For example, if you want to have no shuffle
cache at all, you can set spark.memory.storageFraction to 1 or something
close to it, to leave only a small slice for execution.
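Something along these lines, for example (an untested sketch; the values
are only meant to illustrate letting storage dominate the unified region):

import org.apache.spark.{SparkConf, SparkContext}

// Untested sketch for Spark 1.6 unified memory management.
// spark.memory.fraction: share of the heap given to the unified
//   (execution + storage) region.
// spark.memory.storageFraction: share of that region protected for cached
//   blocks; the remainder is what shuffle/execution can use without
//   evicting storage.
val conf = new SparkConf()
  .setAppName("shuffle-memory-tuning")        // master comes from spark-submit
  .set("spark.memory.fraction", "0.75")
  .set("spark.memory.storageFraction", "0.9") // leave only a small slice for shuffle

val sc = new SparkContext(conf)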
Spark's shuffle is very aggressive about keeping everything in RAM, and
the behavior is worse in 1.6 with the new unified memory management. At
least in previous versions you could limit the shuffle memory, but Spark
1.6 will use as much memory as it can get. What I see is that Spark seems
to under
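For what it's worth, 1.6 still ships a legacy mode that restores the old
per-region caps. A rough, untested sketch (the fractions are only
illustrative):

import org.apache.spark.{SparkConf, SparkContext}

// Untested sketch: opt out of the 1.6 unified memory manager and go back
// to the pre-1.6 static regions, where shuffle memory has an explicit cap.
val conf = new SparkConf()
  .setAppName("legacy-memory-mode")
  .set("spark.memory.useLegacyMode", "true")   // use the old static memory manager
  .set("spark.shuffle.memoryFraction", "0.2")  // cap for shuffle/aggregation buffers
  .set("spark.storage.memoryFraction", "0.6")  // cap for the RDD cache

val sc = new SparkContext(conf)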
Hello all,
I'm using Spark 1.6 and trying to cache a dataset which is 1.5 TB. I have
only ~800 GB of RAM in total, so I am choosing the DISK_ONLY storage
level. Unfortunately, I'm exceeding the overhead memory limit:
Container killed by YARN for exceeding memory limits. 27.0 GB of 27 GB
physical memory used.
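A minimal sketch of the pattern (the path, executor size and overhead
value are placeholders, not my real job):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

// Minimal sketch; the path and sizes are placeholders.
// spark.yarn.executor.memoryOverhead (in MB) is the headroom YARN enforces
// on top of the executor heap. DISK_ONLY keeps the cached blocks on disk,
// but shuffle and serialization buffers still live in memory.
val conf = new SparkConf()
  .setAppName("disk-only-cache")
  .set("spark.executor.memory", "24g")
  .set("spark.yarn.executor.memoryOverhead", "4096") // bump above the default

val sc = new SparkContext(conf)

val data = sc.textFile("hdfs:///path/to/dataset")  // placeholder path
data.persist(StorageLevel.DISK_ONLY)
data.count()  // materialize the cache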