Thanks Nilesh,
Thanks for sharing those docs. I have come across most of that tuning in
the past and, believe me, I have tuned the heck out of this job. What I
can't believe is that Spark needs 4x more resources than MapReduce to run
the same job (for datasets of magnitude >100GB).
I was able to run my job
Hi Nirav,
I recently attended Spark Summit East 2016, and almost
every talk about errors faced by the community and/or tuning topics for
Spark mentioned this as the main problem (executor lost and JVM out of
memory).
Check out these blogs that explain how to tune
Thanks Nilesh. I don't think there's heavy communication between the driver
and the executors. However, I'll try the settings you suggested.
I cannot replace groupByKey with reduceByKey because the operation is not
associative.
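For anyone following the thread, a minimal sketch of the distinction (toy
data and hypothetical names, PySpark assumed):

from pyspark import SparkContext
import statistics

sc = SparkContext("local", "groupByDemo")  # hypothetical local context for illustration
pairs = sc.parallelize([("a", 1), ("a", 4), ("a", 2), ("b", 3)])

# reduceByKey needs an associative function: Spark pre-combines values on
# each partition before the shuffle, so far less data crosses the network.
sums = pairs.reduceByKey(lambda x, y: x + y)

# A non-associative, whole-group computation (e.g. a median) must see all
# values of a key at once, which is what groupByKey provides.
medians = pairs.groupByKey().mapValues(lambda vs: statistics.median(vs))

print(sums.collect())     # e.g. [('a', 7), ('b', 3)]
print(medians.collect())  # e.g. [('a', 2), ('b', 3)]

If the per-key computation can be reformulated as an incremental
aggregation, aggregateByKey or combineByKey may avoid materializing the
full group, but a truly non-associative operation does need groupByKey.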
It is very frustrating, to be honest. It was a piece of cake with MapReduce
compared to this.
Kindly help me understand the conf.
Thanks in advance.
Regards
Arun.
From: Kuchekar [kuchekar.nil...@gmail.com]
Sent: 11 February 2016 09:42
To: Nirav Patel
Cc: spark users
Subject: Re: Spark executor Memory profiling
Hi Nirav,
I faced a similar issue with Yarn, EMR 1.5.2, and the following
Spark conf helped me. You can set the values accordingly:
from pyspark import SparkConf

conf = (SparkConf().set("spark.master", "yarn-client").setAppName("HalfWay")
        .set("spark.driver.memory", "15G").set("spark.yarn.am.memory", "15G"))
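
Also, since the errors you are seeing are executor-side (executor lost, JVM
out of memory), the executor settings may matter more than the driver's. A
rough sketch of the knobs I would look at next; the values here are
illustrative, not tuned for your job:

conf = (conf
        .set("spark.executor.memory", "8G")                 # heap per executor JVM
        .set("spark.yarn.executor.memoryOverhead", "2048")  # off-heap headroom for YARN, in MB
        .set("spark.executor.cores", "4"))                  # concurrent tasks sharing that heap

If YARN is killing the containers (rather than the JVM throwing
OutOfMemoryError), raising spark.yarn.executor.memoryOverhead is usually
the relevant fix.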