The median GC time is 1.3 mins for a median duration of 41 mins. What parameters can I tune to control GC?
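For context, here is the kind of change I was planning to try first. Going by the generic advice in the Spark tuning guide, the G1 and GC-logging flags below are common starting points, not anything verified on this job:

--conf "spark.executor.extraJavaOptions=-XX:+UseG1GC -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps" \

The logging flags are just to see whether the time goes into young or full collections before changing anything else.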
Other details: median peak execution memory of 13 G, input records of 2.3 gigs, 180-200 executors launched.

- Thanks, via mobile, excuse brevity.

On May 21, 2016 10:59 AM, "Reynold Xin" <r...@databricks.com> wrote:

> It's probably due to GC.
>
> On Fri, May 20, 2016 at 5:54 PM, Yash Sharma <yash...@gmail.com> wrote:
>
>> Hi All,
>> I am here to get some expert advice on a use case I am working on.
>>
>> Cluster & job details below -
>>
>> Data - 6 TB
>> Cluster - EMR - 15 nodes, c3.8xlarge (shared by other MR apps)
>>
>> Parameters -
>> --executor-memory 10G \
>> --executor-cores 6 \
>> --conf spark.dynamicAllocation.enabled=true \
>> --conf spark.dynamicAllocation.initialExecutors=15 \
>>
>> Runtime: 3 hrs
>>
>> While monitoring the metrics I noticed that 10G per executor is not required (since I don't have a lot of groupings).
>>
>> Reducing to --executor-memory 3G brought the runtime down to 2 hrs.
>>
>> Question:
>> Adding more nodes now has absolutely no effect on the runtime. Is there anything I can tune/change/experiment with to make the job faster?
>>
>> Workload: mostly reduceBys and scans.
>>
>> Would appreciate any insights and thoughts. Best regards
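On the scaling question quoted above: if the reduce-side stages run with fewer tasks than the cluster has cores, adding nodes cannot help, because parallelism is capped by the partition count rather than by hardware. A common first experiment for an RDD job like this one is to raise the default shuffle parallelism; the number below is illustrative (roughly 2-3x the total core count), not something measured on this job:

--conf spark.default.parallelism=720 \

The same effect can be had per operation by passing an explicit partition count as the second argument to reduceByKey.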