I am going with the default Java opts for EMR -
-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps
-XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70
-XX:MaxHeapFreeRatio=70 -XX:+CMSClassUnloadingEnabled
-XX:OnOutOfMemoryError='kill -9 %p'
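For context, on EMR these executor JVM flags are usually supplied through `spark.executor.extraJavaOptions`; a minimal sketch of how that might look (the class name and jar are hypothetical placeholders, and the `OnOutOfMemoryError` handler is left out here because its embedded quotes need careful shell escaping):

```shell
# Illustrative only: passing executor GC flags at submit time.
spark-submit \
  --class com.example.MyJob \
  --conf "spark.executor.extraJavaOptions=-verbose:gc -XX:+PrintGCDetails \
-XX:+PrintGCDateStamps -XX:+UseConcMarkSweepGC \
-XX:CMSInitiatingOccupancyFraction=70 -XX:MaxHeapFreeRatio=70 \
-XX:+CMSClassUnloadingEnabled" \
  my-job.jar
```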
The data is not partitioned. It's 6 TB of data.
The median GC time is 1.3 minutes for a median duration of 41 minutes. What
parameters can I tune to control GC?
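For what it's worth, the medians reported above imply a GC overhead of roughly 3%; a quick check of that arithmetic:

```python
# GC overhead implied by the reported medians:
# 1.3 minutes of GC out of a 41-minute median duration.
gc_minutes = 1.3
total_minutes = 41
gc_fraction = gc_minutes / total_minutes
print(f"GC overhead: {gc_fraction:.1%}")  # roughly 3.2%
```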
Other details: median peak execution memory of 13 GB and input records of
2.3 GB.
180-200 executors launched.
- Thanks, via mobile, excuse brevity.
On May 21, 2016 10:59 AM, "Reynold Xin" wrote:
Yash:
Can you share the JVM parameters you used?
How many partitions are there in your data set?
Thanks
On Fri, May 20, 2016 at 5:59 PM, Reynold Xin wrote:
> It's probably due to GC.
>
> On Fri, May 20, 2016 at 5:54 PM, Yash Sharma wrote:
>
>> Hi All,
>> I am here to get some expert advice
It's probably due to GC.
On Fri, May 20, 2016 at 5:54 PM, Yash Sharma wrote:
> Hi All,
> I am here to get some expert advice on a use case I am working on.
>
> Cluster & job details below -
>
> Data - 6 TB
> Cluster - EMR - 15 Nodes C3-8xLarge (shared by other MR apps)
>
> Parameters-
> --execut
Hi All,
I am here to get some expert advice on a use case I am working on.
Cluster & job details below -
Data - 6 TB
Cluster - EMR - 15 Nodes C3-8xLarge (shared by other MR apps)
Parameters-
--executor-memory 10G \
--executor-cores 6 \
--conf spark.dynamicAllocation.enabled=true \
--conf spark.d
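The last conf line is cut off in the archive, so the full set of dynamic-allocation settings used here is unknown. As a general sketch (not from this thread): dynamic allocation on YARN also requires the external shuffle service, and executor bounds are commonly set alongside it. Values below are illustrative placeholders, not the thread's actual configuration:

```shell
# Illustrative dynamic-allocation setup; bounds are placeholder values.
spark-submit \
  --executor-memory 10G \
  --executor-cores 6 \
  --conf spark.dynamicAllocation.enabled=true \
  --conf spark.shuffle.service.enabled=true \
  --conf spark.dynamicAllocation.minExecutors=10 \
  --conf spark.dynamicAllocation.maxExecutors=200 \
  my-job.jar
```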