Re: Quick question on spark performance

2016-05-20 Thread Yash Sharma
I am going with the default Java opts for EMR:
-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 -XX:MaxHeapFreeRatio=70 -XX:+CMSClassUnloadingEnabled -XX:OnOutOfMemoryError='kill -9 %p'
The data is not partitioned. It's 6Tb data of a
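For reference, a minimal sketch of how a flag set like the one above is typically handed to Spark executors. The GC flags are copied from the message; the `spark-submit` / `spark.executor.extraJavaOptions` wiring and the application jar name are assumptions, not the poster's confirmed setup:

```shell
# Hypothetical sketch: passing the EMR-default GC options to Spark executors.
# The JVM flags are quoted from the thread; the wiring below is an assumption.
spark-submit \
  --conf "spark.executor.extraJavaOptions=-verbose:gc -XX:+PrintGCDetails \
-XX:+PrintGCDateStamps -XX:+UseConcMarkSweepGC \
-XX:CMSInitiatingOccupancyFraction=70 -XX:MaxHeapFreeRatio=70 \
-XX:+CMSClassUnloadingEnabled -XX:OnOutOfMemoryError='kill -9 %p'" \
  my_app.jar   # placeholder application jar
```

On EMR these defaults usually arrive via spark-defaults.conf rather than the command line, so this is only one way to express them explicitly.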

Re: Quick question on spark performance

2016-05-20 Thread Yash Sharma
The median GC time is 1.3 mins for a median duration of 41 mins. What parameters can I tune for controlling GC? Other details: median peak execution memory of 13 G, and 2.3 gigs of input records. 180-200 executors launched. - Thanks, via mobile, excuse brevity. On May 21, 2016 10:59 AM, "Reynold
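The medians quoted above imply a fairly small GC fraction; a quick sanity check of the ratio (a sketch using only the two numbers in the message):

```shell
# GC fraction implied by the medians above: 1.3 min of GC in a 41 min task.
awk 'BEGIN { printf "GC overhead: %.1f%%\n", 100 * 1.3 / 41 }'
# prints: GC overhead: 3.2%
```

A few percent of GC is usually considered healthy, so the long task durations may be dominated by something other than collection pauses.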

Re: Quick question on spark performance

2016-05-20 Thread Ted Yu
Yash: Can you share the JVM parameters you used? How many partitions are there in your data set? Thanks

On Fri, May 20, 2016 at 5:59 PM, Reynold Xin wrote:
> It's probably due to GC.
>
> On Fri, May 20, 2016 at 5:54 PM, Yash Sharma wrote:
> >> Hi All,
> >> I am here to get some expert advice

Re: Quick question on spark performance

2016-05-20 Thread Reynold Xin
It's probably due to GC.

On Fri, May 20, 2016 at 5:54 PM, Yash Sharma wrote:
> Hi All,
> I am here to get some expert advice on a use case I am working on.
>
> Cluster & job details below -
>
> Data - 6 Tb
> Cluster - EMR - 15 Nodes C3-8xLarge (shared by other MR apps)
>
> Parameters-
> --execut

Quick question on spark performance

2016-05-20 Thread Yash Sharma
Hi All,
I am here to get some expert advice on a use case I am working on.

Cluster & job details below -

Data - 6 Tb
Cluster - EMR - 15 Nodes C3-8xLarge (shared by other MR apps)

Parameters-
--executor-memory 10G \
--executor-cores 6 \
--conf spark.dynamicAllocation.enabled=true \
--conf spark.d
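A sketch of how the parameters quoted in this message could sit in a full spark-submit invocation. Only the flags actually quoted are included; the archive preview cuts off at "spark.d", so the remaining dynamicAllocation settings are left out, and the master setting and application jar are hypothetical placeholders:

```shell
# Sketch only: the parameters quoted in the thread, wired into spark-submit.
# --master yarn and APP_JAR are placeholders, not from the original message.
APP_JAR=my_app.jar
spark-submit \
  --master yarn \
  --executor-memory 10G \
  --executor-cores 6 \
  --conf spark.dynamicAllocation.enabled=true \
  "$APP_JAR"
```

With dynamic allocation enabled, the executor count (180-200 in the later message) is chosen by the resource manager rather than fixed up front.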