The median GC time is 1.3 mins for a median duration of 41 mins. What parameters can I tune to control GC?
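For context, here is the kind of change I was planning to try first. Going by the generic advice in the Spark tuning guide, the G1 and GC-logging flags below are common starting points, not anything verified on this job:

--conf "spark.executor.extraJavaOptions=-XX:+UseG1GC -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps" \

The logging flags are just to see whether the time goes into young or full collections before changing anything else.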
Other details: median peak execution memory of 13 G, input records of 2.3 gigs, 180-200 executors launched.

- Thanks, via mobile, excuse brevity.

On May 21, 2016 10:59 AM, "Reynold Xin" <r...@databricks.com> wrote:

> It's probably due to GC.
>
> On Fri, May 20, 2016 at 5:54 PM, Yash Sharma <yash...@gmail.com> wrote:
>
>> Hi All,
>> I am here to get some expert advice on a use case I am working on.
>>
>> Cluster & job details below -
>>
>> Data - 6 TB
>> Cluster - EMR - 15 nodes, c3.8xlarge (shared by other MR apps)
>>
>> Parameters -
>> --executor-memory 10G \
>> --executor-cores 6 \
>> --conf spark.dynamicAllocation.enabled=true \
>> --conf spark.dynamicAllocation.initialExecutors=15 \
>>
>> Runtime: 3 hrs
>>
>> While monitoring the metrics I noticed that 10G per executor is not required (since I don't have a lot of groupings).
>>
>> Reducing to --executor-memory 3G brought the runtime down to 2 hrs.
>>
>> Question:
>> Adding more nodes now has absolutely no effect on the runtime. Is there anything I can tune/change/experiment with to make the job faster?
>>
>> Workload: mostly reduceBys and scans.
>>
>> Would appreciate any insights and thoughts. Best regards
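On the scaling question quoted above: if the reduce-side stages run with fewer tasks than the cluster has cores, adding nodes cannot help, because parallelism is capped by the partition count rather than by hardware. A common first experiment for an RDD job like this one is to raise the default shuffle parallelism; the number below is illustrative (roughly 2-3x the total core count), not something measured on this job:

--conf spark.default.parallelism=720 \

The same effect can be had per operation by passing an explicit partition count as the second argument to reduceByKey.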