Can you at least copy-paste the error(s) you are seeing when the job fails? Without the error messages, it's hard to suggest anything.
*Alex Rovner*
*Director, Data Engineering*
*o:* 646.759.0052
* <http://www.magnetic.com/>*

On Sat, Oct 3, 2015 at 9:50 AM, Umesh Kacha <umesh.ka...@gmail.com> wrote:

> Hi, thanks. I can't share YARN logs because of privacy rules at my
> company, but I can tell you I have looked through them and found nothing
> except YARN killing the container because it exceeds its physical memory
> capacity.
>
> I am using the following command-line script. The job launches around
> 1500 ExecutorService threads from the driver with a thread pool of 15, so
> at any given time 15 jobs are running, as shown in the UI.
>
> ./spark-submit --class com.xyz.abc.MySparkJob \
>   --conf "spark.executor.extraJavaOptions=-XX:MaxPermSize=512M" \
>   --driver-java-options -XX:MaxPermSize=512m \
>   --driver-memory 4g --master yarn-client \
>   --executor-memory 27G --executor-cores 2 \
>   --num-executors 40 \
>   --jars /path/to/others-jars \
>   /path/to/spark-job.jar
>
> On Sat, Oct 3, 2015 at 7:11 PM, Alex Rovner <alex.rov...@magnetic.com> wrote:
>
>> Can you send over your YARN logs along with the command you are using
>> to submit your job?
>>
>> On Sat, Oct 3, 2015 at 9:07 AM, Umesh Kacha <umesh.ka...@gmail.com> wrote:
>>
>>> Hi Alex, thanks very much for the reply. Please read the following for
>>> more details about my problem:
>>>
>>> http://stackoverflow.com/questions/32317285/spark-executor-oom-issue-on-yarn
>>>
>>> Each of my containers has 8 cores and 30 GB max memory, so I am using
>>> yarn-client mode with 40 executors of 27 GB / 2 cores each. If I use
>>> more cores, my job starts losing more executors. I tried setting
>>> spark.yarn.executor.memoryOverhead to around 2 GB, and even 8 GB, but
>>> it does not help; I lose executors no matter what. The reason is that
>>> my jobs shuffle lots of data, as much as 20 GB per job, as I have seen
>>> in the UI. The shuffle happens because of a group by, and I can't avoid
>>> it in my case.
>>>
>>> On Sat, Oct 3, 2015 at 6:27 PM, Alex Rovner <alex.rov...@magnetic.com> wrote:
>>>
>>>> This sounds like you need to increase the YARN overhead with the
>>>> "spark.yarn.executor.memoryOverhead" parameter. See
>>>> http://spark.apache.org/docs/latest/running-on-yarn.html for more
>>>> information on the setting.
>>>>
>>>> If that does not work for you, please provide the error messages and
>>>> the command line you are using to submit your jobs for further
>>>> troubleshooting.
>>>>
>>>> On Sat, Oct 3, 2015 at 6:19 AM, unk1102 <umesh.ka...@gmail.com> wrote:
>>>>
>>>>> Hi, I have a couple of Spark jobs that use a group by query fired
>>>>> from hiveContext.sql(). I know group by is evil, but in my use case I
>>>>> can't avoid it: there are around 7-8 fields on which I need to group.
>>>>> I am also using df1.except(df2), which also seems to be a heavy
>>>>> operation that does lots of shuffling; please see my UI snapshot:
>>>>>
>>>>> http://apache-spark-user-list.1001560.n3.nabble.com/file/n24914/IMG_20151003_151830218.jpg
>>>>>
>>>>> I have tried almost all optimizations, including Spark 1.5, but
>>>>> nothing seems to work: my job fails or hangs because an executor
>>>>> reaches its physical memory limit and YARN kills it. I have around
>>>>> 1 TB of data to process, and it is skewed. Please guide.
>>>>>
>>>>> --
>>>>> View this message in context:
>>>>> http://apache-spark-user-list.1001560.n3.nabble.com/How-to-optimize-group-by-query-fired-using-hiveContext-sql-tp24914.html
>>>>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
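
[Editor's note] To make the memoryOverhead discussion above concrete: YARN charges each executor container for its JVM heap plus the overhead, and kills (or refuses to schedule) anything over the container cap. The sketch below redoes the arithmetic for the numbers quoted in this thread; the 10%-with-384-MB-floor default is Spark 1.x's documented formula for spark.yarn.executor.memoryOverhead, and the helper name is ours, not Spark's.

```python
def container_footprint_mb(executor_memory_mb, overhead_mb=None):
    """Total memory YARN charges to one executor container.

    With no explicit setting, Spark 1.x defaults the overhead to
    max(384 MB, 10% of executor memory).
    """
    if overhead_mb is None:
        overhead_mb = max(384, executor_memory_mb // 10)
    return executor_memory_mb + overhead_mb

executor_mb = 27 * 1024       # --executor-memory 27G, as in the thread
container_cap_mb = 30 * 1024  # the 30 GB container max from the thread

print(container_footprint_mb(executor_mb))        # 30412 with the default ~2.7 GB overhead
print(container_footprint_mb(executor_mb, 8192))  # 35840 with the 8 GB overhead that was tried
```

Note what this implies: with the default overhead the request (30412 MB) only just fits under the 30 GB (30720 MB) cap, and raising the overhead to 8 GB pushes it well past the cap, so an 8 GB overhead cannot actually coexist with `--executor-memory 27G` in these containers; the executor memory would have to drop to leave that headroom.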
>>>>>
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>>>>> For additional commands, e-mail: user-h...@spark.apache.org
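
[Editor's note] On the skew point in the original question: when one group-by key carries a disproportionate share of the 1 TB, the shuffle delivers most of the data to a single reducer, and that executor is the one that blows past its physical memory limit. A common mitigation (not suggested by anyone in this thread) is to "salt" the key so the first aggregation spreads across many partitions, then merge the partial results. The sketch below demonstrates the two-phase idea in plain Python with made-up names; in Spark it corresponds to grouping on concat(key, '_', salt) first, then stripping the salt and grouping again.

```python
import random
from collections import defaultdict

def salted_group_count(records, num_salts=8):
    """Two-phase count per key: pre-aggregate on (key, salt), then merge.

    Each phase-1 group holds roughly 1/num_salts of a hot key's rows,
    so no single aggregation has to hold the whole hot key at once.
    """
    # Phase 1: partial counts keyed by (key, random salt).
    partial = defaultdict(int)
    for key in records:
        partial[(key, random.randrange(num_salts))] += 1
    # Phase 2: strip the salt and merge the partial counts.
    totals = defaultdict(int)
    for (key, _salt), count in partial.items():
        totals[key] += count
    return dict(totals)

# A skewed dataset: one hot key dominates, as in the 1 TB job described above.
data = ["hot"] * 1000 + ["cold"] * 10
print(salted_group_count(data))  # counts 1000 for 'hot', 10 for 'cold'
```

This only works directly for aggregations that decompose into partial results (counts, sums, maxes); it does not help `df1.except(df2)`, which still needs a full shuffle on the whole row.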