Can you at least copy and paste the error(s) you are seeing when the job fails?
Without the error message(s), it's hard to even suggest anything.

*Alex Rovner*
*Director, Data Engineering*
*o:* 646.759.0052

<http://www.magnetic.com/>

On Sat, Oct 3, 2015 at 9:50 AM, Umesh Kacha <umesh.ka...@gmail.com> wrote:

> Hi, thanks. I can't share the YARN logs because of privacy policies at my
> company, but I can tell you that I have looked through them and found
> nothing except YARN killing the container because it exceeds the physical
> memory capacity.
>
> I am using the following command-line script. The job launches around
> 1,500 ExecutorService tasks from the driver with a thread pool of 15, so 15
> jobs run at a time, as shown in the UI.
>
> ./spark-submit --class com.xyz.abc.MySparkJob \
>   --conf "spark.executor.extraJavaOptions=-XX:MaxPermSize=512M" \
>   --driver-java-options -XX:MaxPermSize=512m \
>   --driver-memory 4g --master yarn-client \
>   --executor-memory 27G --executor-cores 2 \
>   --num-executors 40 \
>   --jars /path/to/others-jars \
>   /path/to/spark-job.jar
>
>
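> For illustration only, here is a minimal sketch of the driver-side pattern
> described above; the class name, queries, and counts are placeholders for
> the real code, not what the job actually runs:
>
>   import java.util.concurrent.Executors
>   import scala.concurrent.{Await, ExecutionContext, Future}
>   import scala.concurrent.duration.Duration
>   import org.apache.spark.{SparkConf, SparkContext}
>   import org.apache.spark.sql.hive.HiveContext
>
>   object MySparkJob { // placeholder class name
>     def main(args: Array[String]): Unit = {
>       val sc = new SparkContext(new SparkConf().setAppName("MySparkJob"))
>       val hiveContext = new HiveContext(sc)
>
>       // Fixed pool of 15 threads, so at most 15 Spark jobs run concurrently.
>       implicit val ec =
>         ExecutionContext.fromExecutorService(Executors.newFixedThreadPool(15))
>
>       // Roughly 1500 group-by queries submitted through the pool (placeholders).
>       val futures = (1 to 1500).map { i =>
>         Future { hiveContext.sql(s"SELECT key, count(*) FROM src_$i GROUP BY key").count() }
>       }
>       futures.foreach(f => Await.result(f, Duration.Inf))
>       ec.shutdown()
>       sc.stop()
>     }
>   }
>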
> On Sat, Oct 3, 2015 at 7:11 PM, Alex Rovner <alex.rov...@magnetic.com>
> wrote:
>
>> Can you send over your yarn logs along with the command you are using to
>> submit your job?
>>
>> *Alex Rovner*
>> *Director, Data Engineering*
>> *o:* 646.759.0052
>>
>> <http://www.magnetic.com/>
>>
>> On Sat, Oct 3, 2015 at 9:07 AM, Umesh Kacha <umesh.ka...@gmail.com>
>> wrote:
>>
>>> Hi Alex, thanks much for the reply. Please read the following for more
>>> details about my problem.
>>>
>>>
>>> http://stackoverflow.com/questions/32317285/spark-executor-oom-issue-on-yarn
>>>
>>> Each of my containers has 8 cores and at most 30 GB of memory, so I am
>>> running in yarn-client mode with 40 executors at 27 GB / 2 cores each. If I
>>> use more cores, my job starts losing more executors. I tried setting
>>> spark.yarn.executor.memoryOverhead to around 2 GB and even 8 GB, but it
>>> does not help; I lose executors no matter what. The reason is that my jobs
>>> shuffle a lot of data, as much as 20 GB per job, as I have seen in the UI.
>>> The shuffle happens because of the group by, and I can't avoid it in my
>>> case.
>>>
>>>
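>>> For reference, one commonly used knob for a heavy group-by shuffle (not
>>> mentioned above) is spark.sql.shuffle.partitions; a minimal sketch follows,
>>> assuming the job's existing hiveContext, with an arbitrary example value
>>> and a placeholder query, not a recommendation for this cluster:
>>>
>>>   // More shuffle partitions means each group-by task handles a smaller
>>>   // slice of the skewed data. 2000 is only an illustrative value.
>>>   hiveContext.setConf("spark.sql.shuffle.partitions", "2000")
>>>   val grouped = hiveContext.sql(
>>>     "SELECT f1, f2, f3, count(*) FROM my_table GROUP BY f1, f2, f3")
>>>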
>>>
>>> On Sat, Oct 3, 2015 at 6:27 PM, Alex Rovner <alex.rov...@magnetic.com>
>>> wrote:
>>>
>>>> This sounds like you need to increase the YARN memory overhead via the
>>>> "spark.yarn.executor.memoryOverhead" parameter. See
>>>> http://spark.apache.org/docs/latest/running-on-yarn.html for more
>>>> information on the setting.
>>>>
>>>> If that does not work for you, please provide the error messages and
>>>> the command line you are using to submit your jobs for further
>>>> troubleshooting.
>>>>
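>>>> For example, the overhead can be set on the SparkConf before the
>>>> SparkContext is created, or equivalently passed to spark-submit as
>>>> --conf spark.yarn.executor.memoryOverhead=<MB>; the value and app name
>>>> below are only illustrative placeholders, not tuned numbers:
>>>>
>>>>   import org.apache.spark.{SparkConf, SparkContext}
>>>>
>>>>   // Illustrative value only; the overhead is specified in megabytes and
>>>>   // is reserved outside the executor JVM heap.
>>>>   val conf = new SparkConf()
>>>>     .setAppName("MySparkJob") // placeholder app name
>>>>     .set("spark.yarn.executor.memoryOverhead", "4096")
>>>>   val sc = new SparkContext(conf)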
>>>>
>>>> *Alex Rovner*
>>>> *Director, Data Engineering*
>>>> *o:* 646.759.0052
>>>>
>>>> <http://www.magnetic.com/>
>>>>
>>>> On Sat, Oct 3, 2015 at 6:19 AM, unk1102 <umesh.ka...@gmail.com> wrote:
>>>>
>>>>> Hi, I have a couple of Spark jobs which use a group by query fired from
>>>>> hiveContext.sql(). I know group by is evil, but in my use case I can't
>>>>> avoid it; I have around 7-8 fields on which I need to group by. I am
>>>>> also using df1.except(df2), which also seems to be a heavy operation and
>>>>> does lots of shuffling; please see my UI snapshot
>>>>> <
>>>>> http://apache-spark-user-list.1001560.n3.nabble.com/file/n24914/IMG_20151003_151830218.jpg
>>>>> >
>>>>>
>>>>> I have tried almost all optimisations, including Spark 1.5, but nothing
>>>>> seems to be working, and my job fails or hangs because an executor
>>>>> reaches the physical memory limit and YARN kills it. I have around 1 TB
>>>>> of data to process and it is skewed. Please guide.
>>>>>
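>>>>> To make the shape of the job concrete, here is a minimal sketch of what
>>>>> such a pipeline might look like; the table, column, and path names are
>>>>> placeholders, not the actual schema:
>>>>>
>>>>>   // Group by on several fields fired through hiveContext.sql(), then an
>>>>>   // except() against another DataFrame; both steps shuffle data.
>>>>>   val df1 = hiveContext.sql(
>>>>>     "SELECT c1, c2, c3, c4, c5, c6, c7, sum(amount) AS total " +
>>>>>     "FROM source_table GROUP BY c1, c2, c3, c4, c5, c6, c7")
>>>>>   val df2 = hiveContext.table("previous_snapshot") // placeholder table
>>>>>   val changed = df1.except(df2)
>>>>>   changed.write.parquet("/placeholder/output/path")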
>>>>>
>>>>>
>>>>>
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>>>>> For additional commands, e-mail: user-h...@spark.apache.org
>>>>>
>>>>>
>>>>
>>>
>>
>
