This sounds like you need to increase the YARN memory overhead via the
"spark.yarn.executor.memoryOverhead" parameter. See
http://spark.apache.org/docs/latest/running-on-yarn.html for more
information on this setting.
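For example, here is a minimal sketch of where the setting goes, assuming
Spark 1.5 in yarn-client mode; the app name and the 2048 MB figure are only
illustrative placeholders, not values taken from your job:

    // Set the overhead before creating the SparkContext / HiveContext.
    // The value is in megabytes; 2048 here is just an illustrative figure.
    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.hive.HiveContext

    val conf = new SparkConf()
      .setAppName("GroupByJob") // hypothetical name
      .set("spark.yarn.executor.memoryOverhead", "2048")
    val sc = new SparkContext(conf)
    val hiveContext = new HiveContext(sc)

    // Or equivalently at submission time:
    //   spark-submit --conf spark.yarn.executor.memoryOverhead=2048 ...

The overhead is added on top of spark.executor.memory when sizing each YARN
container, so when YARN kills executors for exceeding their physical memory
limit, raising it is usually the first thing to try.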

If that does not work for you, please provide the error messages and the
command line you are using to submit your jobs for further troubleshooting.


Alex Rovner
Director, Data Engineering
o: 646.759.0052
http://www.magnetic.com/

On Sat, Oct 3, 2015 at 6:19 AM, unk1102 <umesh.ka...@gmail.com> wrote:

> Hi, I have a couple of Spark jobs that use a group by query fired from
> hiveContext.sql(). I know group by is expensive, but in my use case I
> cannot avoid it; I need to group by around 7-8 fields.
> I am also using df1.except(df2), which seems to be a heavy operation and
> does a lot of shuffling. Please see my UI snapshot:
> <
> http://apache-spark-user-list.1001560.n3.nabble.com/file/n24914/IMG_20151003_151830218.jpg
> >
>
> I have tried almost every optimisation, including Spark 1.5, but nothing
> seems to work: my job hangs and then fails because an executor reaches its
> physical memory limit and YARN kills it. I have around 1TB of data to
> process and it is skewed. Please guide.
>
