Sure folks, will try later today!
Best Regards
Ankit Khettry
On Sat, 7 Sep, 2019, 6:56 PM Sunil Kalra, wrote:
Ankit
Can you try reducing the number of cores or increasing memory? With the
configuration below, each core is getting ~3.5 GB. Otherwise your data is
skewed, and one of the cores is getting too much data for a particular key.
spark.executor.cores 6
spark.executor.memory 36g
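For example, something like this when building the session (or the equivalent
--conf flags). The numbers are only to illustrate the idea; fewer cores per
executor just means fewer concurrent tasks sharing the same executor memory:

  import org.apache.spark.sql.SparkSession

  // Fewer cores per executor => each task gets a larger share of the 36g.
  val spark = SparkSession.builder()
    .config("spark.executor.cores", "4")
    .config("spark.executor.memory", "36g")
    .getOrCreate()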
It says you have 3811 tasks in earlier stages, and you're going down to 2001
partitions; that would make it more memory intensive. I'm guessing the default
Spark shuffle partition count of 200 was in effect, so that would have failed.
Go for a higher number, maybe even higher than 3811. What was your shuffle
write from s
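This is the kind of thing I mean (4000 is only illustrative, pick something
above your upstream task count; spark here is your existing SparkSession):

  // Applies to the shuffles introduced by subsequent groupBy/window stages.
  spark.conf.set("spark.sql.shuffle.partitions", "4000")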
You can try that. Also consider processing each partition separately if your
data is heavily skewed when you partition it.
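A rough sketch of what I mean (the DataFrame df, the column names, and the hot
value are made up, adapt them to your data): split the heavily skewed key out,
aggregate it on its own, and union the results, so a single shuffle partition
isn't overloaded.

  import org.apache.spark.sql.functions._

  // Process the hot key separately from the rest, then combine.
  val hot     = df.filter(col("user_id") === "HOT_VALUE")
  val rest    = df.filter(col("user_id") =!= "HOT_VALUE")
  val hotAgg  = hot.groupBy("user_id").agg(sum("amount").as("total"))
  val restAgg = rest.groupBy("user_id").agg(sum("amount").as("total"))
  val result  = hotAgg.unionByName(restAgg)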
On Sat, 7 Sep 2019, 7:19 pm Ankit Khettry, wrote:
Thanks Chris
I'm going to try it soon, maybe by setting spark.sql.shuffle.partitions to
2001. Also, I was wondering if it would help to repartition the data by the
fields I am using in the group by and window operations?
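Something like this is what I have in mind (the DataFrame df and the column
names are placeholders for my actual fields):

  import org.apache.spark.sql.expressions.Window
  import org.apache.spark.sql.functions._

  // Repartition by the same key used in the window / group by, so the data
  // is laid out by that key before the wide operations run.
  val repartitioned = df.repartition(2001, col("user_id"))
  val w = Window.partitionBy("user_id").orderBy("event_ts")
  val ranked = repartitioned.withColumn("rn", row_number().over(w))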
Best Regards
Ankit Khettry
On Sat, 7 Sep, 2019, 1:05 PM Chris Teoh, wrote:
Hi Ankit,
Without looking at the Spark UI and the stages/DAG, I'm guessing you're
running on the default number of Spark shuffle partitions.
If you're seeing a lot of shuffle spill, you likely have to increase the
number of shuffle partitions to accommodate the huge shuffle size.
I hope that helps.
Chris