You can bump up the number of partitions by passing the numPartitions argument to the join operator. However, you also have a data skew problem, which you need to resolve with a reasonable partitioning function (see the sketches below the quoted message).

On 7 Jul 2015 08:57, "Mohammed Omer" <[email protected]> wrote:
> Afternoon all,
>
> Really loving this project and the community behind it. Thank you all
> for your hard work.
>
> This past week, though, I've been having a hard time getting my first
> deployed job to run without failing at the same point every time: right
> after a leftOuterJoin, most partitions (600 total) are small (1-100MB),
> while some others are large (3-6GB). The large ones consistently spill
> 20-60GB into memory, and eventually fail.
>
> If I could only get the partitions to be smaller, right out of the
> leftOuterJoin, it seems like the job would run fine.
>
> I've tried trawling through the logs, but it hasn't been very fruitful
> in finding out what, specifically, is the issue.
>
> Cluster setup:
>
> * 6 worker nodes (16 cores, 104GB memory, 500GB storage)
> * 1 master (same config as above)
>
> Running Spark on YARN, with:
>
> storage.memoryFraction = .3
> --executors = 6
> --executor-cores = 12
> --executor-memory = kind of confusing due to YARN, but basically the
> Spark monitor site's Executors page shows each executor running with
> 18.8GB memory, though I know usage is much larger due to YARN managing
> various pieces. (Total memory available to YARN shows 480GB, with 270GB
> currently used.)
>
> Screenshot of the task page: http://i.imgur.com/xG3KdEl.png
>
> Code:
> https://gist.github.com/momer/8bc03c60a639e5c04eda#file-spark-scala-L60
> (see line 60 for the relevant area)
>
> Any pointers in the right direction, or advice on articles to read, or
> even debugging / settings advice or recommendations would be extremely
> helpful. I'll put a bounty on this of a $50 donation to the ASF! :D
>
> Thank you all for reading (and hopefully replying!),
>
> Mo Omer
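
A minimal sketch of the first suggestion, in the spirit of the join at line 60 of the gist (the RDD names and the count of 2000 are illustrative, not taken from the thread): leftOuterJoin accepts either a target partition count or an explicit Partitioner, which determines how many partitions come out of the shuffle.

import org.apache.spark.{HashPartitioner, SparkConf, SparkContext}

object MorePartitions {
  def main(args: Array[String]): Unit = {
    // Local master only so the sketch runs standalone; on the cluster
    // the master comes from spark-submit.
    val sc = new SparkContext(
      new SparkConf().setAppName("more-partitions").setMaster("local[*]"))

    // Illustrative stand-ins for the two pair RDDs joined in the gist.
    val users  = sc.parallelize(Seq(("a", 1), ("b", 2)))
    val events = sc.parallelize(Seq(("a", "x")))

    // Passing a partition count sets the number of shuffle output
    // partitions for the join directly...
    val byCount = users.leftOuterJoin(events, 2000)

    // ...or an explicit Partitioner can be supplied instead.
    val byPartitioner = users.leftOuterJoin(events, new HashPartitioner(2000))

    println(byCount.partitions.length)        // 2000
    println(byPartitioner.partitions.length)  // 2000
    sc.stop()
  }
}

Note that a higher partition count alone will not cure the skew: every record sharing one hot key still hashes to the same partition, which is why the skew itself needs separate treatment.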

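One common way to treat the skew itself is key salting. This is a generic technique rather than anything from the gist, and every name below is made up: each left-side record gets a random sub-key, the right side is replicated once per sub-key so matches still line up, and the salt is dropped after the join. No single partition then has to hold an entire hot key.

import org.apache.spark.{SparkConf, SparkContext}
import scala.util.Random

object SaltedJoin {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("salted-join").setMaster("local[*]"))

    // Toy data: "hot" stands in for a key that dominates a partition.
    val left  = sc.parallelize(Seq(("hot", 1), ("hot", 2), ("cold", 3)))
    val right = sc.parallelize(Seq(("hot", "x"), ("cold", "y")))

    val salt = 32  // number of sub-keys each key is spread across

    // Scatter each left record across the sub-keys at random...
    val saltedLeft = left.map { case (k, v) => ((k, Random.nextInt(salt)), v) }

    // ...and replicate each right record once per sub-key so the join
    // still finds every match.
    val saltedRight = right.flatMap { case (k, w) =>
      (0 until salt).map(i => ((k, i), w))
    }

    // Join on the salted key, then strip the salt back off.
    val joined = saltedLeft.leftOuterJoin(saltedRight)
      .map { case ((k, _), pair) => (k, pair) }

    joined.collect().foreach(println)
    sc.stop()
  }
}

The trade-off is that the right side is duplicated once per sub-key (32 times here), so this pays off when the right RDD is much smaller than the skewed left one.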