Hi,

I'm wondering how Spark assigns the "index" of a task.
I'm asking because we have a job that consistently fails at
task index = 421.

When we increase the number of partitions, the job then fails at index = 4421.
Increase it a little more, and it fails at index = 24421.

Our job is as simple as "(1) read json -> (2) group-by session identifier ->
(3) write parquet files" and it always fails somewhere in step (3) with a
CommitDeniedException. We've identified that part of the trouble is due to
uneven data distribution right after step (2), and we are now trying to
deepen our understanding of how Spark behaves.

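To make it concrete, here is a minimal sketch of what the job looks like,
assuming the Spark 1.5.2 DataFrame API; the column name "sessionId", the
paths, and the count() aggregation are placeholders, not our exact code:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

object SessionJob {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("SessionJob"))
    val sqlContext = new SQLContext(sc)

    // (1) read json
    val events = sqlContext.read.json("hdfs:///input/events/*.json")

    // (2) group by session identifier; a skewed session key can leave a few
    //     partitions much larger than the others
    val perSession = events.groupBy("sessionId").count()

    // (3) write parquet files -- the stage where the failing task
    //     (index 421 / 4421 / 24421) hits CommitDeniedException
    perSession.write.parquet("hdfs:///output/sessions.parquet")
  }
}
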
We're using Spark 1.5.2, Scala 2.11, on top of Hadoop 2.6.0.

-- 

*Adrien Mogenet*
Head of Backend/Infrastructure
adrien.moge...@contentsquare.com
http://www.contentsquare.com
50, avenue Montaigne - 75008 Paris
