Hi,

I'm wondering how Spark assigns the "index" of a task. I'm asking because we have a job that consistently fails at task index = 421. When we increase the number of partitions, it then fails at index = 4421; increase it a little more, and it's 24421.

Our job is as simple as "(1) read json -> (2) group by session identifier -> (3) write parquet files" and it always fails somewhere in step (3) with a CommitDeniedException. A rough sketch of the pipeline is below.
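For reference, the job is roughly equivalent to the following simplified sketch (the paths, the "session_id" column name, and the count() aggregation are placeholders, not our actual code; the real job does a heavier per-session aggregation, but the read/group-by/write structure is the same):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

object SessionJob {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("session-job"))
    val sqlContext = new SQLContext(sc)

    // (1) read json -- input path is a placeholder
    val events = sqlContext.read.json("hdfs:///data/events/*.json")

    // (2) group by the session identifier -- "session_id" is a placeholder
    // column name, and count() stands in for our real per-session aggregation
    val sessions = events.groupBy("session_id").count()

    // (3) write parquet files -- this is the stage where a task fails with
    // CommitDeniedException, always at the same index (421 / 4421 / 24421
    // depending on the number of partitions)
    sessions.write.parquet("hdfs:///data/sessions.parquet")

    sc.stop()
  }
}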
We've identified that part of the trouble is basically due to uneven data partitioning right after step (2), and we are now trying to deepen our understanding of how Spark behaves here.

We're using Spark 1.5.2 and Scala 2.11, on top of Hadoop 2.6.0.

--
*Adrien Mogenet*
Head of Backend/Infrastructure
adrien.moge...@contentsquare.com
http://www.contentsquare.com
50, avenue Montaigne - 75008 Paris