Hi,

The partition information held by the Spark master is not updated after a stage
failure. In the case of HDFS, Spark gets partition information from the
InputFormat; if an HDFS data node goes down while Spark is computing a certain
stage, that stage will fail and be resubmitted using the same partition
information. So with failures like a data node loss, I believe the partition
index you get when you call mapPartitionsWithIndex is consistent.
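For instance (a minimal sketch, not from your job -- the RDD contents and the tagging logic are illustrative), the index argument is fixed by the RDD's partitioning, so a task re-executed for partition i is handed the same i it saw before the failure:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object PartitionIndexExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("partition-index").setMaster("local[2]"))

    // 4 partitions; the index passed to the closure comes from the RDD's
    // partitioning, so a retried task for partition i sees index i again.
    val data = sc.parallelize(1 to 8, numSlices = 4)

    val tagged = data.mapPartitionsWithIndex { (index, rows) =>
      rows.map(r => (index, r)) // tag each row with its partition index
    }

    tagged.collect().foreach(println)
    sc.stop()
  }
}
```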

Hope this helps!
Liquan

On Mon, Oct 6, 2014 at 12:33 PM, Sung Hwan Chung <coded...@cs.stanford.edu>
wrote:

> Is the RDD partition index you get when you call mapPartitionsWithIndex
> consistent under fault-tolerance condition?
>
> I.e.
>
> 1. Say index is 1 for one of the partitions when you call
> data.mapPartitionsWithIndex((index, rows) => ....) // Say index is 1
> 2. The partition fails (maybe along with a bunch of other partitions).
> 3. When the partitions get restarted somewhere else, will they retain the
> same index value, as well as all the lineage arguments?
>
>
>


-- 
Liquan Pei
Department of Physics
University of Massachusetts Amherst
