Thanks TD, I understand it better now.
[email protected]

From: Tathagata Das
Date: 2015-07-31 13:45
To: [email protected]
CC: yuzhihong; user
Subject: Re: Re: How RDD lineage works

https://github.com/apache/spark/blob/master/core/src/test/scala/org/apache/spark/FailureSuite.scala

This may help.

On Thu, Jul 30, 2015 at 10:42 PM, [email protected] <[email protected]> wrote:

The following is copied from the paper and is related to RDD lineage. Is there a unit test that covers this scenario (an RDD partition being lost and recovered)? Thanks.

If a partition of an RDD is lost, the RDD has enough information about how it was derived from other RDDs to recompute just that partition. Thus, lost data can be recovered, often quite quickly, without requiring costly replication.

[email protected]

From: [email protected]
Date: 2015-07-31 13:11
To: Tathagata Das; yuzhihong
CC: user
Subject: Re: Re: How RDD lineage works

Thanks TD and Zhihong for the guide. I will check it.

[email protected]

From: Tathagata Das
Date: 2015-07-31 12:27
To: Ted Yu
CC: [email protected]; user
Subject: Re: How RDD lineage works

You have to read the original Spark paper to understand how RDD lineage works.
https://www.cs.berkeley.edu/~matei/papers/2012/nsdi_spark.pdf

On Thu, Jul 30, 2015 at 9:25 PM, Ted Yu <[email protected]> wrote:

Please take a look at:
core/src/test/scala/org/apache/spark/CheckpointSuite.scala

Cheers

On Thu, Jul 30, 2015 at 7:39 PM, [email protected] <[email protected]> wrote:

Hi,

I don't have a good understanding of how RDD lineage works, so I would like to ask whether Spark provides a unit test in the code base that illustrates it. If there is one, what is the class name?

Thanks!

[email protected]
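
For readers of this thread, the recovery behaviour described in the paper excerpt above can be illustrated with a toy model. This is only a sketch in plain Python, not Spark code: `ToyRDD`, `lose_partition`, and `compute_count` are hypothetical names made up for illustration, and the model assumes a narrow (one-to-one) dependency between parent and child partitions. Each dataset remembers its parent and the function used to derive it, so a lost partition can be rebuilt on its own.

```python
class ToyRDD:
    """Toy stand-in for an RDD (hypothetical, not the Spark API).

    Each partition is either held in an in-memory cache or recomputed
    from the matching partition of the parent, i.e. from its lineage.
    """

    def __init__(self, num_partitions, compute, parent=None):
        self.num_partitions = num_partitions
        self._compute = compute      # (partition index, parent data) -> list
        self.parent = parent         # lineage: the dataset this one derives from
        self._cache = {}             # partition index -> materialized data
        self.compute_count = 0       # how many partitions were (re)computed

    def map(self, f):
        # Record *how* to derive the child; nothing is computed yet.
        return ToyRDD(
            self.num_partitions,
            lambda i, parent_data: [f(x) for x in parent_data],
            parent=self,
        )

    def partition(self, i):
        if i in self._cache:
            return self._cache[i]
        # Walk the lineage: fetch (or rebuild) only the parent partition needed.
        parent_data = self.parent.partition(i) if self.parent else None
        self.compute_count += 1
        self._cache[i] = self._compute(i, parent_data)
        return self._cache[i]

    def collect(self):
        return [x for i in range(self.num_partitions) for x in self.partition(i)]

    def lose_partition(self, i):
        # Simulate losing the node that held partition i.
        self._cache.pop(i, None)


source = ToyRDD(2, lambda i, _parent: [[1, 2], [3, 4]][i])
doubled = source.map(lambda x: x * 2)

print(doubled.collect())        # [2, 4, 6, 8]
doubled.lose_partition(1)       # drop one partition of the derived dataset
print(doubled.collect())        # [2, 4, 6, 8], rebuilt from lineage
print(doubled.compute_count)    # 3: two initial computations plus one recomputation
```

Only the lost partition is recomputed; partition 0 of `doubled` and both partitions of `source` are reused as they are. The real tests mentioned above (FailureSuite, CheckpointSuite) exercise the actual Spark behaviour.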
