Hi,

> My understanding is that .localCheckpoint() breaks the lineage of the RDD

True.

> and this requires the entire RDD to be rebuilt instead of being able
> to recompute lost partitions.

In a sense, it's as if you saved the partitions to the executors and read
them back as the source data for this checkpointed RDD.
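
As a rough illustration of what "cutting the lineage" means, here is a toy
model in plain Python. It is not Spark code: ToyRDD and all of its methods
are made up for this sketch, and it only assumes that a checkpoint
materializes the partitions and then drops the reference to the parent.

```python
# Toy model, NOT Spark code: ToyRDD and its methods are hypothetical.
class ToyRDD:
    def __init__(self, parent=None, fn=None, data=None):
        self.parent = parent  # upstream dataset (the lineage link)
        self.fn = fn          # transformation applied to the parent's elements
        self.data = data      # materialized partitions, if any

    def map(self, fn):
        # A transformation only records how to compute; nothing runs yet.
        return ToyRDD(parent=self, fn=fn)

    def compute(self):
        if self.data is not None:
            # Source data or checkpointed data: read it directly.
            return self.data
        return [[self.fn(x) for x in part] for part in self.parent.compute()]

    def lineage_depth(self):
        # Number of parent links that would have to be replayed.
        depth, node = 0, self
        while node.parent is not None:
            depth, node = depth + 1, node.parent
        return depth

    def checkpoint(self):
        self.data = self.compute()  # materialize the partitions
        self.parent = None          # cut the lineage
        self.fn = None


source = ToyRDD(data=[[1, 2], [3, 4]])
mapped = source.map(lambda x: x * 2).map(lambda x: x + 1)

depth_before = mapped.lineage_depth()
mapped.checkpoint()
depth_after = mapped.lineage_depth()
result = mapped.compute()
```

After the checkpoint the toy dataset behaves like a source: it has no
parents left, so there is nothing to replay if its stored partitions go
missing.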

> Does each executor store a copy of the entire RDD?

No. Each executor holds only the partitions computed by the tasks it
executed.

> It's unclear to me the benefit of using Checkpoint over .localCheckpoint.

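checkpoint is similar to localCheckpoint, but slower and reliable, because
the data is written to a fault-tolerant file system such as HDFS rather than
kept on ephemeral executors. In either case the effect on the lineage is the
same: it is cut. So if an executor holding locally checkpointed partitions
dies, those partitions cannot be recomputed and the job can fail, whereas
reliably checkpointed data can still be read back from the file system.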
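
The difference on executor loss can be sketched with another toy model in
plain Python. Again, this is not Spark code: the function names, the
executor IDs and the round-robin placement are all assumptions made for the
illustration.

```python
# Toy model, NOT Spark code: every name here is hypothetical.
def checkpoint_locally(partitions, executors):
    """Spread partitions over executors; each executor keeps only its share."""
    store = {e: {} for e in executors}
    for i, part in enumerate(partitions):
        # Round-robin placement is an assumption of this sketch.
        store[executors[i % len(executors)]][i] = part
    return store


def checkpoint_reliably(partitions):
    """Write every partition to a single store that outlives any executor."""
    return dict(enumerate(partitions))


def read_partition(i, local_store=None, reliable_store=None):
    if reliable_store is not None and i in reliable_store:
        return reliable_store[i]
    for held in (local_store or {}).values():
        if i in held:
            return held[i]
    # The lineage was cut at checkpoint time, so nothing can be recomputed.
    raise RuntimeError(f"partition {i} is lost and there is no lineage to recompute it")


partitions = [[1, 2], [3, 4], [5, 6], [7, 8]]
local = checkpoint_locally(partitions, ["exec-1", "exec-2"])
reliable = checkpoint_reliably(partitions)

# Simulate exec-2 dying: its partitions (1 and 3) disappear with it.
del local["exec-2"]
```

In this sketch, reading partition 0 from the local store still works,
reading partition 1 from it raises an error, and reading partition 1 from
the reliable store succeeds. It also shows that no single executor ever
holds the whole dataset.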

Regards,
Jacek Laskowski
----
https://about.me/JacekLaskowski
"The Internals Of" Online Books <https://books.japila.pl/>
Follow me on https://twitter.com/jaceklaskowski


On Wed, Jan 6, 2021 at 6:15 PM brettplarson <brettpatricklar...@gmail.com>
wrote:

> Hello,
> I am wondering what the impact of using .localCheckpoint() and having the
> executor die would be?
>
> My understanding is that .localCheckpoint() breaks the lineage of the RDD
> and this requires the entire RDD to be rebuilt instead of being able to
> recompute lost partitions.
>
> Does each executor store a copy of the entire RDD?
>
> It's unclear to me the benefit of using Checkpoint over .localCheckpoint.
> (I am aware that this is HDFS backed, but it's unclear the implications of
> this)
>
> Please let me know,
> Thank you!
