Jacek,

Thanks for your response. I am still trying to understand the impact of an executor dying after a localCheckpoint is taken.

Would the entire Spark application fail in this case due to the broken lineage? Or would the jobs associated with that executor need to be recomputed from scratch?

Thank you!

On Wed, Jan 6, 2021 at 1:09 PM Jacek Laskowski <ja...@japila.pl> wrote:

> Hi,
>
> > My understanding is that .localCheckpoint() breaks the lineage of the RDD
>
> True.
>
> > and this requires that the entire RDD be rebuilt instead of being
> > able to recompute lost partitions.
>
> In a sense, it's as if you saved the partitions to executors and re-read
> them back as source data (for this checkpointed RDD).
>
> > Does each executor store a copy of the entire RDD?
>
> No. An executor has only the data of the partitions (for the tasks
> this executor has executed).
>
> > Checkpoint over .localCheckpoint.
>
> checkpoint is similar to localCheckpoint, but slower and reliable (as it's
> on a stable HDFS file system, not on an ephemeral executor). In either case,
> the lineage ends up the same: cut.
>
> Regards,
> Jacek Laskowski
> ----
> https://about.me/JacekLaskowski
> "The Internals Of" Online Books <https://books.japila.pl/>
> Follow me on https://twitter.com/jaceklaskowski
>
> On Wed, Jan 6, 2021 at 6:15 PM brettplarson <brettpatricklar...@gmail.com>
> wrote:
>
>> Hello,
>> I am wondering what the impact would be of using .localCheckpoint() and
>> then having the executor die.
>>
>> My understanding is that .localCheckpoint() breaks the lineage of the RDD
>> and this requires that the entire RDD be rebuilt instead of being able to
>> recompute lost partitions.
>>
>> Does each executor store a copy of the entire RDD?
>>
>> The benefit of using Checkpoint over .localCheckpoint is unclear to me.
>> (I am aware that it is HDFS backed, but the implications of this are
>> unclear.)
>>
>> Please let me know,
>> Thank you!
--
Brett Larson
brettpatricklar...@gmail.com
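
To make the difference discussed in this thread concrete, here is a toy model in plain Python. It does not use Spark at all: the class ToyCheckpointedRDD, the exception CheckpointBlockLost, and the executor names are all hypothetical and exist only for illustration. It assumes what the replies above state (each executor holds only its own partitions, and the lineage is cut in both cases) plus the warning in the Spark API documentation for RDD.localCheckpoint that losing an executor can make the checkpointed data inaccessible.

```python
# Toy model only: plain Python, no Spark. All names here are hypothetical.

class CheckpointBlockLost(Exception):
    """A checkpointed partition is gone and cannot be recomputed."""


class ToyCheckpointedRDD:
    def __init__(self, partitions, reliable):
        # partitions maps partition id -> (executor id, partition data).
        self.partition_ids = sorted(partitions)
        self.reliable = reliable
        self.stable_store = {}      # stands in for HDFS
        self.executor_blocks = {}   # executor id -> {partition id: data}
        for pid, (executor, data) in partitions.items():
            if reliable:
                # checkpoint(): data is written to stable storage.
                self.stable_store[pid] = list(data)
            else:
                # localCheckpoint(): data stays on the executor that ran the task.
                self.executor_blocks.setdefault(executor, {})[pid] = list(data)
        # In both cases the lineage is cut, so nothing is kept for recomputation.

    def lose_executor(self, executor):
        # An executor dying takes its local blocks with it.
        self.executor_blocks.pop(executor, None)

    def read_partition(self, pid):
        if self.reliable:
            return self.stable_store[pid]
        for blocks in self.executor_blocks.values():
            if pid in blocks:
                return blocks[pid]
        raise CheckpointBlockLost(f"partition {pid} lost, no lineage to rebuild it")

    def collect(self):
        # A job over the whole RDD needs every partition.
        result = []
        for pid in self.partition_ids:
            result.extend(self.read_partition(pid))
        return result


partitions = {0: ("exec-1", [1, 2]), 1: ("exec-2", [3, 4])}
local = ToyCheckpointedRDD(partitions, reliable=False)
reliable = ToyCheckpointedRDD(partitions, reliable=True)

for rdd in (local, reliable):
    rdd.lose_executor("exec-2")

print(reliable.collect())        # unaffected by the executor loss
print(local.read_partition(0))   # exec-1 still holds partition 0
try:
    local.collect()
except CheckpointBlockLost as err:
    print("job failed:", err)
```

In this sketch only the partitions held by the dead executor are lost, but because the lineage was cut there is nothing to recompute them from, so any job that needs them fails. The Spark documentation for localCheckpoint describes the real outcome in similar terms, as a potentially irrecoverable job failure; how that failure propagates to the rest of the application is not modeled here.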