Hello, I am wondering what happens if I use .localCheckpoint() on an RDD and an executor then dies.
My understanding is that .localCheckpoint() breaks the lineage of the RDD, so if a partition is lost the entire RDD has to be rebuilt instead of only the lost partitions being recomputed. Does each executor store a copy of the entire RDD? It is also unclear to me what the benefit of using .checkpoint() over .localCheckpoint() is. (I am aware that .checkpoint() is HDFS-backed, but the implications of that are unclear to me.)

Please let me know. Thank you!