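Persist doesn't cut the lineage: a persisted RDD still keeps its full dependency chain so that lost partitions can be recomputed. With a very long lineage, such as one built up by an iterative algorithm, you might run into a StackOverflowError. See https://spark-project.atlassian.net/browse/SPARK-1006 for an example.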
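
To make the difference concrete, here is a rough sketch in plain Python. It is a toy model, not Spark code: the Node class, its methods, and the iterate helper are all made up for illustration, and everything is evaluated eagerly, which real RDDs are not. It only shows that persisting keeps the parent links while checkpointing drops them.

```python
class Node:
    """Toy stand-in for an RDD (hypothetical, not the Spark API)."""

    def __init__(self, parent=None, fn=None, value=None):
        self.parent = parent   # lineage link to the parent node
        self.fn = fn           # transformation applied to the parent's value
        self.cached = value    # materialized value, if any

    def map(self, fn):
        return Node(parent=self, fn=fn)

    def compute(self):
        if self.cached is not None:
            return self.cached
        return self.fn(self.parent.compute())

    def persist(self):
        # Keep the value, but also keep the parent link (lineage stays).
        self.cached = self.compute()
        return self

    def checkpoint(self):
        # Keep the value and drop the parent link (lineage is cut here).
        self.cached = self.compute()
        self.parent = None
        self.fn = None
        return self

    def lineage_depth(self):
        # Count how many parent links can be followed from this node.
        depth, node = 0, self
        while node.parent is not None:
            depth += 1
            node = node.parent
        return depth


def iterate(steps, checkpoint_every=None):
    """Toy iterative algorithm: one map per step, persisted every step."""
    node = Node(value=0)
    for i in range(1, steps + 1):
        node = node.map(lambda x: x + 1).persist()
        if checkpoint_every and i % checkpoint_every == 0:
            node.checkpoint()
    return node
```

In this toy, iterate(55).lineage_depth() is 55 even though every step was persisted, while iterate(55, checkpoint_every=10).lineage_depth() is 5, because the chain was last cut at step 50. Both compute the same value.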

On Mon, Apr 21, 2014 at 12:11 PM, Diana Carroll wrote:
> When might that be necessary or useful? Presumably I can persist and
> replicate my RDD to avoid re-computation, if that's my goal. What advantage
> does checkpointing provide over disk persistence with replication?
>
> On Mon, Apr 21, 2014 at 2:42 PM, Xiangrui Meng wrote:
>> Checkpoint clears dependencies. You might need checkpoint to cut a
>> long lineage in iterative algorithms. -Xiangrui
>>
>> On Mon, Apr 21, 2014 at 11:34 AM, Diana Carroll wrote:
>> > I'm trying to understand when I would want to checkpoint an RDD rather
>> > than just persist to disk.
>> >
>> > Every reference I can find to checkpoint related to Spark Streaming.
>> > But the method is defined in the core Spark library, not Streaming.
>> >
>> > Does it exist solely for streaming, or are there circumstances unrelated
>> > to streaming in which I might want to checkpoint...and if so, like what?
>> >
>> > Thanks,
>> > Diana
