We don't have 3x replication in spark :-) And if we use replicated storagelevel, while decreasing odds of failure, it does not eliminate it (since we are not doing a great job with replication anyway from fault tolerance point of view). Also it does take a nontrivial performance hit with replicated levels.
Regards, Mridul On 17-May-2014 8:16 am, "Xiangrui Meng" <men...@gmail.com> wrote: > With 3x replication, we should be able to achieve fault tolerance. > This checkPointed RDD can be cleared if we have another in-memory > checkPointed RDD down the line. It can avoid hitting disk if we have > enough memory to use. We need to investigate more to find a good > solution. -Xiangrui > > On Fri, May 16, 2014 at 4:00 PM, Mridul Muralidharan <mri...@gmail.com> > wrote: > > Effectively this is persist without fault tolerance. > > Failure of any node means complete lack of fault tolerance. > > I would be very skeptical of truncating lineage if it is not reliable. > > On 17-May-2014 3:49 am, "Xiangrui Meng (JIRA)" <j...@apache.org> wrote: > > > >> Xiangrui Meng created SPARK-1855: > >> ------------------------------------ > >> > >> Summary: Provide memory-and-local-disk RDD checkpointing > >> Key: SPARK-1855 > >> URL: https://issues.apache.org/jira/browse/SPARK-1855 > >> Project: Spark > >> Issue Type: New Feature > >> Components: MLlib, Spark Core > >> Affects Versions: 1.0.0 > >> Reporter: Xiangrui Meng > >> > >> > >> Checkpointing is used to cut long lineage while maintaining fault > >> tolerance. The current implementation is HDFS-based. Using the BlockRDD > we > >> can create in-memory-and-local-disk (with replication) checkpoints that > are > >> not as reliable as HDFS-based solution but faster. > >> > >> It can help applications that require many iterations. > >> > >> > >> > >> -- > >> This message was sent by Atlassian JIRA > >> (v6.2#6252) > >> >