We do actually have replicated StorageLevels in Spark. You can use MEMORY_AND_DISK_2 or construct your own StorageLevel with your own custom replication factor.
BTW you guys should probably have this discussion on the JIRA rather than the dev list; I think the replies somehow ended up on the dev list. Matei On May 17, 2014, at 1:36 AM, Mridul Muralidharan <mri...@gmail.com> wrote: > We don't have 3x replication in spark :-) > And if we use replicated storagelevel, while decreasing odds of failure, it > does not eliminate it (since we are not doing a great job with replication > anyway from fault tolerance point of view). > Also it does take a nontrivial performance hit with replicated levels. > > Regards, > Mridul > On 17-May-2014 8:16 am, "Xiangrui Meng" <men...@gmail.com> wrote: > >> With 3x replication, we should be able to achieve fault tolerance. >> This checkPointed RDD can be cleared if we have another in-memory >> checkPointed RDD down the line. It can avoid hitting disk if we have >> enough memory to use. We need to investigate more to find a good >> solution. -Xiangrui >> >> On Fri, May 16, 2014 at 4:00 PM, Mridul Muralidharan <mri...@gmail.com> >> wrote: >>> Effectively this is persist without fault tolerance. >>> Failure of any node means complete lack of fault tolerance. >>> I would be very skeptical of truncating lineage if it is not reliable. >>> On 17-May-2014 3:49 am, "Xiangrui Meng (JIRA)" <j...@apache.org> wrote: >>> >>>> Xiangrui Meng created SPARK-1855: >>>> ------------------------------------ >>>> >>>> Summary: Provide memory-and-local-disk RDD checkpointing >>>> Key: SPARK-1855 >>>> URL: https://issues.apache.org/jira/browse/SPARK-1855 >>>> Project: Spark >>>> Issue Type: New Feature >>>> Components: MLlib, Spark Core >>>> Affects Versions: 1.0.0 >>>> Reporter: Xiangrui Meng >>>> >>>> >>>> Checkpointing is used to cut long lineage while maintaining fault >>>> tolerance. The current implementation is HDFS-based. Using the BlockRDD >> we >>>> can create in-memory-and-local-disk (with replication) checkpoints that >> are >>>> not as reliable as HDFS-based solution but faster. >>>> >>>> It can help applications that require many iterations. >>>> >>>> >>>> >>>> -- >>>> This message was sent by Atlassian JIRA >>>> (v6.2#6252) >>>> >>