Re: [jira] [Created] (SPARK-1855) Provide memory-and-local-disk RDD checkpointing

Matei Zaharia Sat, 17 May 2014 14:56:22 -0700

We do actually have replicated StorageLevels in Spark. You can use 
MEMORY_AND_DISK_2 or construct your own StorageLevel with your own custom 
replication factor.
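
For example, here is a minimal sketch of both options, assuming Spark 1.0's
Scala API. The object name, app name, and local master are placeholders for
illustration only; with a single local executor there are no peers, so no
block is actually replicated.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

// Hypothetical example object; the names here are illustrative only.
object ReplicatedLevels {
  // Built-in level: memory first, spill to local disk, 2 replicas.
  val twoCopies: StorageLevel = StorageLevel.MEMORY_AND_DISK_2

  // Custom level with a replication factor of 3. Arguments, in order:
  // useDisk, useMemory, useOffHeap, deserialized, replication.
  val threeCopies: StorageLevel = StorageLevel(true, true, false, true, 3)

  def main(args: Array[String]): Unit = {
    // Local master is an assumption so the sketch runs without a cluster.
    val conf = new SparkConf().setAppName("replicated-levels").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val a = sc.parallelize(1 to 1000).persist(twoCopies)
      val b = sc.parallelize(1 to 1000).map(_ * 2).persist(threeCopies)
      // Persisting is lazy; an action materializes the cached blocks.
      a.count()
      b.count()
      println(s"a: ${a.getStorageLevel.replication} replicas requested")
      println(s"b: ${b.getStorageLevel.replication} replicas requested")
    } finally {
      sc.stop()
    }
  }
}
```

Note that persisting this way keeps the lineage intact; it does not truncate
it the way checkpointing does.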


BTW you guys should probably have this discussion on the JIRA rather than the 
dev list; I think the replies somehow ended up on the dev list.

Matei

On May 17, 2014, at 1:36 AM, Mridul Muralidharan <[email protected]> wrote:

> We don't have 3x replication in Spark :-)
> And if we use a replicated StorageLevel, it decreases the odds of failure
> but does not eliminate it (since we are not doing a great job with
> replication anyway from a fault-tolerance point of view).
> Also, replicated levels take a nontrivial performance hit.
> 
> Regards,
> Mridul
> On 17-May-2014 8:16 am, "Xiangrui Meng" <[email protected]> wrote:
> 
>> With 3x replication, we should be able to achieve fault tolerance.
>> This checkpointed RDD can be cleared if we have another in-memory
>> checkpointed RDD down the line. It can avoid hitting disk if we have
>> enough memory to use. We need to investigate more to find a good
>> solution. -Xiangrui
>> 
>> On Fri, May 16, 2014 at 4:00 PM, Mridul Muralidharan <[email protected]>
>> wrote:
>>> Effectively this is persist without fault tolerance.
>>> Failure of any node means complete lack of fault tolerance.
>>> I would be very skeptical of truncating lineage if it is not reliable.
>>> On 17-May-2014 3:49 am, "Xiangrui Meng (JIRA)" <[email protected]> wrote:
>>> 
>>>> Xiangrui Meng created SPARK-1855:
>>>> ------------------------------------
>>>> 
>>>>             Summary: Provide memory-and-local-disk RDD checkpointing
>>>>                 Key: SPARK-1855
>>>>                 URL: https://issues.apache.org/jira/browse/SPARK-1855
>>>>             Project: Spark
>>>>          Issue Type: New Feature
>>>>          Components: MLlib, Spark Core
>>>>    Affects Versions: 1.0.0
>>>>            Reporter: Xiangrui Meng
>>>> 
>>>> 
>>>> Checkpointing is used to cut long lineage while maintaining fault
>>>> tolerance. The current implementation is HDFS-based. Using the BlockRDD
>>>> we can create in-memory-and-local-disk (with replication) checkpoints
>>>> that are not as reliable as HDFS-based solution but faster.
>>>> 
>>>> It can help applications that require many iterations.
>>>> 
>>>> 
>>>> 
>>>> --
>>>> This message was sent by Atlassian JIRA
>>>> (v6.2#6252)
>>>> 
>>
