Re: [jira] [Created] (SPARK-1855) Provide memory-and-local-disk RDD checkpointing

Matei Zaharia Sun, 18 May 2014 15:55:29 -0700

JIRA comments are mirrored to the [email protected] list, so people who 
want to get them by email can do so. In theory one should also be able to reply 
to one of those emails and have the message show up in JIRA, but I don’t think 
ours is configured that way. I’m not sure why it wouldn’t be the “ASF way” when 
the JIRA instance is hosted by the ASF and mirrored on ASF lists.


Matei

On May 18, 2014, at 11:28 AM, Andrew Ash <[email protected]> wrote:

> The nice thing about putting discussion on the Jira is that everything
> about the bug is in one place.  So people looking to understand the
> discussion a few years from now only have to look on the jira ticket rather
> than also search the mailing list archives and hope commenters all put the
> string "SPARK-1855" into the messages.
> 
> 
> On Sun, May 18, 2014 at 10:34 AM, Jacek Laskowski <[email protected]> wrote:
> 
>> Hi,
>> 
>> I'm curious whether it's common practice to hold discussions in JIRA rather
>> than here. I don't think it's the ASF way.
>> 
>> Regards,
>> Jacek Laskowski
>> http://blog.japila.pl
>> On 17 May 2014 at 23:55, "Matei Zaharia" <[email protected]> wrote:
>> 
>>> We do actually have replicated StorageLevels in Spark. You can use
>>> MEMORY_AND_DISK_2 or construct your own StorageLevel with your own custom
>>> replication factor.
>>> 
>>> BTW you guys should probably have this discussion on the JIRA rather than
>>> the dev list; I think the replies somehow ended up on the dev list.
>>> 
>>> Matei
>>> 
>>> On May 17, 2014, at 1:36 AM, Mridul Muralidharan <[email protected]> wrote:
>>> 
>>>> We don't have 3x replication in Spark :-)
>>>> And if we use a replicated StorageLevel, while it decreases the odds of
>>>> failure, it does not eliminate them (since we are not doing a great job
>>>> with replication anyway from a fault-tolerance point of view).
>>>> Also, replicated levels take a nontrivial performance hit.
>>>> 
>>>> Regards,
>>>> Mridul
>>>> On 17-May-2014 8:16 am, "Xiangrui Meng" <[email protected]> wrote:
>>>> 
>>>>> With 3x replication, we should be able to achieve fault tolerance.
>>>>> This checkpointed RDD can be cleared if we have another in-memory
>>>>> checkpointed RDD down the line. It can avoid hitting disk if we have
>>>>> enough memory to use. We need to investigate more to find a good
>>>>> solution. -Xiangrui
>>>>> 
>>>>> On Fri, May 16, 2014 at 4:00 PM, Mridul Muralidharan <[email protected]> wrote:
>>>>>> Effectively this is persist without fault tolerance.
>>>>>> Failure of any node means complete lack of fault tolerance.
>>>>>> I would be very skeptical of truncating lineage if it is not reliable.
>>>>>> On 17-May-2014 3:49 am, "Xiangrui Meng (JIRA)" <[email protected]> wrote:
>>>>>> 
>>>>>>> Xiangrui Meng created SPARK-1855:
>>>>>>> ------------------------------------
>>>>>>> 
>>>>>>>            Summary: Provide memory-and-local-disk RDD checkpointing
>>>>>>>                Key: SPARK-1855
>>>>>>>                URL: https://issues.apache.org/jira/browse/SPARK-1855
>>>>>>>            Project: Spark
>>>>>>>         Issue Type: New Feature
>>>>>>>         Components: MLlib, Spark Core
>>>>>>>   Affects Versions: 1.0.0
>>>>>>>           Reporter: Xiangrui Meng
>>>>>>> 
>>>>>>> 
>>>>>>> Checkpointing is used to cut long lineage while maintaining fault
>>>>>>> tolerance. The current implementation is HDFS-based. Using the BlockRDD
>>>>>>> we can create in-memory-and-local-disk (with replication) checkpoints
>>>>>>> that are not as reliable as HDFS-based solution but faster.
>>>>>>> 
>>>>>>> It can help applications that require many iterations.
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> --
>>>>>>> This message was sent by Atlassian JIRA
>>>>>>> (v6.2#6252)
>>>>>>> 
>>>>> 
>>> 
>>> 
>>
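
The disagreement quoted above (3x replication "should be able to achieve fault tolerance" versus replication "does not eliminate" the odds of failure) can be illustrated with a rough back-of-the-envelope sketch. It is not taken from Spark or from the thread: it assumes each replica of a block sits on a different node and that nodes fail independently with the same probability, and the function names (loss_probability, any_block_lost) are hypothetical helpers introduced only for this illustration.

```python
def loss_probability(p_node_failure: float, replicas: int) -> float:
    """Chance that every replica of one block is lost.

    Assumption (not from the thread): replicas live on distinct nodes
    that fail independently, each with probability p_node_failure.
    """
    if not 0.0 <= p_node_failure <= 1.0:
        raise ValueError("p_node_failure must be between 0 and 1")
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    return p_node_failure ** replicas


def any_block_lost(p_node_failure: float, replicas: int, num_blocks: int) -> float:
    """Chance that at least one of num_blocks blocks loses all its replicas.

    Treats blocks as independent, which is a simplification: real blocks
    share nodes, so their failures are correlated.
    """
    if num_blocks < 0:
        raise ValueError("num_blocks must be non-negative")
    survive_one = 1.0 - loss_probability(p_node_failure, replicas)
    return 1.0 - survive_one ** num_blocks


if __name__ == "__main__":
    # Compare 1x, 2x and 3x replication for a checkpoint of 1000 blocks.
    for r in (1, 2, 3):
        print(r, any_block_lost(0.01, r, 1000))
```

Under these assumptions each extra replica shrinks the chance of losing a block, but the result stays above zero for any nonzero node failure rate, which matters once the lineage has been truncated and the lost block cannot be recomputed.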
