Spark’s cache is fault-tolerant – if any partition of an RDD is lost, it will automatically be recomputed using the transformations that originally created it.
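To make that concrete, here is a minimal Scala sketch (the input path, app name, and local master are made up for illustration) that persists an RDD with StorageLevel.MEMORY_ONLY_SER and then inspects what the driver knows about it. If a cached partition is lost, for example because the executor holding it dies, the next action just recomputes that partition from the lineage; nothing is "appended" to the cache and nothing has to be cleared by hand.

import org.apache.spark.SparkContext
import org.apache.spark.storage.StorageLevel

object CacheCheckSketch {
  def main(args: Array[String]): Unit = {
    // hypothetical local setup for the sketch
    val sc = new SparkContext("local[*]", "cache-check-sketch")

    val parsed = sc.textFile("/tmp/events.log")   // hypothetical input path
      .map(_.split(","))
      .persist(StorageLevel.MEMORY_ONLY_SER)      // serialized in-memory cache

    // The driver tracks which RDDs are marked as persisted.
    println(parsed.getStorageLevel)               // storage level this RDD was marked with
    println(sc.getPersistentRDDs.keys)            // ids of RDDs currently marked as persisted

    parsed.count()                                // first action materializes the cached blocks
    parsed.count()                                // later actions reuse cached blocks; any lost
                                                  // partitions are recomputed from lineage

    parsed.unpersist()                            // explicitly drop the cached blocks
    sc.stop()
  }
}

That also roughly answers the "how does Spark know" question: the driver keeps track of which RDD ids are marked persisted (getPersistentRDDs), and when a task runs it checks the block manager for an already cached block before falling back to recomputing the partition.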
> On Mar 23, 2017, at 4:11 AM, nayan sharma <nayansharm...@gmail.com> wrote:
>
> In case of task failures, does Spark clear the persisted RDD (StorageLevel.MEMORY_ONLY_SER) and recompute it again when the task attempt starts from the beginning, or will the cached RDD be appended to?
>
> How does Spark check whether the RDD has been cached, and skip the caching step for a particular task?
>
>> On 23-Mar-2017, at 3:36 PM, Artur R <ar...@gpnxgroup.com> wrote:
>>
>> I am not quite sure, but:
>> - if the RDD is persisted in memory, then on task failure the executor JVM process fails too, so the memory is released
>> - if the RDD is persisted on disk, then on task failure Spark's shutdown hook just wipes the temp files
>>
>>> On Thu, Mar 23, 2017 at 10:55 AM, Jörn Franke <jornfra...@gmail.com> wrote:
>>>
>>> What do you mean by clear? What is the use case?
>>>
>>>> On 23 Mar 2017, at 10:16, nayan sharma <nayansharm...@gmail.com> wrote:
>>>>
>>>> Does Spark clear the persisted RDD in case the task fails?
>>>>
>>>> Regards,
>>>>
>>>> Nayan