Re: Are RDD's ever persisted to disk?

RK Aduri Tue, 23 Aug 2016 12:28:31 -0700

On another note, if you have a streaming app, you checkpoint the RDDs so that 
they can be recovered after a failure. And yes, RDDs can be persisted to disk: 
pass a disk-backed storage level (for example DISK_ONLY or MEMORY_AND_DISK) to 
persist(). You can open Spark's UI and see persisted RDDs listed under the 
Storage tab.
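
As a rough illustration of the above, here is a minimal Scala sketch that 
persists an RDD to disk and also checkpoints it. It uses the core RDD 
checkpoint API rather than a streaming context, and the app name, the local[2] 
master, the sample data and the /tmp/spark-checkpoints directory are all 
assumptions made up for the example, not anything from this thread.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

object PersistToDiskSketch {
  def main(args: Array[String]): Unit = {
    // Assumed settings: a local master and a throwaway app name.
    val conf = new SparkConf().setAppName("persist-to-disk-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)

    // Assumed checkpoint location; on a cluster this would normally be a
    // fault-tolerant file system path such as an HDFS directory.
    sc.setCheckpointDir("/tmp/spark-checkpoints")

    // Sample data, purely illustrative.
    val squares = sc.parallelize(1 to 1000).map(n => n.toLong * n)

    // Ask Spark to keep the computed partitions on local disk only.
    squares.persist(StorageLevel.DISK_ONLY)

    // Mark the RDD for checkpointing; this must happen before the first
    // action runs on it.
    squares.checkpoint()

    // persist() and checkpoint() are lazy: the first action materializes both.
    squares.count()

    // Inspect what Spark recorded for this RDD.
    println(s"uses disk: ${squares.getStorageLevel.useDisk}")
    println(s"checkpointed: ${squares.isCheckpointed}")

    sc.stop()
  }
}
```

While an application like this is still running, the persisted RDD should show 
up under the Storage tab of its UI (port 4040 by default when running locally).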


If RDDs are persisted in memory, reads avoid disk I/O, so lookups are cheap. 
If a cached partition is lost, Spark recomputes it from the RDD's lineage 
graph (the DAG, which you can view in the Spark UI).
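
To see the lineage Spark would use for that recomputation, something like the 
following sketch may help. It caches a small RDD in memory and prints its 
lineage with toDebugString; the app name, the local[2] master and the sample 
words are assumptions for illustration only.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object LineageSketch {
  def main(args: Array[String]): Unit = {
    // Assumed settings: a local master and a throwaway app name.
    val conf = new SparkConf().setAppName("lineage-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)

    // Sample data, purely illustrative.
    val words = sc.parallelize(Seq("spark", "rdd", "spark", "dag"))
    val counts = words.map(w => (w, 1)).reduceByKey(_ + _)

    // cache() is shorthand for persist(StorageLevel.MEMORY_ONLY).
    counts.cache()

    // The first action computes the partitions and stores them in memory.
    counts.collect().foreach(println)

    // Print the chain of parent RDDs. If a cached partition is lost, Spark
    // replays these steps to rebuild it.
    println(counts.toDebugString)

    sc.stop()
  }
}
```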

> On Aug 23, 2016, at 12:10 PM, <srikanth.je...@gmail.com> 
> <srikanth.je...@gmail.com> wrote:
> 
> RAM and virtual memory are finite, so consider the data size before calling 
> persist(). Please see the documentation below on how to choose a storage level.
>  
> http://spark.apache.org/docs/latest/programming-guide.html#which-storage-level-to-choose
>  
> Thanks,
> Sreekanth Jella
>  
> From: kant kodali <mailto:kanth...@gmail.com>
> Sent: Tuesday, August 23, 2016 2:42 PM
> To: srikanth.je...@gmail.com <mailto:srikanth.je...@gmail.com>
> Cc: user@spark.apache.org <mailto:user@spark.apache.org>
> Subject: Re: Are RDD's ever persisted to disk?
>  
> So when do we ever need to persist an RDD to disk, given that we don't need 
> to worry about RAM (memory), since virtual memory will just push pages to 
> disk when memory becomes scarce?
> 
>  
> 
>  
> 
> On Tue, Aug 23, 2016 11:23 AM, srikanth.je...@gmail.com 
> <mailto:srikanth.je...@gmail.com> wrote:
> Hi Kant Kodali,
> 
>  
> 
> Depending on the storage level passed to the persist() method, the RDD is 
> either cached in memory or persisted to disk. In case of failure, Spark 
> reconstructs the RDD on a different executor based on the DAG. That is how 
> failures are handled. By default, Spark Core does not replicate RDDs, because 
> they can be reconstructed from the source (say HDFS, Hive, or S3), but not 
> from memory (which is already lost).
> 
>  
> 
> Thanks,
> Sreekanth Jella
> 
>  
> 
> From: kant kodali <mailto:kanth...@gmail.com>
> Sent: Tuesday, August 23, 2016 2:12 PM
> To: user@spark.apache.org <mailto:user@spark.apache.org>
> Subject: Are RDD's ever persisted to disk?
> 
>  
> 
> I am new to Spark and I keep hearing that RDDs can be persisted to memory or 
> disk after each checkpoint. I wonder why RDDs are persisted in memory. In 
> case of node failure, how would you access memory to reconstruct the RDD? 
> Persisting to disk makes sense because it's like persisting to a network file 
> system (in the case of HDFS), where each block has multiple copies across 
> nodes, so if a node goes down, RDDs can still be reconstructed by reading 
> the required block from other nodes and recomputing it. But my biggest 
> question is: are RDDs ever persisted to disk? 
> 
> 
> 


