Re: Spark on HDFS with replication

2014-08-04 Thread Deep Pradhan
I understand that RDDs don't need replication, but I wanted to know how the storage of RDDs relates to HDFS.
On Mon, Aug 4, 2014 at 3:32 PM, Stanley Shi wrote:
> RDDs don't *need* replication, but it does no harm if the underlying
> storage layer is replicated.
>
> On Mon, Au

Re: Spark on HDFS with replication

2014-08-04 Thread Stanley Shi
RDDs don't *need* replication, but it does no harm if the underlying storage layer is replicated.
On Mon, Aug 4, 2014 at 5:51 PM, Deep Pradhan wrote:
> Hi,
> Spark can run on top of HDFS.
> While Spark talks about the RDDs which do not need replication because the
> partitions can be built with the h

Spark on HDFS with replication

2014-08-04 Thread Deep Pradhan
Hi,
Spark can run on top of HDFS. Spark's RDDs do not need replication because lost partitions can be rebuilt from their lineage, but HDFS inherently replicates the data it stores.
How do these two concepts fit together?
Thank you
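One way to picture how the two ideas fit together is the toy model below. It is only a sketch in plain Python, not Spark's or HDFS's API: the classes `ReplicatedStore` and `ToyRDD` and all of their methods are hypothetical names made up for illustration. The assumption it encodes is that the replicated store keeps the input blocks durable, while the derived partitions are never copied and are instead recomputed from the recorded transformations (the lineage) when lost.

```python
# Toy model only: ReplicatedStore and ToyRDD are hypothetical names,
# not part of Spark or HDFS.


class ReplicatedStore:
    """Stands in for HDFS: every input block is kept as several copies."""

    def __init__(self, blocks, replication=3):
        # block id -> list of identical replicas
        self.copies = {
            i: [list(block) for _ in range(replication)]
            for i, block in enumerate(blocks)
        }

    def lose_replica(self, block_id):
        """Simulate one node holding this block going away."""
        self.copies[block_id].pop()

    def read(self, block_id):
        replicas = self.copies[block_id]
        if not replicas:
            raise IOError(f"all replicas of block {block_id} are gone")
        return replicas[0]


class ToyRDD:
    """Stands in for an RDD: remembers how to compute data, not the data."""

    def __init__(self, store, transforms=()):
        self.store = store
        self.transforms = tuple(transforms)  # the lineage
        self.cache = {}                      # computed partitions, unreplicated

    def map(self, fn):
        # A transformation only extends the lineage; nothing is computed yet.
        return ToyRDD(self.store, self.transforms + (fn,))

    def compute(self, partition_id):
        if partition_id not in self.cache:
            # Replay the lineage starting from the durable input block.
            data = self.store.read(partition_id)
            for fn in self.transforms:
                data = [fn(x) for x in data]
            self.cache[partition_id] = data
        return self.cache[partition_id]

    def lose_partition(self, partition_id):
        """Simulate the executor holding this partition failing."""
        self.cache.pop(partition_id, None)


store = ReplicatedStore([[1, 2], [3, 4]], replication=3)
rdd = ToyRDD(store).map(lambda x: x * 10)

first = rdd.compute(1)
rdd.lose_partition(1)    # the derived partition is lost...
store.lose_replica(1)    # ...and so is one copy of its input block
rebuilt = rdd.compute(1) # still recoverable: re-read input, replay lineage
```

In this sketch, replication guards the source data and lineage guards everything derived from it, which is why the two mechanisms complement rather than duplicate each other.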