I understand that RDDs don't need replication, but I wanted to know how the storage of RDDs relates to HDFS.

On Mon, Aug 4, 2014 at 3:32 PM, Stanley Shi <deming....@gmail.com> wrote:

> RDDs don't *need* replication, but it does no harm if the underlying
> storage has replication.
>
> On Mon, Aug 4, 2014 at 5:51 PM, Deep Pradhan <pradhandeep1...@gmail.com>
> wrote:
>
>> Hi,
>> Spark can run on top of HDFS.
>> Spark says that RDDs do not need replication because lost partitions
>> can be rebuilt with the help of lineage, but HDFS inherently has
>> replication. How do these two concepts go together?
>> Thank you
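
To make the relation discussed above concrete, here is a rough toy model in plain Python. It is not Spark or HDFS code: ReplicatedStore, LineagePartition, and every method on them are hypothetical names invented for this sketch. It only assumes what the thread states, namely that the input files are replicated by the storage layer and that a derived partition can be rebuilt from its lineage.

```python
# Toy model only: none of these classes exist in Spark or HDFS.

class ReplicatedStore:
    """Stands in for HDFS: every block is copied to several nodes."""

    def __init__(self, replication=3):
        self.replication = replication
        self.nodes = {}  # node id -> {block id -> data}

    def put(self, block_id, data):
        # Write one copy of the block to each of the first `replication` nodes.
        for node in range(self.replication):
            self.nodes.setdefault(node, {})[block_id] = list(data)

    def fail_node(self, node):
        # Simulate losing a storage node and every block copy on it.
        self.nodes.pop(node, None)

    def get(self, block_id):
        # Read from any node that still holds a copy.
        for blocks in self.nodes.values():
            if block_id in blocks:
                return blocks[block_id]
        raise IOError(f"all replicas of {block_id} are lost")


class LineagePartition:
    """Stands in for one RDD partition: a source block plus its transformations."""

    def __init__(self, store, block_id, transforms=()):
        self.store = store
        self.block_id = block_id
        self.transforms = tuple(transforms)  # the lineage
        self.cached = None                   # single in-memory copy, not replicated
        self.computations = 0

    def map(self, fn):
        # Record the transformation instead of running it.
        return LineagePartition(self.store, self.block_id, self.transforms + (fn,))

    def collect(self):
        # Recompute from the stored block whenever the cached copy is missing.
        if self.cached is None:
            data = self.store.get(self.block_id)
            for fn in self.transforms:
                data = [fn(x) for x in data]
            self.cached = data
            self.computations += 1
        return self.cached

    def lose_cache(self):
        # Simulate the executor that held the partition dying.
        self.cached = None


store = ReplicatedStore(replication=3)
store.put("block-0", [1, 2, 3])
part = LineagePartition(store, "block-0").map(lambda x: x * 10)

first = part.collect()   # computed from the stored block
store.fail_node(0)       # one storage node dies; two copies remain
part.lose_cache()        # the in-memory partition is lost as well
second = part.collect()  # rebuilt from a surviving copy via the lineage
```

In this sketch the two mechanisms cover different things: the replicated store protects the input data, while the lineage protects the derived partition, which is why the derived data itself is kept only once.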