I understand that RDDs don't need replication, but I just wanted to know
how the storage of RDDs relates to HDFS.
On Mon, Aug 4, 2014 at 3:32 PM, Stanley Shi wrote:
> RDDs don't *need* replication, but it does no harm if the underlying
> storage is replicated.
RDDs don't *need* replication, but it does no harm if the underlying
storage is replicated.
On Mon, Aug 4, 2014 at 5:51 PM, Deep Pradhan wrote:
> Hi,
> Spark can run on top of HDFS.
> While Spark talks about the RDDs which do not need replication because the
> partitions can be built with the h
Hi,
Spark can run on top of HDFS.
Spark's RDDs do not need replication because lost partitions can be
rebuilt with the help of lineage. HDFS, on the other hand, inherently has
replication. How do these two concepts go together?
Thank you