Hi,

I've tried several of the options available for persist(), including RDD
replication, and I have gone through the classes involved in caching and
storing RDDs at the different levels. The StorageLevel class plays a
pivotal role: it records whether to use memory or disk, and whether to
replicate the RDD on multiple nodes.
The LocationIterator class iterates over the preferred machines one by one
for each partition that is replicated. I also have a rough idea of
CoalescedRDD. Please correct me if I am wrong.
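
For illustration, requesting replication through persist() might look like
the minimal sketch below. The object name, app name, local master, and
sample data are assumed placeholders, not taken from a real job; only
persist(), StorageLevel.MEMORY_ONLY_2, and getStorageLevel are the Spark
calls being discussed.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

// Hypothetical driver object, for illustration only.
object PersistReplicationSketch {
  def main(args: Array[String]): Unit = {
    // Placeholder app name and local master (assumptions, not a real setup).
    val conf = new SparkConf()
      .setAppName("persist-replication-sketch")
      .setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      // Small sample RDD split into 4 partitions.
      val rdd = sc.parallelize(1 to 1000, 4)

      // MEMORY_ONLY_2 requests that each cached partition be kept
      // in memory on two nodes (replication factor 2).
      rdd.persist(StorageLevel.MEMORY_ONLY_2)

      // persist() is lazy; an action is needed before blocks are stored.
      rdd.count()

      // The StorageLevel recorded for the RDD carries the replication factor.
      println(rdd.getStorageLevel.replication)
    } finally {
      sc.stop()
    }
  }
}
```

With a single local master there are no peer nodes to replicate to, so this
only shows how the level is requested, not where the replicas end up.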

But I am looking for the code that chooses the resources on which the RDDs
are replicated. Can someone please tell me how replication takes place and
how the resources for replication are chosen? I just want to know where I
should look to understand how the replication happens.



Thank you so much!!!

Regards,

-Karthik
