Hi,

I've tried several of the options available for persist(), including RDD replication, and I have gone through the classes involved in caching and storing RDDs at different levels. The StorageLevel class plays a pivotal role: it records whether to use memory or disk, and whether to replicate the RDD on multiple nodes. The LocationIterator class iterates, one by one, over the preferred machines for each partition that is replicated. I also have a rough idea of how CoalescedRDD works. Please correct me if I am wrong.
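To make sure I have the StorageLevel part right, here is how I picture it, written as a small standalone Python sketch. This is not Spark's code: the class `StorageLevelSketch`, its field names, and the `is_replicated` helper are my own stand-ins for the flags described above (disk, memory, replication count), and the two constants only borrow the names of the levels I passed to persist().

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StorageLevelSketch:
    """Toy stand-in for the flags a storage level records (not Spark's class)."""

    use_disk: bool
    use_memory: bool
    replication: int = 1  # how many nodes should hold a copy of each partition

    def __post_init__(self):
        # A level with fewer than one copy makes no sense in this model.
        if self.replication < 1:
            raise ValueError("replication must be at least 1")

    def is_replicated(self) -> bool:
        # More than one copy means the partition lives on several nodes.
        return self.replication > 1


# Hypothetical constants mirroring the level names used with persist().
MEMORY_ONLY = StorageLevelSketch(use_disk=False, use_memory=True)
MEMORY_AND_DISK_2 = StorageLevelSketch(use_disk=True, use_memory=True, replication=2)

if __name__ == "__main__":
    for name, level in [("MEMORY_ONLY", MEMORY_ONLY),
                        ("MEMORY_AND_DISK_2", MEMORY_AND_DISK_2)]:
        print(name, "replicated:", level.is_replicated())
```

The sketch only captures what a level records. It says nothing about which nodes receive the extra copies, and that is the part I cannot find.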
What I am still looking for is the code that chooses the resources used to replicate RDDs. Can someone please tell me how replication takes place and how the resources for replication are chosen? I just want to know where I should look to understand how replication happens.

Thank you so much!

Regards,
Karthik