Hi, I have a three-node Spark cluster. I restricted the resources per application by setting the appropriate parameters, and I can now run two applications simultaneously. Next, I want to replicate an RDD and run two applications simultaneously. Can someone help me with how to go about doing this?
I replicated an RDD of size 1354 MB over this cluster. The web UI shows that it is replicated twice. But when I go to the storage details, the two partitions, each ~677 MB, are stored on the same node; none of the other nodes contains any partitions. Can someone tell me where I am going wrong? Thank you! -karthik
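
For reference, here is a minimal sketch of the kind of replication being described, plus the arithmetic for what a fully replicated cache should look like. It is an assumption that the RDD is persisted from PySpark with `StorageLevel.MEMORY_ONLY_2` and has not already been persisted at another level; the post does not say which API or storage level was used. The helper names `persist_with_two_replicas` and `expected_footprint` are hypothetical, made up for illustration.

```python
def persist_with_two_replicas(rdd):
    """Cache `rdd` in memory with replication factor 2, then materialize it.

    Assumption: `rdd` is a PySpark RDD that has not been persisted yet.
    """
    # Imported lazily so the arithmetic helper below also works
    # on a machine without PySpark installed.
    from pyspark import StorageLevel

    rdd.persist(StorageLevel.MEMORY_ONLY_2)
    rdd.count()  # persist() is lazy; an action is needed to fill the cache
    return rdd


def expected_footprint(rdd_size_mb, num_partitions, replication):
    """Return (block copies, total cached MB) for a fully cached RDD."""
    block_copies = num_partitions * replication
    total_mb = rdd_size_mb * replication
    return block_copies, total_mb


# Figures from the post: 1354 MB, two partitions, replication factor 2.
copies, total_mb = expected_footprint(1354, 2, 2)
print(copies, total_mb)  # 4 2708
```

If replication took effect, one would expect four block copies (about 2708 MB in total) rather than just two ~677 MB partitions, with each replica held by a different executor than its primary copy. Whether those executors sit on different nodes depends on how executors are laid out in the cluster.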