If you use hadoopFile (or textFile) and have the same file on the same path in every node, I suspect it might just work.
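In code, that would look something like the following. This is only a minimal sketch, assuming Spark's Scala API and a hypothetical file name (`data.txt`) under the path mentioned in the question; it needs a running Spark cluster with the file present at that path on every node:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Sketch only: assumes a live cluster and that
// /home/user_name/Desktop/data.txt exists at the same path on every node.
// The file name is hypothetical.
val sc = new SparkContext(new SparkConf().setAppName("local-file-read"))

// The file:// scheme tells Spark to read from the node-local filesystem,
// so each executor reads its partitions from its own copy of the file
// rather than pulling the data over the network from the driver.
val rdd = sc.textFile("file:///home/user_name/Desktop/data.txt")

println(rdd.count())
```

By contrast, `sc.parallelize` operates on a collection already held in the driver's memory, so the data is shipped from the driver to the executors; `textFile` defers the reads to the executors themselves.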
On Tue, Dec 29, 2015 at 3:57 AM, Disha Shrivastava <dishu....@gmail.com> wrote:
> Hi,
>
> Suppose I have a file locally on my master machine, and the same file is
> also present at the same path on all the worker machines, say
> /home/user_name/Desktop. I wanted to know: when we partition the data
> using sc.parallelize, does Spark actually broadcast parts of the RDD to
> all the worker machines, or does it read the corresponding segment
> locally from the memory of the worker machine?
>
> How do I avoid movement of this data? Will it help if I store the file
> in HDFS?
>
> Thanks and Regards,
> Disha