Partitioning of RDD across worker machines

Disha Shrivastava Tue, 29 Dec 2015 03:58:59 -0800

Hi,

Suppose I have a file locally on my master machine, and the same file is
also present at the same path on all the worker machines, say
/home/user_name/Desktop. When we partition the data using sc.parallelize,
does Spark actually broadcast parts of the RDD to all the worker machines,
or does each worker read the corresponding segment locally from its own
memory?


How do I avoid movement of this data? Would it help if I stored the file in
HDFS?

Thanks and Regards,
Disha
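
For context on the question above, here is a minimal sketch in plain Python
(no Spark required) of the general idea behind sc.parallelize: it takes a
collection that already exists in the driver program and cuts it into
slices. The function slice_collection and the sample data are hypothetical
and only illustrate contiguous slicing; this is not Spark's actual code.

```python
def slice_collection(items, num_slices):
    """Split a driver-side list into num_slices contiguous slices.

    Hypothetical helper: illustrates the idea of cutting an in-memory
    collection into partitions, not Spark's real implementation.
    """
    n = len(items)
    return [
        items[(i * n) // num_slices:((i + 1) * n) // num_slices]
        for i in range(num_slices)
    ]


# Stand-in for data the driver has already loaded into memory.
lines = ["line %d" % i for i in range(10)]

# Each slice would become one partition handed to a worker.
partitions = slice_collection(lines, 3)

for index, part in enumerate(partitions):
    print(index, len(part))
```

In this picture the data starts on the driver, so the copies of the file on
the workers are not consulted. Reading by path is a separate call,
sc.textFile; the Spark programming guide notes that a path on the local
filesystem must be accessible at the same path on the worker nodes.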
