I am running a model where the workers should not have the data stored on
them; they are only for execution. The other cluster (it's just a single
node) that I am receiving data from is acting purely as a file server, for
which I could have used any other mechanism like NFS or FTP. I went with
HDFS so that I would not have to worry about partitioning the data, and so
that it does not affect my experiment. My question is: once the first task
starts, does a Spark worker read all the data before computation and then
distribute it among the workers' memory, or does each worker read its data
chunk by chunk and keep only its end result in memory to send back as the
final result?
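
For concreteness, here is a minimal sketch of the kind of job I mean (the
hostname, port, and path below are placeholders, not my real setup):

import org.apache.spark.sql.SparkSession

object RemoteHdfsRead {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("remote-hdfs-read")
      .getOrCreate()

    // The workers hold no data of their own; they pull partitions from
    // the single-node HDFS "file server" cluster.
    // Each HDFS block of the input becomes one RDD partition.
    val lines = spark.sparkContext.textFile("hdfs://fileserver:9000/data/input.txt")

    // textFile is lazy; the action below is what actually triggers the
    // reads, task by task.
    println(lines.count())

    spark.stop()
  }
}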


