Prashant Sharma
On Thu, Apr 24, 2014 at 12:15 PM, Carter <gyz...@hotmail.com> wrote:
> Thanks Mayur.
>
> So without Hadoop or any other distributed file system, by running:
>
>     val doc = sc.textFile("/home/scalatest.txt", 5)
>     doc.count
>
> we only get parallelization within the computer where the file is
> loaded, not parallelization across the computers in the cluster (Spark
> cannot automatically copy the file to the other computers in the
> cluster). Is this understanding correct? Thank you.

Spark will not distribute that file to other systems for you. However, if the file ("/home/scalatest.txt") is present at the same path on all systems, it will be processed on all nodes. We generally use HDFS, which takes care of this distribution.

> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Need-help-about-how-hadoop-works-tp4638p4734.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
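To make the distinction concrete, here is a minimal sketch of the two cases, assuming a spark-shell session where `sc` is the SparkContext; the hostname `namenode` and the HDFS port/paths are illustrative placeholders, not values from this thread:

```scala
// Local-filesystem path: each worker reads "/home/scalatest.txt" from its
// OWN local disk. The file must exist at this exact path on every node,
// or tasks scheduled on nodes without it will fail.
val localDoc = sc.textFile("file:///home/scalatest.txt", 5)
localDoc.count

// HDFS path: the file is stored and replicated by HDFS, so any worker
// can read its blocks over the network -- no manual copying required.
val hdfsDoc = sc.textFile("hdfs://namenode:8020/user/me/scalatest.txt", 5)
hdfsDoc.count
```

In both cases the second argument (5) is only a hint for the minimum number of partitions; it controls how the read is split into tasks, not where the data lives.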