Prashant Sharma

On Thu, Apr 24, 2014 at 12:15 PM, Carter <gyz...@hotmail.com> wrote:

> Thanks Mayur.
>
> So without Hadoop and any other distributed file systems, by running:
>      val doc = sc.textFile("/home/scalatest.txt",5)
>      doc.count
> we can only get parallelization within the computer where the file is
> loaded, but not parallelization across the computers in the cluster
> (Spark cannot automatically replicate the file to the other computers in
> the cluster). Is this understanding correct? Thank you.
>
>
Spark will not distribute that file to other machines for you. However, if
the file ("/home/scalatest.txt") is present at the same path on every
system, it will be processed on all nodes. We generally use HDFS, which
takes care of this distribution.
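
To illustrate the two options, here is a minimal sketch, assuming a running
SparkContext `sc` (e.g. from the spark-shell) and a cluster with Spark
workers; the HDFS namenode URL and path are placeholders, not values from
this thread:

```scala
// Option 1: plain local path. The file must exist at this exact path on
// the driver AND on every worker node; tasks scheduled on a node that is
// missing the file will fail with FileNotFoundException.
val localDoc = sc.textFile("/home/scalatest.txt", 5) // 5 = minimum partitions
println(localDoc.count())

// Option 2: HDFS path (placeholder namenode URL). HDFS stores the file's
// blocks across the cluster, so each node reads its share without any
// manual copying, and Spark can schedule tasks near the data.
val hdfsDoc = sc.textFile("hdfs://namenode:9000/user/me/scalatest.txt", 5)
println(hdfsDoc.count())
```

Either way the job is parallelized across the cluster; the difference is
only in who is responsible for making the data reachable from every node.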


>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Need-help-about-how-hadoop-works-tp4638p4734.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
