My understanding of the model is that you’re supposed to execute SparkFiles.get(…) on each worker node, not on the driver.
Since you already know where the file lives on the driver, you can load it into an RDD directly with SparkContext.textFile; Spark will distribute the data out to the workers for you, so there's no need to call SparkContext.addFile for that. If instead you have functions that run on the workers and expect a local file resource, use SparkContext.addFile to distribute the file into worker local storage, then call SparkFiles.get separately on each worker to retrieve the local copy (it will give a different path on each worker); see the sketch at the end of this message.

> On Mar 8, 2016, at 5:31 AM, ashikvc <vcas...@gmail.com> wrote:
>
> I am trying to play a little bit with apache-spark cluster mode.
> So my cluster consists of a driver on my machine and a worker and manager on a
> host machine (separate machine).
>
> I send a text file using `sparkContext.addFile(filepath)`, where the filepath
> is the path of my text file on the local machine, for which I get the following
> output:
>
> INFO Utils: Copying /home/files/data.txt to
> /tmp/spark-b2e2bb22-487b-412b-831d-19d7aa96f275/userFiles-147c9552-1a77-427e-9b17-cb0845807860/data.txt
>
> INFO SparkContext: Added file /home/files/data.txt at
> http://192.XX.XX.164:58143/files/data.txt with timestamp 1457432207649
>
> But when I try to access the same file using `SparkFiles.get("data.txt")`, I
> get the path to the file on my driver instead of the worker.
> I am setting up my file like this:
>
> SparkConf conf = new
> SparkConf().setAppName("spark-play").setMaster("spark://192.XX.XX.172:7077");
> conf.setJars(new String[]{"jars/SparkWorker.jar"});
> JavaSparkContext sparkContext = new JavaSparkContext(conf);
> sparkContext.addFile("/home/files/data.txt");
> List<String> file =
> sparkContext.textFile(SparkFiles.get("data.txt")).collect();
>
> I am getting a FileNotFoundException here.
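For reference, here is a rough, untested sketch of both approaches in Java, following the pattern described above. The class name, the dummy RDD used only to run code on the executors, and the partition count are illustrative assumptions; the master URL and file path are simply the ones from your post.

import java.util.Arrays;
import java.util.List;

import org.apache.spark.SparkConf;
import org.apache.spark.SparkFiles;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class SparkFilesExample {
  public static void main(String[] args) {
    SparkConf conf = new SparkConf()
        .setAppName("spark-files-example")
        .setMaster("spark://192.XX.XX.172:7077");
    JavaSparkContext sc = new JavaSparkContext(conf);

    // Option 1: to get the file's contents as an RDD, read the driver-local
    // path directly; Spark ships the data out to the workers for you.
    JavaRDD<String> lines = sc.textFile("/home/files/data.txt");
    System.out.println("line count: " + lines.count());

    // Option 2: if worker-side code needs a local copy of the file,
    // distribute it with addFile on the driver ...
    sc.addFile("/home/files/data.txt");

    // ... and resolve the worker-local path with SparkFiles.get inside a
    // function that runs on the executors, not on the driver.
    JavaRDD<Integer> probes = sc.parallelize(Arrays.asList(1, 2, 3), 3);
    List<String> workerPaths = probes.map(i -> SparkFiles.get("data.txt"))
        .collect();

    // Each entry is the path to the copy in that executor's work directory
    // (e.g. .../userFiles-.../data.txt); it can differ from worker to worker.
    workerPaths.forEach(System.out::println);

    sc.stop();
  }
}

In your snippet, SparkFiles.get("data.txt") runs on the driver before any job is submitted, so textFile ends up pointing at a path in the driver's temp directory that doesn't exist on the workers, which is why you see the FileNotFoundException.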