My understanding of the model is that you’re supposed to execute SparkFiles.get(…) on each worker node, not on the driver.
Since you already know where the file lives on the driver, you can load it into an RDD directly with SparkContext.textFile; Spark will distribute the data out to the workers for you, so there's no need to call SparkContext.addFile for that. If instead you have functions that run on the workers and expect a local file resource, use SparkContext.addFile to distribute the file into worker local storage, then call SparkFiles.get separately on each worker to retrieve the local copy (it will give a different path on each worker); see the sketch at the end of this message.

> On Mar 8, 2016, at 5:31 AM, ashikvc <vcas...@gmail.com> wrote:
>
> I am trying to play a little bit with apache-spark cluster mode.
> So my cluster consists of a driver on my machine and a worker and manager on a
> host machine (separate machine).
>
> I send a text file using `sparkContext.addFile(filepath)`, where the filepath
> is the path of my text file on the local machine, for which I get the following
> output:
>
> INFO Utils: Copying /home/files/data.txt to
> /tmp/spark-b2e2bb22-487b-412b-831d-19d7aa96f275/userFiles-147c9552-1a77-427e-9b17-cb0845807860/data.txt
>
> INFO SparkContext: Added file /home/files/data.txt at
> http://192.XX.XX.164:58143/files/data.txt with timestamp 1457432207649
>
> But when I try to access the same file using `SparkFiles.get("data.txt")`, I
> get the path to the file on my driver instead of the worker.
> I am setting up my file like this:
>
> SparkConf conf = new
> SparkConf().setAppName("spark-play").setMaster("spark://192.XX.XX.172:7077");
> conf.setJars(new String[]{"jars/SparkWorker.jar"});
> JavaSparkContext sparkContext = new JavaSparkContext(conf);
> sparkContext.addFile("/home/files/data.txt");
> List<String> file =
> sparkContext.textFile(SparkFiles.get("data.txt")).collect();
>
> I am getting a FileNotFoundException here.
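For reference, here is a rough, untested sketch of both approaches in Java, following the pattern described above. The class name, the dummy RDD used only to run code on the executors, and the partition count are illustrative assumptions; the master URL and file path are simply the ones from your post.

import java.util.Arrays;
import java.util.List;

import org.apache.spark.SparkConf;
import org.apache.spark.SparkFiles;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class SparkFilesExample {
  public static void main(String[] args) {
    SparkConf conf = new SparkConf()
        .setAppName("spark-files-example")
        .setMaster("spark://192.XX.XX.172:7077");
    JavaSparkContext sc = new JavaSparkContext(conf);

    // Option 1: to get the file's contents as an RDD, read the driver-local
    // path directly; Spark ships the data out to the workers for you.
    JavaRDD<String> lines = sc.textFile("/home/files/data.txt");
    System.out.println("line count: " + lines.count());

    // Option 2: if worker-side code needs a local copy of the file,
    // distribute it with addFile on the driver ...
    sc.addFile("/home/files/data.txt");

    // ... and resolve the worker-local path with SparkFiles.get inside a
    // function that runs on the executors, not on the driver.
    JavaRDD<Integer> probes = sc.parallelize(Arrays.asList(1, 2, 3), 3);
    List<String> workerPaths = probes.map(i -> SparkFiles.get("data.txt"))
        .collect();

    // Each entry is the path to the copy in that executor's work directory
    // (e.g. .../userFiles-.../data.txt); it can differ from worker to worker.
    workerPaths.forEach(System.out::println);

    sc.stop();
  }
}

In your snippet, SparkFiles.get("data.txt") runs on the driver before any job is submitted, so textFile ends up pointing at a path in the driver's temp directory that doesn't exist on the workers, which is why you see the FileNotFoundException.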